Model Parallelism: Training Large Neural Networks

Description

This tutorial covers the full workflow of training large neural networks using model parallelism, split across two sessions that build on each other.

**Part 1: Introduction to Training of Large Models [13:00–14:30]**
Theoretical foundations: scaling laws and parallelization techniques for large neural networks.

**Part 2: Practical Session with Megatron Bridge [this session, 16:30–18:00]**
Hands-on session: setting up training scripts, running experiments with different distributed settings, and exploring lower-precision training. The session is instructor-led, with materials provided for replication on the Durham cluster.