GPU/AI Computing
News
Scaling Large MoE Models with Wide Expert Parallelism on NVL72 Rack Scale Systems
Modern AI workloads have moved well beyond single-GPU inference serving. Model parallelism, which efficiently splits computation across many GPUs, is now the foundation of scalable, state-of-the-art deployments. The highest-performing models increasingly adopt mixture-of-experts (MoE) architectures, which are more efficient than dense models because they activate only a subset of trained…
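As a rough illustration of why MoE layers are cheaper than dense ones, the PyTorch sketch below routes each token to its top-2 of 8 experts, so only a small fraction of the expert MLPs run per token. The class name `TinyMoE`, the layer sizes, and the top-k value are illustrative assumptions, not the blog's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer (illustrative, not NVIDIA's code):
    a router scores experts per token and only the top-k experts are run."""
    def __init__(self, d_model=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                          # x: [tokens, d_model]
        logits = self.router(x)                    # [tokens, num_experts]
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                      # tokens routed to expert e
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                           # expert idle for this batch
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)                     # torch.Size([16, 64])
```

In an expert-parallel deployment such as the one the full post discusses, these expert MLPs would be sharded across GPUs rather than looped over on one device; the sketch only shows the sparse top-k activation that makes MoE cheaper than a dense layer of the same total parameter count.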
Source: NVIDIA Technical Blog
Word count: 1118 words
Published on 2025-10-21 00:00