What is Mixture of Experts (MoE)?
A model architecture that routes each input to a subset of specialized sub-networks for efficient scaling.
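MoE models contain many expert networks but activate only a few of them for each input token, as chosen by a learned router (gating network). Because per-token compute scales with the active parameters rather than the total parameter count, an MoE model can be much larger than a dense model with similar inference cost, although all experts must still be held in memory. Mixtral 8x7B, for example, has 8 experts per layer and routes each token to 2 of them, so it uses roughly 13B of its roughly 47B parameters per token. GPT-4 is widely reported, though not officially confirmed, to use MoE as well.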
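The routing step can be illustrated with a toy, dependency-free Python sketch. It assumes a softmax-over-top-k gate similar to the one described for Mixtral, and it makes hypothetical simplifications: each expert is a single random linear map rather than a full feed-forward block, and the names `TopKMoE`, `matvec`, and `forward` are illustrative only. For each token, the router scores every expert, only the `k` highest-scoring experts run, and their outputs are combined using the renormalized gate weights.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TopKMoE:
    """Toy MoE layer. Hypothetical simplification: each expert is one
    d_model x d_model linear map instead of a full feed-forward block."""

    def __init__(self, d_model, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        # Router (gating network): one row of weights per expert -> one logit per expert.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(d_model)]
                       for _ in range(num_experts)]
        # Expert weights: every expert is stored, even though few run per token.
        self.experts = [[[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                         for _ in range(d_model)]
                        for _ in range(num_experts)]
        # How many times each expert has been executed (for inspection).
        self.calls = [0] * num_experts

    def forward(self, x):
        # 1. Score every expert for this token.
        logits = matvec(self.router, x)
        # 2. Keep only the k highest-scoring experts.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        # 3. Renormalize the kept scores into gate weights that sum to 1.
        gates = softmax([logits[i] for i in top])
        # 4. Run only the selected experts and mix their outputs.
        out = [0.0] * len(x)
        for g, i in zip(gates, top):
            self.calls[i] += 1
            y = matvec(self.experts[i], x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top, gates


layer = TopKMoE(d_model=4, num_experts=8, k=2)
token = [0.5, -1.0, 0.25, 2.0]
out, chosen, gates = layer.forward(token)
print("experts used:", chosen, "gate weights:", [round(g, 3) for g in gates])

per_expert = 4 * 4  # weights in one toy expert matrix
print("expert weights used for this token:", layer.k * per_expert,
      "of", len(layer.experts) * per_expert)  # 32 of 128
```

Only `k` of the `num_experts` expert matrices are multiplied for each token, which is why per-token compute tracks the active parameters rather than the total; the unused experts still occupy memory.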