# Mixture of Experts (MoE)

> MoE is a model architecture where a router activates only a few expert subnetworks per token — huge total capacity, a fraction of the compute per token.

**Mixture of Experts (MoE) is a transformer architecture where feed-forward layers are split into many "expert" subnetworks and a learned router sends each token to only a few of them — so a model can have enormous total parameters while spending only a fraction per token.**
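The routing step fits in a few lines. This is a minimal, dependency-free sketch that assumes one common gating variant: pick the top-k experts by router logit, then take a softmax over only the chosen logits (the scheme described for Mixtral). The function names, toy "experts" that just scale their input, and sizes are hypothetical. Real implementations batch tokens, use learned FFN experts, and often add load-balancing objectives when training the router.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(token, router_weights, k):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(t * w for t, w in zip(token, row)) for row in router_weights]
    # Keep only the k experts with the largest logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the chosen experts' logits so the k gate weights sum to 1.
    gates = softmax([logits[i] for i in chosen])
    return list(zip(chosen, gates))

def make_expert(scale):
    """Hypothetical toy stand-in for an expert FFN: it just scales its input."""
    return lambda x: [scale * v for v in x]

def moe_layer(token, router_weights, experts, k=2):
    """Gate-weighted sum of the k routed experts' outputs; the other experts never run."""
    out = [0.0] * len(token)
    for idx, gate in top_k_route(token, router_weights, k):
        expert_out = experts[idx](token)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out

# Usage: 8 experts, each token routed to 2 of them.
random.seed(0)
d_model, n_experts = 4, 8
router = [[random.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_experts)]
experts = [make_expert(i + 1.0) for i in range(n_experts)]
token = [0.5, -1.0, 0.25, 2.0]
print(top_k_route(token, router, k=2))
print(moe_layer(token, router, experts, k=2))
```

Only 2 of the 8 expert functions are called per token. That is the compute saving. All 8 must still be held in memory, because a different token may pick a different pair.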

The parameter accounting is what makes MoE distinctive. An MoE model quotes two numbers: its total parameters (what it knows, and what must fit in memory) and its **active** parameters (what each token costs to compute). A model with hundreds of billions of parameters in total but only tens of billions active generates tokens at roughly the speed of a mid-size dense model while approaching frontier capability. That combination is why the architecture swept open-weight releases from Mixtral onward and underpins many frontier APIs.
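Both numbers come from simple arithmetic. This back-of-the-envelope sketch assumes a hypothetical layout: every token uses a block of shared parameters (attention, embeddings, norms), and each MoE layer holds `n_experts` equally sized experts of which a token uses `top_k`. The `MoEConfig` class and every figure are illustrative and do not describe any released model. Router weights are ignored for simplicity.

```python
from dataclasses import dataclass

@dataclass
class MoEConfig:
    shared_params: int      # attention, embeddings, norms: used by every token
    params_per_expert: int  # one expert FFN in one MoE layer
    n_moe_layers: int       # layers whose FFN is an MoE block
    n_experts: int          # experts per MoE layer
    top_k: int              # experts each token is routed to, per layer

    def total_params(self) -> int:
        # Every expert's weights exist and must be stored.
        return self.shared_params + self.n_moe_layers * self.n_experts * self.params_per_expert

    def active_params(self) -> int:
        # Each token passes through only top_k experts per layer.
        return self.shared_params + self.n_moe_layers * self.top_k * self.params_per_expert

# Hypothetical configuration, not any specific released model.
cfg = MoEConfig(
    shared_params=2_000_000_000,
    params_per_expert=200_000_000,
    n_moe_layers=32,
    n_experts=64,
    top_k=4,
)
print(f"total:  {cfg.total_params() / 1e9:.1f}B")   # total:  411.6B
print(f"active: {cfg.active_params() / 1e9:.1f}B")  # active: 27.6B
print(f"active fraction: {cfg.active_params() / cfg.total_params():.1%}")  # active fraction: 6.7%
```

Under these assumptions the model stores about 412B parameters but computes with about 28B per token. That is the "hundreds of billions total, tens of billions active" pattern in miniature.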

For practitioners, the consequences show up in serving. Memory requirements track *total* parameters, while throughput tracks *active* ones. That split makes [quantization](/glossary/quantization) and careful [inference](/glossary/inference) engineering more valuable, and it shifts the [self-host economics](/guides/mlops/self-host-vs-api-llm): an MoE you can't fit in memory is capability you don't have, however cheap its tokens would have been.
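A rough sizing sketch makes the asymmetry concrete. It assumes weights dominate memory, ignoring KV cache, activations, and the per-block scale overhead of real quantization formats. It also uses the common rule of thumb of about 2 FLOPs per active parameter per generated token. The helper names and the 411.6B-total / 27.6B-active model are hypothetical.

```python
def weight_memory_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in GB (weights only: no KV cache, activations,
    or quantization scale overhead)."""
    return n_params * bits_per_param / 8 / 1e9

def flops_per_token(active_params: int) -> float:
    """Rule of thumb: about 2 FLOPs per active parameter per generated token."""
    return 2.0 * active_params

# Hypothetical MoE: 411.6B total, 27.6B active parameters.
TOTAL = 411_600_000_000
ACTIVE = 27_600_000_000

# Memory follows TOTAL parameters, at whatever precision the weights are stored.
for bits in (16, 8, 4):
    print(f"MoE {bits:>2}-bit weights: {weight_memory_gb(TOTAL, bits):7.1f} GB")

# Compute per token follows ACTIVE parameters, matching a dense model of ACTIVE size...
print(f"MoE FLOPs/token: {flops_per_token(ACTIVE):.2e}")
# ...but that dense model would need far less memory for its weights.
print(f"dense {ACTIVE / 1e9:.1f}B at 16-bit: {weight_memory_gb(ACTIVE, 16):.1f} GB")
```

Even at 4 bits, this hypothetical MoE needs nearly four times the weight memory of a 16-bit dense model with the same per-token compute. That gap is why quantization matters so much when deciding whether an MoE can be self-hosted.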

---

_Source: https://agentscamp.com/glossary/mixture-of-experts — Term on AgentsCamp._