Multilingual Routing in Mixture-of-Experts

Lucas Bandarkar, Chenyuan Yang, Mohsen Fayyaz, Junlin Hu, and Nanyun Peng, in Proceedings of the International Conference on Learning Representations (ICLR), 2026.


Abstract

This paper investigates sparse routing dynamics in Mixture-of-Experts (MoE) models using parallel multilingual datasets, uncovering interpretable layer-wise phenomena. The findings indicate that MoE models route tokens in language-specific ways in the early and late decoder layers, but exhibit significant cross-lingual routing alignment in the middle layers. The authors introduce a method that steers the router by promoting middle-layer task experts that are frequently activated in English, which improves multilingual performance.
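
To make the steering idea in the abstract concrete, here is a minimal sketch of one way a router could be nudged toward chosen experts in the middle layers. It is not the authors' implementation: the additive logit bias, the function names (`top_k_experts`, `steer_logits`, `route`), the layer range, and all numeric values are assumptions chosen only for illustration.

```python
from typing import List, Set


def top_k_experts(logits: List[float], k: int) -> List[int]:
    """Return the indices of the k largest router logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def steer_logits(logits: List[float], promoted: Set[int], bias: float) -> List[float]:
    """Add a constant bias to the logits of the promoted experts (assumed mechanism)."""
    return [x + bias if i in promoted else x for i, x in enumerate(logits)]


def route(
    logits: List[float],
    layer: int,
    k: int,
    promoted: Set[int],
    middle_layers: range,
    bias: float,
) -> List[int]:
    """Pick top-k experts, steering only when the layer is in the middle range."""
    if layer in middle_layers:
        logits = steer_logits(logits, promoted, bias)
    return top_k_experts(logits, k)


# Hypothetical example: 4 experts, top-2 routing, expert 3 is promoted.
example_logits = [2.0, 1.5, 0.2, 1.4]
middle = range(8, 16)  # assumed middle-layer span

early_choice = route(example_logits, layer=2, k=2, promoted={3}, middle_layers=middle, bias=0.5)
middle_choice = route(example_logits, layer=10, k=2, promoted={3}, middle_layers=middle, bias=0.5)

print(early_choice)   # [0, 1]  (no steering outside the middle layers)
print(middle_choice)  # [0, 3]  (the promoted expert displaces expert 1)
```

In this toy setup the early layer keeps its original routing, while the middle layer swaps in the promoted expert. How the promoted experts are actually identified and how strongly they are promoted are details to take from the paper itself.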


Bib Entry

@inproceedings{bandarkar2026multilingual,
  title = {Multilingual Routing in Mixture-of-Experts},
  author = {Bandarkar, Lucas and Yang, Chenyuan and Fayyaz, Mohsen and Hu, Junlin and Peng, Nanyun},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year = {2026}
}

Related Publications