Steering MoE LLMs via Expert (De)Activation

Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, and Nanyun Peng, in Proceedings of the International Conference on Learning Representations (ICLR), 2026.

Download the full text


Abstract

The paper introduces SteerMoE, a framework for steering Mixture-of-Experts (MoE) Large Language Models (LLMs) by identifying experts linked to specific behaviors. By selectively activating or deactivating these experts during inference, the framework can steer model behaviors such as faithfulness and safety without fine-tuning or modifying the model's weights.
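
The mechanism described above amounts to intervening on the expert router at inference time. As a rough illustration of that general idea (not the paper's actual implementation), the sketch below masks or boosts a router's logits before top-k expert selection; the function name, tensor shapes, and boost constant are assumptions made here for illustration.

import torch

def steer_router_logits(router_logits, deactivate=(), activate=(), boost=1e4):
    # router_logits: (num_tokens, num_experts) scores produced by an MoE router.
    # deactivate / activate: hypothetical lists of expert indices to suppress or force.
    logits = router_logits.clone()
    if deactivate:
        # Push deactivated experts to -inf so they can never enter the top-k.
        logits[:, list(deactivate)] = float("-inf")
    if activate:
        # Add a large bonus so activated experts are (almost) always selected.
        logits[:, list(activate)] += boost
    return logits

# Example: suppress expert 3 and force expert 7 for 5 tokens routed over 8 experts.
dummy_logits = torch.randn(5, 8)
steered = steer_router_logits(dummy_logits, deactivate=[3], activate=[7])

In practice, such a function would be applied inside (or as a hook on) each MoE layer's routing step, so the intervention takes effect at every forward pass without touching the model's weights.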


Bib Entry

@inproceedings{fayyaz2026steering,
  title = {Steering MoE LLMs via Expert (De)Activation},
  author = {Fayyaz, Mohsen and Modarressi, Ali and Deilamsalehy, Hanieh and Dernoncourt, Franck and Rossi, Ryan and Bui, Trung and Schütze, Hinrich and Peng, Nanyun},
  booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
  year = {2026}
}

Related Publications