Model Extrapolation Expedites Alignment

Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, and Nanyun Peng, in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL), 2025.

Download the full text


Abstract

Given the high computational cost of alignment training of large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Drawing inspiration from literature on model interpolation, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs’ alignment with human preferences. We first observe that model interpolation typically leads to intermediate performance when applied to existing DPO/RLHF models and their SFT checkpoints. Motivated by this, we hypothetically treat a partially-trained model M1 as the interpolated result between the initial SFT model M0 and some better-aligned model M2. Then, we can obtain the hypothetical M2 by simply extrapolating the model weights along the direction from M0 to M1, thus reaching better alignment performance based on M1 without additional training overhead. We validate our hypothesis through controlled experiments, demonstrating that ExPO boosts a DPO model trained with only 20% of the training steps to outperform the fully-trained one. Moreover, we show that ExPO notably improves existing open-source LLMs (ranging from 1.8B to 70B parameters) on the leading AlpacaEval 2.0 and MT-Bench benchmarks, highlighting ExPO’s broader utility in efficiently enhancing LLM alignment.
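
As described in the abstract, the extrapolation step amounts to a single weight-space operation, M2 = M1 + alpha * (M1 - M0). Below is a minimal sketch of this update; the function name expo_extrapolate and the default alpha value are illustrative, PyTorch state dicts are assumed, and the paper's actual implementation may differ in details.

import torch

def expo_extrapolate(sft_state, aligned_state, alpha=0.3):
    # Extrapolate weights along the direction from the SFT model (M0)
    # toward the partially aligned model (M1):
    #   theta_2 = theta_1 + alpha * (theta_1 - theta_0)
    # alpha > 0 is a hyperparameter controlling extrapolation strength.
    return {name: w1 + alpha * (w1 - sft_state[name])
            for name, w1 in aligned_state.items()}

In use, one would load the state dicts of the SFT and DPO/RLHF checkpoints (which must share the same architecture and parameter names), apply the function, and load the resulting state dict back into the model. No gradient computation or further training is involved.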


Bib Entry

@inproceedings{zheng2025expo,
  title = {Model Extrapolation Expedites Alignment},
  author = {Zheng, Chujie and Wang, Ziqi and Ji, Heng and Huang, Minlie and Peng, Nanyun},
  year = {2025},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL)}
}