Re-ReST: Reflection-Reinforced Self-Training for Language Agents

Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, and Nanyun Peng, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.

Abstract

Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger-model demonstrations. Self-training, however, requires high-quality model-generated samples, which are hard to obtain for challenging language agent tasks. To address this, we present Reflection-Reinforced Self-Training (Re-ReST), which uses a reflector to refine low-quality generated samples during self-training. The reflector takes the agent's output and feedback from an external environment to produce improved samples. We conduct extensive experiments with open-source language agents across a range of tasks, demonstrating the effectiveness of both self-training and Re-ReST.
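The data-collection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent, environment, and reflector below are toy stand-ins (in practice each would be a language model call or a real task environment), and all function names are hypothetical.

```python
def agent_generate(task):
    # Stand-in for the language agent producing a reasoning-action trajectory.
    return f"trajectory for {task}"

def env_feedback(task, trajectory):
    # Stand-in for the external environment: returns (success, feedback text).
    # Toy rule: refined trajectories and "easy" tasks succeed.
    success = "refined" in trajectory or "easy" in task
    return success, "" if success else "action failed at step 2"

def reflector_refine(task, trajectory, feedback):
    # Stand-in for the reflector: conditions on the failed trajectory and the
    # environment feedback to produce an improved sample.
    return f"refined {trajectory} (fixing: {feedback})"

def rerest_collect(tasks):
    """Collect self-training data: keep successful samples directly, and
    route low-quality (failed) ones through the reflector, re-checking the
    refined sample against the environment before keeping it."""
    training_data = []
    for task in tasks:
        traj = agent_generate(task)
        ok, fb = env_feedback(task, traj)
        if not ok:
            traj = reflector_refine(task, traj, fb)
            ok, fb = env_feedback(task, traj)  # re-verify the refined sample
        if ok:
            training_data.append((task, traj))
    return training_data

data = rerest_collect(["easy-qa", "hard-qa"])
```

The collected pairs would then be used to finetune the agent, as in standard self-training; the reflector's role is only to salvage samples the agent alone could not solve.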


Bib Entry

@inproceedings{dou2024rerest,
  author = {Dou, Zi-Yi and Yang, Cheng-Fu and Wu, Xueqing and Chang, Kai-Wei and Peng, Nanyun},
  title = {Re-ReST: Reflection-Reinforced Self-Training for Language Agents},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2024}
}