Re-ReST: Reflection-Reinforced Self-Training for Language Agents
Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, and Nanyun Peng, in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Abstract
Finetuning language agents on reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate self-training for language agents, in which the agent generates its own supervision, offering a promising alternative that avoids reliance on demonstrations from humans or stronger models. Self-training, however, requires high-quality model-generated samples, which are hard to obtain for challenging language agent tasks. To address this, we present Reflection-Reinforced Self-Training (Re-ReST), which uses a reflector to refine low-quality generated samples during self-training. The reflector takes the agent's output and feedback from an external environment to produce improved samples. We conduct extensive experiments with open-source language agents across tasks, demonstrating the effectiveness of both self-training and Re-ReST for language agent tasks.
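The loop the abstract describes — generate a trajectory, check it against the environment, and route failures through a reflector before adding them to the self-training set — can be sketched as follows. All names here (`agent_generate`, `reflector_refine`, `env_feedback`, `collect_self_training_data`) are illustrative stand-ins, not the authors' actual implementation:

```python
import random

# Toy sketch of the Re-ReST data-collection loop. The agent, reflector,
# and environment below are stubs; in the paper these are a language
# agent, a reflection model, and a real task environment.

def agent_generate(task, rng):
    """Stand-in agent: propose a candidate reasoning-action trajectory."""
    return task + rng.choice([" good", " bad"])

def env_feedback(sample):
    """Stand-in external environment: accept only 'good' trajectories."""
    return sample.endswith("good")

def reflector_refine(sample, feedback):
    """Stand-in reflector: use the agent's output plus environment
    feedback to produce an improved sample."""
    if feedback:
        return sample
    return sample.replace("bad", "good")

def collect_self_training_data(tasks, seed=0):
    """Keep successful samples; refine failed ones with the reflector
    and keep them only if the environment then accepts them."""
    rng = random.Random(seed)
    data = []
    for task in tasks:
        sample = agent_generate(task, rng)
        ok = env_feedback(sample)
        if not ok:
            sample = reflector_refine(sample, ok)
            ok = env_feedback(sample)
        if ok:
            data.append((task, sample))
    return data
```

The collected `(task, sample)` pairs would then serve as finetuning data for the agent, closing the self-training loop.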
Bib Entry
@inproceedings{dou2024rerest,
  author = {Dou, Zi-Yi and Yang, Cheng-Fu and Wu, Xueqing and Chang, Kai-Wei and Peng, Nanyun},
  title = {Re-ReST: Reflection-Reinforced Self-Training for Language Agents},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2024},
  keywords = {agent}
}