MERCY: Multiple Response Ranking Concurrently in Realistic Open-Domain Conversational Systems
Sarik Ghazarian, Behnam Hedayatnia, Di Jin, Sijia Liu, Nanyun Peng, Yang Liu, and Dilek Hakkani-Tur, in Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2023.
Abstract
Automatic Evaluation (AE) and Response Selection (RS) models assign quality scores to candidate responses and rank them in conversational setups. Prior response ranking research compares models' performance on synthetically generated test sets. In this work, we investigate the performance of model-based, reference-free AE and RS models on response ranking datasets we construct to mirror the realistic scenario of ranking candidates at inference time. The metrics' unsatisfactory performance suggests that they generalize poorly to more practical conversational domains such as human-chatbot dialogs. To alleviate this issue, we propose a novel RS model called MERCY that mimics how humans select the best candidate: it considers the distinct candidates concurrently and learns to rank them. In addition, MERCY leverages natural language feedback that explains why each candidate response is relevant or irrelevant to the dialog context; this feedback is generated by prompting large language models in a few-shot setup. Our experiments show that MERCY outperforms baselines on the response ranking task on our curated realistic datasets.
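The abstract describes the core idea, scoring all candidates jointly rather than one at a time, but gives no implementation details. Below is a minimal sketch of such a listwise ranker in PyTorch, assuming pooled encodings of each (context, candidate, feedback) triple as input; the class names, dimensions, and loss choice are all hypothetical illustrations, not the paper's actual architecture.

```python
# Minimal sketch of concurrent (listwise) candidate ranking as described in
# the abstract. All names, shapes, and the loss choice are assumptions; the
# paper's actual model may differ.
import torch
import torch.nn as nn


class ListwiseRanker(nn.Module):
    """Scores all candidate responses for a context jointly, so each
    candidate's score can depend on the others (hypothetical design)."""

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # Cross-candidate interaction layer: lets candidates attend to one
        # another before scoring (an assumed mechanism).
        self.interaction = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, candidate_encodings: torch.Tensor) -> torch.Tensor:
        # candidate_encodings: (batch, num_candidates, hidden_dim), e.g.
        # pooled encoder outputs of [context; candidate; feedback] strings.
        interacted = self.interaction(candidate_encodings)
        return self.scorer(interacted).squeeze(-1)  # (batch, num_candidates)


def listwise_loss(scores: torch.Tensor, best_index: torch.Tensor) -> torch.Tensor:
    # Softmax cross-entropy over the candidate list: the model learns to
    # rank the gold response above its competitors.
    return nn.functional.cross_entropy(scores, best_index)


# Usage with illustrative values: 4 candidates per context, 768-dim encodings.
ranker = ListwiseRanker()
encodings = torch.randn(2, 4, 768)            # (batch=2, candidates=4, dim=768)
scores = ranker(encodings)                    # (2, 4)
loss = listwise_loss(scores, torch.tensor([0, 2]))
```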
Bib Entry
@inproceedings{ghazarian-etal-2023-mercy,
  title     = {{MERCY}: Multiple Response Ranking Concurrently in Realistic Open-Domain Conversational Systems},
  author    = {Ghazarian, Sarik and Hedayatnia, Behnam and Jin, Di and Liu, Sijia and Peng, Nanyun and Liu, Yang and Hakkani-Tur, Dilek},
  booktitle = {Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue},
  year      = {2023}
}