Evaluating Cultural and Social Awareness of LLM Web Agents
Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu, in Findings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings), 2025.
Download the full text
Abstract
With LLMs increasingly acting as web agents, robustness to cultural and social norms becomes critical. We release CASA, a benchmark covering two web tasks, online shopping and social discussion forums, designed to test agents' ability to detect and appropriately respond to norm violations. CASA measures awareness coverage, helpfulness, and violation rate. Current agents achieve less than 10% awareness coverage and over 40% violation rates, far worse than in non-agent settings. Combining prompting with fine-tuning on culture-specific data yields complementary gains: fine-tuning improves cross-region generalization, while prompting helps agents navigate complex tasks. CASA thus highlights the need for continual social-awareness evaluation throughout LLM-agent development.
Bib Entry
@inproceedings{qiu2025evaluating,
  author    = {Qiu, Haoyi and Fabbri, Alexander and Agarwal, Divyansh and Huang, Kung{-}Hsiang and Tan, Sarah and Peng, Nanyun and Wu, Chien{-}Sheng},
  title     = {Evaluating Cultural and Social Awareness of LLM Web Agents},
  booktitle = {Findings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings)},
  year      = {2025}
}