Evaluating Cultural and Social Awareness of LLM Web Agents

Haoyi Qiu, Alexander Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, and Chien-Sheng Wu, in Findings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings), 2025.

Download the full text


Abstract

With LLMs acting as web agents, robustness to cultural and social norms becomes critical. We release CASA, a benchmark covering two web tasks—online shopping and social forums—to test norm detection and response. CASA measures awareness coverage, helpfulness, and violation rate. Current agents achieve <10% coverage and >40% violations, far worse than non-agent settings. Combining prompting with fine-tuning on culture-specific data yields complementary gains: fine-tuning improves cross-region generalization, while prompting helps navigate complex tasks. CASA thus spotlights the need for continual social-awareness evaluation during LLM-agent development.


Bib Entry

@inproceedings{qiu2025evaluating,
  author = {Qiu, Haoyi and Fabbri, Alexander and Agarwal, Divyansh and Huang, Kung{-}Hsiang and Tan, Sarah and Peng, Nanyun and Wu, Chien{-}Sheng},
  title = {Evaluating Cultural and Social Awareness of LLM Web Agents},
  booktitle = {Findings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-Findings)},
  year = {2025}
}

Related Publications