SNaRe: Domain-aware Data Generation for Low-Resource Event Detection

Tanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang, and Nanyun Peng, in Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.

Abstract

Event Detection (ED), the task of identifying event mentions in natural language text, is critical for enabling reasoning in highly specialized domains such as biomedicine, law, and epidemiology. Data generation has proven effective at extending ED to wider applications without requiring expensive expert annotations. However, when existing generation approaches are applied to specialized domains, they struggle with label noise, where annotations are incorrect, and domain drift, a distributional mismatch between the generated sentences and the target domain. To address these issues, we introduce SNaRe, a domain-aware synthetic data generation framework composed of three components: Scout, Narrator, and Refiner. Scout extracts triggers from unlabeled target-domain data and curates a high-quality, domain-specific trigger list using corpus-level statistics to mitigate domain drift. Narrator, conditioned on these triggers, generates high-quality, domain-aligned sentences, and Refiner identifies additional event mentions, ensuring high annotation quality. Experiments on three ED datasets from diverse domains show that SNaRe outperforms the best baseline, achieving average F1 gains of 3-7% in the zero-shot and few-shot settings and 4-20% F1 gains for multilingual generation. An analysis of the generated trigger hit rate, together with a human evaluation, substantiates SNaRe’s stronger annotation quality and reduced domain drift.
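The abstract says only that Scout curates its trigger list "using corpus-level statistics"; it does not say which ones. As a purely illustrative sketch under stated assumptions, and not the paper's actual procedure, one such statistic is how often and how consistently a word is proposed as a trigger across the unlabeled target documents that contain it. Everything named below is hypothetical: the function `curate_trigger_list`, the thresholds `min_support` and `min_ratio`, the simple regex tokenizer, and the toy epidemiology-style sentences. The candidate triggers are assumed to come from some upstream extractor.

```python
# Illustrative sketch only; NOT SNaRe's actual Scout procedure.
import re
from collections import Counter
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    return set(re.findall(r"[a-z]+", text.lower()))


def curate_trigger_list(
    docs: List[str],
    extracted: List[Set[str]],
    min_support: int = 2,
    min_ratio: float = 0.5,
) -> List[Tuple[str, int, float]]:
    """Keep candidate triggers that are proposed often and consistently.

    docs[i] is an unlabeled target-domain document and extracted[i] is the set
    of lowercased trigger words an (assumed) upstream extractor proposed for it.
    Returns (word, support, ratio) tuples, most-supported first.
    """
    if len(docs) != len(extracted):
        raise ValueError("docs and extracted must be aligned")
    tagged_df = Counter()   # documents in which the word was proposed as a trigger
    present_df = Counter()  # documents in which the word occurs at all
    for doc, triggers in zip(docs, extracted):
        tagged_df.update(triggers)
        present_df.update(tokenize(doc))
    kept = []
    for word, support in tagged_df.items():
        # Share of containing documents where the word was tagged; max() guards
        # against proposals that never appear in the document text itself.
        ratio = support / max(present_df[word], support)
        if support >= min_support and ratio >= min_ratio:
            kept.append((word, support, ratio))
    # Highest support first; ties broken alphabetically for determinism.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy corpus with hypothetical extractor output.
DOCS = [
    "The patient was infected with influenza and later hospitalized.",
    "Two workers were infected after exposure.",
    "Officials said the outbreak infected dozens.",
    "The patient was hospitalized for observation.",
    "He said the results were positive.",
    "Doctors said nothing new.",
    "She said the ward was closed.",
]
EXTRACTED = [
    {"infected", "hospitalized"},
    {"infected", "exposure"},
    {"infected", "said"},
    {"hospitalized"},
    {"said"},
    set(),
    set(),
]

if __name__ == "__main__":
    for word, support, ratio in curate_trigger_list(DOCS, EXTRACTED, min_ratio=0.6):
        print(f"{word}\tsupport={support}\tratio={ratio:.2f}")
```

With `min_ratio=0.6`, only "infected" and "hospitalized" survive. "exposure" is dropped for low support, and "said" is dropped because it is tagged in only 2 of the 4 documents that contain it (ratio 0.5), which is the kind of noisy, domain-generic candidate a consistency filter is meant to remove.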


Bib Entry

@inproceedings{parekh2025snare,
  title = {SNaRe: Domain-aware Data Generation for Low-Resource Event Detection},
  author = {Parekh, Tanmay and Dong, Yuxuan and Bandarkar, Lucas and Kim, Artin and Hsu, I-Hung and Chang, Kai-Wei and Peng, Nanyun},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)}
}
