STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models
Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, and Wei Wang, in Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI), 2024.
Bib Entry
@inproceedings{ma2024star,
  title     = {STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models},
  author    = {Ma, Mingyu Derek and Wang, Xiaoxuan and Kung, Po-Nien and Brantingham, P. Jeffrey and Peng, Nanyun and Wang, Wei},
  booktitle = {Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2024}
}
Related Publications
ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning
Rujun Han, I.-Hung Hsu, Jiao Sun, Julia Baylon, Qiang Ning, Dan Roth, and Nanyun Peng, in The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Full Text Code Abstract BibTeX Details
Understanding how events are semantically related to each other is the essence of reading comprehension. Recent event-centric reading comprehension datasets focus mostly on event arguments or temporal relations. While these tasks partially evaluate machines’ ability of narrative understanding, human-like reading comprehension requires the capability to process event-based information beyond arguments and temporal reasoning. For example, to understand causality between events, we need to infer motivation or purpose; to establish event hierarchy, we need to understand the composition of events. To facilitate these tasks, we introduce ESTER, a comprehensive machine reading comprehension (MRC) dataset for Event Semantic Relation Reasoning. The dataset leverages natural language queries to reason about the five most common event semantic relations, provides more than 6K questions, and captures 10.1K event relation pairs. Experimental results show that the current SOTA systems achieve 22.1%, 63.3% and 83.5% for token-based exact-match (EM), F1 and event-based HIT@1 scores, which are all significantly below human performances (36.0%, 79.6%, 100% respectively), highlighting our dataset as a challenging benchmark.
@inproceedings{han2021ester,
  title     = {ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning},
  author    = {Han, Rujun and Hsu, I-Hung and Sun, Jiao and Baylon, Julia and Ning, Qiang and Roth, Dan and Peng, Nanyun},
  booktitle = {The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}
ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning
Rujun Han, Xiang Ren, and Nanyun Peng, in The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Full Text Code Abstract BibTeX Details
While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle for tasks that require event temporal reasoning, which is essential for event-centric applications. We present a continual pre-training approach that equips PTLMs with targeted knowledge about event temporal relations. We design self-supervised learning objectives to recover masked-out event and temporal indicators and to discriminate sentences from their corrupted counterparts (where event or temporal indicators got replaced). By further pre-training a PTLM with these objectives jointly, we reinforce its attention to event and temporal information, yielding enhanced capability on event temporal reasoning. This Effective CONtinual pre-training framework for Event Temporal reasoning (ECONET) improves the PTLMs’ fine-tuning performances across five relation extraction and question answering tasks and achieves new or on-par state-of-the-art performances in most of our downstream tasks.
@inproceedings{han2021econet,
  title     = {ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning},
  author    = {Han, Rujun and Ren, Xiang and Peng, Nanyun},
  booktitle = {The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2021}
}
EventPlus: A Temporal Event Understanding Pipeline
Mingyu Derek Ma, Jiao Sun, Mu Yang, Kung-Hsiang Huang, Nuan Wen, Shikhar Singh, Rujun Han, and Nanyun Peng, in 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track, 2021.
Full Text Slides Poster Video Code Abstract BibTeX Details
We present EventPlus, a temporal event understanding pipeline that integrates various state-of-the-art event understanding components including event trigger and type detection, event argument detection, event duration and temporal relation extraction. Event information, especially event temporal knowledge, is a type of common sense knowledge that helps people understand how stories evolve and provides predictive hints for future events. EventPlus as the first comprehensive temporal event understanding pipeline provides a convenient tool for users to quickly obtain annotations about events and their temporal information for any user-provided document. Furthermore, we show EventPlus can be easily adapted to other domains (e.g., biomedical domain). We make EventPlus publicly available to facilitate event-related information extraction and downstream applications.
@inproceedings{ma2021eventplus,
  title     = {EventPlus: A Temporal Event Understanding Pipeline},
  author    = {Ma, Mingyu Derek and Sun, Jiao and Yang, Mu and Huang, Kung-Hsiang and Wen, Nuan and Singh, Shikhar and Han, Rujun and Peng, Nanyun},
  booktitle = {2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Demonstrations Track},
  year      = {2021}
}
Domain Knowledge Empowered Structured Neural Net for End-to-End Event Temporal Relation Extraction
Rujun Han, Yichao Zhou, and Nanyun Peng, in the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Full Text Slides Code Abstract BibTeX Details
Extracting event temporal relations is a critical task for information extraction and plays an important role in natural language understanding. Prior systems leverage deep learning and pre-trained language models to improve the performance of the task. However, these systems often suffer from two shortcomings: 1) when performing maximum a posteriori (MAP) inference based on neural models, previous systems only used structured knowledge that is assumed to be absolutely correct, i.e., hard constraints; 2) biased predictions on dominant temporal relations when training with a limited amount of data. To address these issues, we propose a framework that enhances deep neural network with distributional constraints constructed by probabilistic domain knowledge. We solve the constrained inference problem via Lagrangian Relaxation and apply it to end-to-end event temporal relation extraction tasks. Experimental results show our framework is able to improve the baseline neural network models with strong statistical significance on two widely used datasets in news and clinical domains.
@inproceedings{han2020knowledge,
  title         = {Domain Knowledge Empowered Structured Neural Net for End-to-End Event Temporal Relation Extraction},
  author        = {Han, Rujun and Zhou, Yichao and Peng, Nanyun},
  booktitle     = {the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  publisher     = {Association for Computational Linguistics},
  pages         = {5717--5729},
  slideslive_id = {38939236},
  year          = {2020}
}
TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions
Qiang Ning, Hao Wu, Rujun Han, Nanyun Peng, Matt Gardner, and Dan Roth, in the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Full Text Code Abstract BibTeX Details
A critical part of reading is being able to understand the temporal relationships between events described in a passage of text, even when those relationships are not explicitly stated. However, current machine reading comprehension benchmarks have practically no questions that test temporal phenomena, so systems trained on these benchmarks have no capacity to answer questions such as "what happened before/after [some event]?" We introduce TORQUE, a new English reading comprehension benchmark built on 3.2k news snippets with 21k human-generated questions querying temporal relationships. Results show that RoBERTa-large achieves an exact-match score of 51% on the test set of TORQUE, about 30% behind human performance.
@inproceedings{ning2020torque,
  title         = {TORQUE: A Reading Comprehension Dataset of Temporal Ordering Questions},
  author        = {Ning, Qiang and Wu, Hao and Han, Rujun and Peng, Nanyun and Gardner, Matt and Roth, Dan},
  booktitle     = {the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  publisher     = {Association for Computational Linguistics},
  pages         = {1158--1172},
  slideslive_id = {38938807},
  year          = {2020}
}
Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction
Rujun Han, Qiang Ning, and Nanyun Peng, in 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
Full Text Poster Code BibTeX Details
@inproceedings{han2019joint,
  title     = {Joint Event and Temporal Relation Extraction with Shared Representations and Structured Prediction},
  author    = {Han, Rujun and Ning, Qiang and Peng, Nanyun},
  booktitle = {2019 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2019}
}
Deep Structured Neural Network for Event Temporal Relation Extraction
Rujun Han, I.-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, and Nanyun Peng, in The 2019 SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2019.
Full Text Code BibTeX Details
@inproceedings{han2019deep,
  title     = {Deep Structured Neural Network for Event Temporal Relation Extraction},
  author    = {Han, Rujun and Hsu, I-Hung and Yang, Mu and Galstyan, Aram and Weischedel, Ralph and Peng, Nanyun},
  booktitle = {The 2019 SIGNLL Conference on Computational Natural Language Learning (CoNLL)},
  year      = {2019}
}
Contextualized Word Embeddings Enhanced Event Temporal Relation Extraction for Story Understanding
Rujun Han, Mengyue Liang, Bashar Alhafni, and Nanyun Peng, in 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Workshop on Narrative Understanding, 2019.
Full Text BibTeX Details
@inproceedings{han2019contextualized,
  title     = {Contextualized Word Embeddings Enhanced Event Temporal Relation Extraction for Story Understanding},
  author    = {Han, Rujun and Liang, Mengyue and Alhafni, Bashar and Peng, Nanyun},
  booktitle = {2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Workshop on Narrative Understanding},
  year      = {2019}
}