PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation
Alicia Y. Tsai, Shereen Oraby, Anjali Narayan-Chen, Alessandra Cervone, Spandana Gella, Apurv Verma, Tagyoung Chung, Jing Huang, and Nanyun Peng, in 4th Workshop on NLP for Positive Impact at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
🏆 Outstanding Paper Award
Abstract
Children’s story generation systems must produce stories that are both engaging and age-appropriate, yet existing language-model-based systems often generate violent, profane, or biased content. PG-Story introduces (i) a content-safety taxonomy tailored to children’s text and (ii) a dataset annotated for unsafe elements at both the sentence and discourse levels. Using PG-Story, the authors show that identifying unsafe content through self-diagnosis and then applying controllable decoding can markedly reduce unsafe content in generated stories.
Bib Entry
@inproceedings{tsai2024pgstory,
  author    = {Tsai, Alicia Y. and Oraby, Shereen and Narayan-Chen, Anjali and Cervone, Alessandra and Gella, Spandana and Verma, Apurv and Chung, Tagyoung and Huang, Jing and Peng, Nanyun},
  title     = {PG-Story: Taxonomy, Dataset, and Evaluation for Ensuring Child-Safe Content for Story Generation},
  booktitle = {4th Workshop on NLP for Positive Impact at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2024}
}