ARMADA: Attribute-Based Multimodal Data Augmentation
Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, and Heng Ji, in Workshop on WikiNLP: Advancing Natural Language Processing for Wikipedia at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
Abstract
Manually curating high-quality image–text pairs for multimodal language models is expensive. ARMADA augments such data in three steps: (1) it extracts entities and their visual attributes from the text, (2) it substitutes those attributes with alternatives guided by a knowledge base (KB) and a large language model (LLM), and (3) it edits the original image to match the new attributes. The resulting knowledge-grounded, semantically consistent pairs improve model performance on four downstream tasks, demonstrating the value of attribute-level, KB-aware augmentation.
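The three steps can be illustrated with a minimal, text-only sketch. It is not the authors' implementation and makes several loud assumptions: a hypothetical toy knowledge base `TOY_KB` stands in for the real KB and LLM, naive "attribute word directly before entity word" matching stands in for entity and attribute extraction, and a plain-English edit instruction stands in for the actual image editing. The names `TOY_KB`, `AugmentedPair`, `extract_attributes`, and `augment` are illustrative only.

```python
import random
from dataclasses import dataclass

# HYPOTHETICAL toy knowledge base: entity -> visual attribute -> valid values.
# A stand-in for the KB + LLM guidance described in the abstract.
TOY_KB = {
    "ladybug": {"color": ["red", "yellow", "orange"]},
    "tulip": {"color": ["red", "yellow", "pink", "white"]},
}


@dataclass
class AugmentedPair:
    caption: str           # rewritten caption with one attribute swapped
    edit_instruction: str  # would be handed to an image-editing model (not shown)


def extract_attributes(caption: str) -> list[tuple[str, str, str]]:
    """Step 1 (naive): return (entity, attribute, value) triples where a
    KB-valid attribute value appears immediately before a known entity."""
    tokens = [t.strip(".,;:!?") for t in caption.lower().split()]
    triples = []
    for prev, tok in zip(tokens, tokens[1:]):
        for attr, values in TOY_KB.get(tok, {}).items():
            if prev in values:
                triples.append((tok, attr, prev))
    return triples


def augment(caption: str, rng: random.Random) -> list[AugmentedPair]:
    """Steps 2-3: for each extracted triple, swap the value for a different
    KB-valid value and describe the matching image edit."""
    text = caption.lower()
    pairs = []
    for entity, attr, old in extract_attributes(text):
        alternatives = [v for v in TOY_KB[entity][attr] if v != old]
        new = rng.choice(alternatives)
        new_caption = text.replace(f"{old} {entity}", f"{new} {entity}", 1)
        instruction = f"change the {attr} of the {entity} from {old} to {new}"
        pairs.append(AugmentedPair(new_caption, instruction))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    for pair in augment("A red ladybug on a yellow tulip.", rng):
        print(pair.caption, "|", pair.edit_instruction)
```

Restricting substitutes to values the KB lists for that entity is what keeps the sketch's augmented pairs plausible (e.g. no "blue ladybug"); each augmented caption changes exactly one attribute, so the paired image edit stays small and targeted.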
Bib Entry
@inproceedings{jin2024armada,
  author = {Jin, Xiaomeng and Kim, Jeonghwan and Zhou, Yu and Huang, Kuan-Hao and Wu, Te-Lin and Peng, Nanyun and Ji, Heng},
  title = {ARMADA: Attribute-Based Multimodal Data Augmentation},
  booktitle = {Workshop on WikiNLP: Advancing Natural Language Processing for Wikipedia at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2024}
}