ARMADA: Attribute-Based Multimodal Data Augmentation

Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, and Heng Ji, in Workshop on WikiNLP: Advancing Natural Language Processing for Wikipedia at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.


Abstract

Manual curation of high-quality image–text pairs for multimodal language models is expensive. ARMADA augments such data by (1) extracting entities and their visual attributes from the text, (2) replacing those attribute values with alternatives suggested by knowledge bases (KBs) and large language models (LLMs), and (3) editing the original image to match the modified text. The resulting knowledge-grounded, semantically consistent pairs improve model performance on four downstream tasks, demonstrating the value of attribute-level, KB-aware augmentation.
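
To make the three steps concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the `TOY_KB` dictionary, the `AugmentedPair` type, and the functions `extract_attributes`, `substitute`, and `augment` are all hypothetical names introduced for illustration. Attribute extraction is reduced to keyword matching against a hand-written knowledge base, no LLM is involved, and step (3) is represented only by a text instruction that one might pass to an image-editing model.

```python
import re
from dataclasses import dataclass

# Hypothetical toy knowledge base: entity -> attribute -> plausible values.
TOY_KB = {
    "linckia laevigata": {"color": ["blue", "orange", "green"]},
}


@dataclass
class AugmentedPair:
    text: str              # caption with the attribute value replaced
    edit_instruction: str  # placeholder for the image-editing step


def extract_attributes(text, kb):
    """Step 1 (toy): find KB entities in the text and the attribute values mentioned."""
    lowered = text.lower()
    found = []
    for entity, attrs in kb.items():
        if entity not in lowered:
            continue
        for attr, values in attrs.items():
            for value in values:
                if re.search(rf"\b{re.escape(value)}\b", lowered):
                    found.append((entity, attr, value))
    return found


def substitute(text, entity, attr, old, kb):
    """Step 2 (toy): swap in every alternative value the KB lists for this attribute."""
    pairs = []
    for new in kb[entity][attr]:
        if new == old:
            continue
        new_text = re.sub(
            rf"\b{re.escape(old)}\b", new, text, count=1, flags=re.IGNORECASE
        )
        # Step 3 (stub): describe the edit instead of editing a real image.
        instruction = f"change the {attr} of the {entity} from {old} to {new}"
        pairs.append(AugmentedPair(new_text, instruction))
    return pairs


def augment(text, kb=TOY_KB):
    """Run the toy pipeline and return all augmented caption/instruction pairs."""
    out = []
    for entity, attr, value in extract_attributes(text, kb):
        out.extend(substitute(text, entity, attr, value, kb))
    return out


if __name__ == "__main__":
    for pair in augment("The blue Linckia laevigata rests on a reef."):
        print(pair.text, "|", pair.edit_instruction)
```

Restricting substitutions to values listed for the entity in the knowledge base is what keeps the new caption plausible for that entity; a real system would also need an actual image-editing model so that the image stays consistent with the edited text.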


Bib Entry

@inproceedings{jin2024armada,
  author = {Jin, Xiaomeng and Kim, Jeonghwan and Zhou, Yu and Huang, Kuan-Hao and Wu, Te-Lin and Peng, Nanyun and Ji, Heng},
  title = {ARMADA: Attribute-Based Multimodal Data Augmentation},
  booktitle = {Workshop on WikiNLP: Advancing Natural Language Processing for Wikipedia at The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2024}
}

Related Publications