FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A. Sigurdsson, Nanyun Peng, and Xin Eric Wang, Transactions on Machine Learning Research (TMLR), 2025.

Download the full text


Abstract

Controllable text-to-image (T2I) diffusion aims to respect both a text prompt and auxiliary semantic inputs (e.g., edge maps). Existing methods struggle when presented with multiple heterogeneous controls, incurring heavy computational cost and reduced fidelity. FlexEControl introduces a weight-decomposition strategy that unifies diverse controls with far fewer parameters. The approach trims trainable parameters by 41% and lowers memory usage by 30% relative to Uni-ControlNet, doubles data efficiency, and faithfully integrates multiple conditions of varying modality, all while keeping generation quality high.
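To give a flavor of how weight decomposition can cut parameters across control branches, here is a minimal sketch in PyTorch: one full-rank weight is shared by all control modalities, and each modality adds only a small low-rank correction. The class name, the `rank` hyperparameter, and the low-rank form are illustrative assumptions for this sketch, not necessarily the paper's exact decomposition.

```python
import torch
import torch.nn as nn

class DecomposedMultiControl(nn.Module):
    """Sketch: a shared full-rank projection plus a low-rank, per-control
    update W_i = A_i @ B_i (rank << dim). Illustrative only; FlexEControl's
    actual decomposition may differ."""
    def __init__(self, dim: int, n_controls: int, rank: int = 4):
        super().__init__()
        self.shared = nn.Linear(dim, dim)  # parameters shared by all controls
        # Per-control low-rank factors; A starts at zero so each branch
        # initially behaves like the shared projection alone.
        self.A = nn.ParameterList([nn.Parameter(torch.zeros(dim, rank)) for _ in range(n_controls)])
        self.B = nn.ParameterList([nn.Parameter(torch.randn(rank, dim) * 0.01) for _ in range(n_controls)])

    def forward(self, x: torch.Tensor, control_id: int) -> torch.Tensor:
        # x: (batch, dim); low-rank delta computed as x @ B^T @ A^T
        delta = x @ self.B[control_id].t() @ self.A[control_id].t()
        return self.shared(x) + delta
```

With `dim = 8`, `rank = 2`, and three controls, this layer holds 168 parameters, versus 216 for three independent `nn.Linear(8, 8)` branches; the savings grow quickly as `dim` increases and more control modalities are added.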


Bib Entry

@article{he2025flexecontrol,
  author = {He, Xuehai and Zheng, Jian and Fang, Jacob Zhiyuan and Piramuthu, Robinson and Bansal, Mohit and Ordonez, Vicente and Sigurdsson, Gunnar A. and Peng, Nanyun and Wang, Xin Eric},
  title = {FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation},
  journal = {Transactions on Machine Learning Research (TMLR)},
  year = {2025}
}

Related Publications