VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, and Nanyun Peng, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.


Abstract

The ability of large vision-language models (LVLMs) to critique and correct their reasoning is an essential building block towards self-improvement. However, a systematic analysis of such capabilities in LVLMs is still lacking. We propose VISCO, the first benchmark to extensively analyze fine-grained critique and correction. VISCO requires LVLMs to judge the correctness of each step in a chain-of-thought and justify their decisions. Evaluating 24 LVLMs shows that human-written critiques markedly boost performance, whereas model-generated critiques can be unreliable. We identify three common failure patterns (poor visual-perception critique, reluctance to “say no,” and exaggerated error propagation) and introduce a LookBack strategy that revisits the image to verify every claim, improving critique and correction accuracy by up to 13.5%.

Bib Entry

@inproceedings{wu2025visco,
  author = {Wu, Xueqing and Ding, Yuheng and Li, Bingxuan and Lu, Pan and Yin, Da and Chang, Kai{-}Wei and Peng, Nanyun},
  title = {VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025}
}