VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, and Nanyun Peng, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
Abstract
The ability of large vision-language models (LVLMs) to critique and correct their reasoning is an essential building block towards self-improvement. However, a systematic analysis of such capabilities in LVLMs is still lacking. We propose VISCO, the first benchmark to extensively analyze fine-grained critique and correction. VISCO requires LVLMs to judge the correctness of each step in a chain-of-thought and justify their decisions. Evaluating 24 LVLMs shows that human-written critiques markedly boost performance, whereas model-generated critiques can be unreliable. We identify three common failure patterns (poor visual-perception critique, reluctance to “say no,” and exaggerated error propagation) and introduce a LookBack strategy that revisits the image to verify every claim, improving critique and correction accuracy by up to 13.5%.
Bib Entry
@inproceedings{wu2025visco,
  author    = {Wu, Xueqing and Ding, Yuheng and Li, Bingxuan and Lu, Pan and Yin, Da and Chang, Kai{-}Wei and Peng, Nanyun},
  title     = {VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025}
}