PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

CVPR 2026 Findings

Yantao Li [1,2,3], Qiang Hui [2,3], Chenyang Yan [2,3], Kanzhi Cheng [2,3], Fang Zhao [2,3], Chao Tan [2,3], Huanling Gao [2,3], Jianbing Zhang [2,3], Kai Wang [2,3], Xinyu Dai [1], Shiguo Lian [2,3]
[1] National Key Laboratory for Novel Software Technology, Nanjing University; [2] Data Science & Artificial Intelligence Research Institute, China Unicom; [3] Unicom Data Intelligence, China Unicom
Process-Aligned Multimodal Reasoning

Faithful visual reasoning should get both the answer and the evidence right.

PaLMR addresses process hallucinations where multimodal models arrive at correct final answers while misperceiving the visual evidence used in their reasoning chains.

Abstract

Reinforcement learning has recently improved the reasoning ability of Large Language Models and Multimodal LLMs, yet prevailing reward designs emphasise final-answer correctness and consequently tolerate process hallucinations, cases where models reach the right answer while misperceiving visual evidence. We address this process-level misalignment with PaLMR, a framework that aligns not only outcomes but also the reasoning process itself.

PaLMR comprises two complementary components: a perception-aligned data layer that constructs process-aware reasoning data with structured pseudo-ground-truths and verifiable visual facts, and a process-aligned optimisation layer that constructs a hierarchical reward fusion scheme with a process-aware scoring function to encourage visually faithful chains-of-thought and improve training stability.

Method Overview

PaLMR introduces multimodal process alignment for visual reasoning. Instead of rewarding only final-answer correctness, it makes visual facts and intermediate reasoning steps explicit, then optimizes the model toward chains of thought that remain faithful to the image.

1. Perception-Aligned Data Layer

Builds process-aware reasoning data with structured pseudo-ground-truths and verifiable visual facts, so that the visual evidence used in the reasoning process is itself supervised.
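
To make the idea of a process-aware training record concrete, the following is a minimal sketch of what one sample might look like. The class name, field names, and the example values are all hypothetical and chosen for illustration; the source does not specify the actual data schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessAwareSample:
    """Hypothetical record layout; field names are illustrative, not from the paper."""

    image_path: str
    question: str
    answer: str
    # Short statements about the image that can be checked against it.
    visual_facts: List[str] = field(default_factory=list)
    # Pseudo-ground-truth reasoning steps that rely on the facts above.
    reference_steps: List[str] = field(default_factory=list)

    def reference_text(self) -> str:
        """Render facts, steps, and answer as one structured reference string."""
        facts = "\n".join(f"- {fact}" for fact in self.visual_facts)
        steps = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.reference_steps, 1)
        )
        return f"Visual facts:\n{facts}\nReasoning:\n{steps}\nAnswer: {self.answer}"


# Invented example values, used only to exercise the structure.
sample = ProcessAwareSample(
    image_path="example.png",
    question="How many apples are in the bowl?",
    answer="3",
    visual_facts=["three red apples", "blue bowl"],
    reference_steps=[
        "Locate the bowl in the image.",
        "Count the apples inside the bowl.",
    ],
)
print(sample.reference_text())
```

Keeping the visual facts separate from the reasoning steps is a design choice of this sketch: it lets a scorer check perception independently of the final answer.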

2. Process-Aligned Optimization Layer

Uses hierarchical reward fusion and process-aware scoring to encourage visually faithful reasoning traces while maintaining reinforcement learning stability.
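
One possible reading of hierarchical reward fusion is a gate: the answer reward is granted only when the perception score of the reasoning trace is high enough. The sketch below illustrates that reading. The function name, the threshold, the weight, and the additive form are assumptions made for illustration and are not taken from the paper.

```python
def fused_reward(
    perception_score: float,
    answer_correct: bool,
    threshold: float = 0.5,      # assumed gate value, not from the paper
    answer_weight: float = 1.0,  # assumed weight, not from the paper
) -> float:
    """Gate the answer reward on the perception score (illustrative only).

    perception_score is expected in [0, 1] and stands in for a
    process-aware score of how faithful the trace is to the image.
    """
    if not 0.0 <= perception_score <= 1.0:
        raise ValueError("perception_score must lie in [0, 1]")

    # The process-level term is always granted.
    reward = perception_score

    # The outcome-level term is granted only if perception passes the gate.
    if answer_correct and perception_score >= threshold:
        reward += answer_weight
    return reward


# Correct answer with faithful perception: both terms count.
print(fused_reward(0.75, True))   # 1.75
# Correct answer but poor perception: the answer reward is withheld.
print(fused_reward(0.25, True))   # 0.25
# Faithful perception but wrong answer: only the process term counts.
print(fused_reward(0.75, False))  # 0.75
```

Under this gate, a trace that reaches the right answer while misperceiving the image earns less than a trace that perceives correctly, which is the behavior that answer-only rewards fail to enforce.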

Figure: Overview of the PaLMR framework, including perception-aligned data construction and process-aligned optimization.

Key Highlights

  • Targets process hallucination: reduces cases where the final answer is correct but the reasoning process contradicts visual evidence.
  • Improves alignment with human evaluation: replaces point-wise scoring with pairwise comparisons to achieve a significantly higher human alignment ratio (>80%). This provides a robust, human-aligned signal for optimizing perceptual grounding.
  • Ensures training stability: employs a hierarchical reward fusion scheme that requires coherent visual perception before rewarding the final answer. This gating mechanism keeps reasoning stable and accuracy increasing monotonically during training.
  • Aligns evidence and reasoning: connects structured visual facts with chains of thought through process-level supervision.
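
The pairwise scoring mentioned in the highlights can be sketched as a round-robin comparison: each reasoning trace in a group is compared with every other trace, and its score is its share of wins. The judge used here is a deliberately simple stand-in that counts how many reference visual facts a trace mentions. It is a hypothetical placeholder, not the paper's scoring function, and all names and example strings are invented.

```python
from itertools import combinations
from typing import Callable, List


def pairwise_scores(
    traces: List[str], prefer: Callable[[str, str], float]
) -> List[float]:
    """Score each trace by its win rate over all other traces.

    prefer(a, b) returns 1.0 if a is better, 0.0 if b is better, 0.5 for a tie.
    """
    n = len(traces)
    if n < 2:
        raise ValueError("need at least two traces to compare")
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        p = prefer(traces[i], traces[j])
        wins[i] += p
        wins[j] += 1.0 - p
    return [w / (n - 1) for w in wins]


def make_toy_judge(facts: List[str]) -> Callable[[str, str], float]:
    """Toy judge (assumption): prefer the trace that mentions more facts."""

    def count(trace: str) -> int:
        return sum(fact.lower() in trace.lower() for fact in facts)

    def prefer(a: str, b: str) -> float:
        ca, cb = count(a), count(b)
        if ca == cb:
            return 0.5
        return 1.0 if ca > cb else 0.0

    return prefer


traces = [
    "The image shows three red apples in a blue bowl, so the answer is 3.",
    "There are three red apples on a table, so the answer is 3.",
    "I see four green pears, but the answer is 3.",
]
judge = make_toy_judge(["three red apples", "blue bowl"])
print(pairwise_scores(traces, judge))  # [1.0, 0.5, 0.0]
```

All three traces give the same final answer, yet the scores differ because they are ranked on perceived evidence. In practice the judge would likely be a learned or model-based comparator rather than string matching.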

Results

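Experiments on Qwen2.5-VL-7B show that PaLMR substantially reduces reasoning hallucinations and improves visual reasoning fidelity across multimodal reasoning benchmarks. With 4.7K training samples, PaLMR-7B raises the HallusionBench score from 63.8 (base model) to 70.9 and outperforms the GRPO baseline, which uses the same number of samples, on four of the five benchmarks; the exception is MathVista (73.8 vs. 74.1).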

Model              #Data   MMMU (val)   HallusionBench   MathVerse (Vision Only)   MMStar   MathVista
GPT-4o             -       60.0         68.0             -                         -        63.8
Gemini2-Flash      -       70.6         69.4             -                         -        70.4
Qwen2.5-VL-72B     -       68.2         71.4             -                         70.8     74.8
Qwen2.5-VL-32B     -       63.7         72.1             54.3                      67.3     74.7
InternVL2.5-8B     -       56.2         67.4             -                         62.9     64.4
MM-Eureka-7B       15K     55.4         69.5             46.6                      64.6     73.0
OpenVLThinker-7B   12K     56.3         66.9             40.4                      62.1     70.2
Perception-R1-7B   2K      56.3         70.0             46.1                      66.3     73.6
Qwen2.5-VL-7B      -       56.4         63.8             42.6                      64.3     68.2
+ GRPO             4.7K    57.8         66.7             45.9                      66.0     74.1
PaLMR-7B           4.7K    59.3         70.9             47.5                      67.1     73.8
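
The per-benchmark gains implied by the table can be checked with a few lines of arithmetic. The snippet below copies the three 7B rows from the table above; the variable and function names are illustrative.

```python
BENCHMARKS = [
    "MMMU (val)",
    "HallusionBench",
    "MathVerse (Vision Only)",
    "MMStar",
    "MathVista",
]

# Scores copied from the results table (same column order as BENCHMARKS).
SCORES = {
    "Qwen2.5-VL-7B": [56.4, 63.8, 42.6, 64.3, 68.2],
    "+ GRPO": [57.8, 66.7, 45.9, 66.0, 74.1],
    "PaLMR-7B": [59.3, 70.9, 47.5, 67.1, 73.8],
}


def gains(model: str, baseline: str) -> list:
    """Return per-benchmark differences (model minus baseline), in points."""
    return [round(m - b, 1) for m, b in zip(SCORES[model], SCORES[baseline])]


for baseline in ("Qwen2.5-VL-7B", "+ GRPO"):
    print(f"PaLMR-7B vs. {baseline}:")
    for name, delta in zip(BENCHMARKS, gains("PaLMR-7B", baseline)):
        print(f"  {name}: {delta:+.1f}")
```

The differences are +2.9, +7.1, +4.9, +2.8, and +5.6 points over the base model, and +1.5, +4.2, +1.6, +1.1, and -0.3 points over the GRPO row.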

Qualitative Results

BibTeX

@inproceedings{li2026palmr,
  title     = {PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment},
  author    = {Yantao Li and Qiang Hui and Chenyang Yan and Kanzhi Cheng and Fang Zhao and Chao Tan and Huanling Gao and Jianbing Zhang and Kai Wang and Xinyu Dai and Shiguo Lian},
  booktitle = {CVPR 2026 Findings},
  year      = {2026},
  url       = {https://arxiv.org/abs/2603.06652},
  doi       = {10.48550/arXiv.2603.06652}
}