Medical Thinking with Multiple Images

MedThinkVQA is a multi-image radiology reasoning dataset focused on evidence extraction, cross-view synthesis, and diagnostic quality.


Status: Train and test splits are available on Hugging Face.
This dataset is for research/education only (CC BY-NC-SA 4.0) and is not for clinical use.

News

[2026-04-05] Train and test splits were released on Hugging Face.
[2026-04-05] Public dataset link updated to the canonical Hugging Face page.
[2026-03-04] Leaderboard and submission flow are live.

Leaderboard

The default view shows all benchmark accuracy (ACC) scores.

About

MedThinkVQA evaluates multi-image diagnostic reasoning quality with a focus on final diagnostic accuracy.
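
As a rough illustration of what a final-answer accuracy (ACC) score measures, the sketch below computes the fraction of cases where a model's predicted answer matches the reference. The answer format (one label string per case) and the function name `accuracy` are assumptions made for this example; this is not the official MedThinkVQA scoring code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers.

    Assumption: each answer is a single label string (for example "A").
    Comparison ignores surrounding whitespace and letter case.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")

    correct = sum(
        pred.strip().upper() == ref.strip().upper()
        for pred, ref in zip(predictions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Hypothetical answers for three cases; two of the three match.
    preds = ["A", "c ", "B"]
    refs = ["A", "C", "D"]
    print(f"ACC = {accuracy(preds, refs):.3f}")
```

Normalizing whitespace and case before comparison is a design choice for this sketch; a stricter or more lenient matching rule would change the reported score.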

Data are adapted from Eurorad (European Society of Radiology) and released under CC BY-NC-SA 4.0 for research/education only. Not for clinical use.