Evaluate
Download the released test split from Hugging Face and prepare predictions in the required JSONL format.
Official leaderboard submissions are managed in GitHub Issues: github.com/benluwang/MedThinkVQA/issues
Keep all original sample fields unchanged and append your model outputs in the required prediction fields.
Create a GitHub Issue, attach the prediction JSONL and any external metrics summary you computed, and provide the model metadata and a link to the model.
- `prediction` (required): final option label, one of "A" to "E".
- `prediction_rationale` (recommended): reasoning trace produced during inference.
- `prediction_twi` (optional): structured TwI step outputs for richer analysis.

Use a clear file name, for example `medthinkvqa_test_submission.jsonl`.
```json
{
  "title": "Case number 19254",
  "CLINICAL_HISTORY": "...",
  "image_count": 2,
  "image_01_id": "abc123",
  "image_01_path": "images/case19254/abc123.jpg",
  "image_01_modality": "MRI",
  "image_01_sub_modality": "Conventional MRI",
  "image_02_id": "def456",
  "image_02_path": "images/case19254/def456.jpg",
  "image_02_modality": "CT",
  "image_02_sub_modality": "HRCT / Thin-slice CT",
  "options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
  "correct_answer": "C",
  "prediction": "C",
  "prediction_rationale": "...",
  "prediction_twi": {
    "step1_per_image_findings": ["..."],
    "step2_integrated_summary": "...",
    "step3_differential_reasoning": "..."
  }
}
```
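Before filing an issue, the submission file can be sanity-checked locally. The sketch below is illustrative, not an official checker: the function names and the file path are ours, and it only verifies that each JSONL line parses and that `prediction` is present with a label in "A" to "E".

```python
import json

# Fields the submission format requires (per the field list above).
REQUIRED = {"prediction"}
VALID_LABELS = {"A", "B", "C", "D", "E"}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one submission record."""
    problems = []
    for field in REQUIRED:
        if field not in record:
            problems.append("missing required field: " + field)
    label = record.get("prediction")
    if label is not None and label not in VALID_LABELS:
        problems.append("prediction must be one of A-E, got: %r" % (label,))
    return problems


def validate_jsonl(path: str) -> dict:
    """Validate every line of a JSONL submission; keys are 1-based line numbers with errors."""
    errors = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors[lineno] = ["invalid JSON: " + str(exc)]
                continue
            problems = validate_record(record)
            if problems:
                errors[lineno] = problems
    return errors


if __name__ == "__main__":
    # Hypothetical file name from the instructions above.
    errs = validate_jsonl("medthinkvqa_test_submission.jsonl")
    for lineno, problems in sorted(errs.items()):
        print("line %d: %s" % (lineno, "; ".join(problems)))
```

An empty result from `validate_jsonl` means every line parsed and carried a valid `prediction` label; it does not verify that the original sample fields were left unchanged.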
After verification, entries are merged into `docs/benchmark.json` and reflected on the leaderboard site.