Evaluate
Download the released test split from Hugging Face and prepare predictions in the required JSONL format.
Official leaderboard submissions are managed in GitHub Issues: github.com/benluwang/MedThinkVQA/issues
Keep all original sample fields unchanged and append your model outputs in the required prediction fields.
Create a GitHub Issue, attach the prediction JSONL and any external metrics summary you computed, and provide the model metadata and a link to the model.
- `prediction` (required): final option label, one of "A" to "E".
- `prediction_rationale` (recommended): reasoning trace produced during inference.
- `prediction_twi` (optional): structured TwI step outputs for richer analysis.

Use a clear file name, for example `medthinkvqa_test_submission.jsonl`.
```json
{
  "title": "Case number 19254",
  "CLINICAL_HISTORY": "...",
  "image_count": 2,
  "image_01_id": "abc123",
  "image_01_path": "images/case19254/abc123.jpg",
  "image_01_modality": "MRI",
  "image_01_sub_modality": "Conventional MRI",
  "image_02_id": "def456",
  "image_02_path": "images/case19254/def456.jpg",
  "image_02_modality": "CT",
  "image_02_sub_modality": "HRCT / Thin-slice CT",
  "options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
  "correct_answer": "C",
  "prediction": "C",
  "prediction_rationale": "...",
  "prediction_twi": {
    "step1_per_image_findings": ["..."],
    "step2_integrated_summary": "...",
    "step3_differential_reasoning": "..."
  }
}
```
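Before filing an issue, the submission file can be sanity-checked locally. The sketch below is illustrative, not an official checker: the function names and the file path are ours, and it only verifies that each JSONL line parses and that `prediction` is present with a label in "A" to "E".

```python
import json

# Fields the submission format requires (per the field list above).
REQUIRED = {"prediction"}
VALID_LABELS = {"A", "B", "C", "D", "E"}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one submission record."""
    problems = []
    for field in REQUIRED:
        if field not in record:
            problems.append("missing required field: " + field)
    label = record.get("prediction")
    if label is not None and label not in VALID_LABELS:
        problems.append("prediction must be one of A-E, got: %r" % (label,))
    return problems


def validate_jsonl(path: str) -> dict:
    """Validate every line of a JSONL submission; keys are 1-based line numbers with errors."""
    errors = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors[lineno] = ["invalid JSON: " + str(exc)]
                continue
            problems = validate_record(record)
            if problems:
                errors[lineno] = problems
    return errors


if __name__ == "__main__":
    # Hypothetical file name from the instructions above.
    errs = validate_jsonl("medthinkvqa_test_submission.jsonl")
    for lineno, problems in sorted(errs.items()):
        print("line %d: %s" % (lineno, "; ".join(problems)))
```

An empty result from `validate_jsonl` means every line parsed and carried a valid `prediction` label; it does not verify that the original sample fields were left unchanged.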
After verification, entries are merged into `docs/benchmark.json` and reflected on the leaderboard site.