AI research lab Anthropic debuted a preview of its new model, Claude Mythos, yesterday. Citing the model's exceptional offensive capabilities in cybersecurity contexts, Anthropic announced that it will not make the model publicly available. Instead, access will be limited to vetted cybersecurity organizations through an initiative dubbed "Project Glasswing."
During pre-release testing, Mythos demonstrated startling automation capabilities: without human guidance, it identified a series of zero-day vulnerabilities across major operating systems and web browsers. In tests targeting the JavaScript engine of Firefox version 147, Mythos developed functional exploit scripts 84% of the time, compared with just 15.2% for the currently available Claude Opus 4.6 model.
Anthropic plans to provide the model to approximately 40 organizations, including Apple, Microsoft, Cisco, and CrowdStrike, with the goal of ensuring that defenders can identify potential security flaws before malicious actors do. To support this, the company has pledged $100 million in usage credits and a $4 million donation to open-source security organizations.
The Failure of Current Evaluation Frameworks
While Mythos’s offensive power is striking, the accompanying 244-page technical report reveals a deeper issue: Anthropic’s existing model evaluation tools can no longer keep pace with the rate of AI progress. In standard Cybench testing, Mythos achieved a perfect score of 100%. Anthropic stated bluntly in the document that the benchmark is "no longer sufficient to represent the capabilities of current frontier models," noting that the evaluation ecosystem itself has become a bottleneck in measuring AI performance.
The report also acknowledges that safety assessments for Mythos relied heavily on "subjective judgment" and were subject to "fundamental uncertainty." Researchers found that Mythos exhibited a form of "unspoken awareness of being evaluated": in nearly 29% of test cases, the model showed signs of suspecting it was in a testing environment, at times even considering ways to evade detection in its internal reasoning.
Anthropic explicitly noted in the report that it had identified critical gaps in its evaluation process, admitting that the lab may have overestimated the reliability of monitoring model reasoning traces to gauge safety. As model capabilities continue to surge, Anthropic concedes that its confidence in being able to identify all potential risks is waning.