The UK government's AI Security Institute (AISI) has released an initial evaluation of Anthropic's Mythos Preview model, providing independent verification of its cybersecurity capabilities.
While the model's performance on individual security tasks is in line with that of other recent frontier models, the findings highlight a specific strength in executing multi-step attack sequences.
Anthropic recently restricted the initial release of Mythos Preview to a limited group of critical industry partners. The company previously described the model as "strikingly capable at computer security tasks."
Advanced attack chaining
AISI's testing shows that Mythos does not significantly outperform other recent models on isolated cybersecurity tasks. Other models, including GPT-5.4, Opus 4.6, and Codex 5.3, showed comparable results, falling within a 5 to 10 percent accuracy range across multiple difficulty levels.
However, Mythos demonstrated superior potential in a specialized test range known as "The Last Ones" (TLO). This test simulates a 32-step data extraction attack across a corporate network.
The evaluation requires the model to chain dozens of steps together across multiple hosts and network segments. AISI estimates that such a sustained operation would take a trained human professional approximately 20 hours to complete.
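A rough back-of-the-envelope sketch helps show why a long chain is a harder test than isolated tasks. It assumes, purely for illustration, that each of the 32 steps succeeds independently and with the same probability. Neither assumption comes from AISI, the function name is hypothetical, and the per-step rates below are made-up inputs, not measured results for any model.

```python
def chain_success_probability(per_step_success: float, steps: int = 32) -> float:
    """Probability of completing every step of a chain.

    Illustrative assumption: steps are independent and share one success
    rate, so the chance of finishing the whole chain is rate ** steps.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


if __name__ == "__main__":
    # Hypothetical per-step success rates, chosen only to show the trend.
    for rate in (0.90, 0.95, 0.99):
        full_chain = chain_success_probability(rate)
        print(f"{rate:.2f} per step -> {full_chain:.3f} for all 32 steps")
```

Under these simplified assumptions, even a 90 percent per-step success rate leaves only a few percent chance of finishing all 32 steps, which is one way to see how models that look similar on isolated tasks could diverge sharply on a sustained operation.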
AISI has used Capture the Flag (CTF) challenges to measure model progress since early 2023. While GPT-3.5 Turbo struggled with basic "Apprentice" tasks, Mythos Preview can now complete over 85 percent of those same low-level challenges.