May 3, 2026 · Updated 04:07 AM UTC
Cybersecurity

OpenAI's GPT-5.5 matches Anthropic's Mythos in cybersecurity tests

New research from the UK's AI Security Institute shows GPT-5.5 achieved a 71.4 percent success rate on expert-level cybersecurity tasks, comparable to Anthropic's specialized Mythos Preview model.

Ryan Torres

2 min read

A high-tech server room representing cybersecurity

OpenAI's GPT-5.5 has demonstrated cybersecurity capabilities nearly identical to Anthropic's specialized Mythos Preview model, according to new evaluations from the UK’s AI Security Institute (AISI).

The findings, reported by Ars Technica, suggest that the high-level hacking prowess previously attributed to Anthropic's restricted-release model may be a feature of general model improvements rather than a unique breakthrough.

Since 2023, the AISI has tested frontier AI models against a battery of 'Capture the Flag' (CTF) challenges. These tests evaluate specific skills including cryptography, web exploitation, and reverse engineering.

On the highest-level 'Expert' tasks, GPT-5.5 achieved an average success rate of 71.4 percent. This figure is slightly higher than the 68.6 percent recorded by Mythos Preview, though the difference falls within the margin of error.
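
To see why a 71.4 versus 68.6 percent gap can sit inside the margin of error, one can compute approximate 95 percent confidence intervals for each success rate. The task count used below is a hypothetical assumption for illustration; the report's exact number of Expert-level tasks is not given here.

```python
import math

def wald_interval(successes: int, trials: int, z: float = 1.96):
    """Approximate 95% confidence interval for a success rate
    (normal/Wald approximation)."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return p - half_width, p + half_width

# Hypothetical task count of 35 (assumed, not from the AISI report):
# 25/35 ~ 71.4%, 24/35 ~ 68.6%.
gpt_low, gpt_high = wald_interval(25, 35)
mythos_low, mythos_high = wald_interval(24, 35)

print(f"GPT-5.5: {gpt_low:.3f}-{gpt_high:.3f}")
print(f"Mythos:  {mythos_low:.3f}-{mythos_high:.3f}")
```

With sample sizes in the tens of tasks, the two intervals overlap almost entirely, which is what "within the margin of error" means in practice.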

In one high-difficulty challenge requiring the creation of a disassembler to decode a Rust binary, GPT-5.5 completed the task in 10 minutes and 22 seconds. The process required no human assistance and cost approximately $1.73 in API calls.
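
At its core, writing a disassembler means mapping raw opcode bytes back to human-readable mnemonics. The sketch below is a deliberately tiny illustration covering a handful of one-byte x86-64 opcodes; it is not the model's actual approach, and a real Rust binary would need a full instruction-set decoder.

```python
# Toy disassembler: decode a few one-byte x86-64 opcodes into
# mnemonics, emitting raw 'db' directives for unknown bytes.

ONE_BYTE_OPCODES = {
    0x90: "nop",
    0xC3: "ret",
    0xCC: "int3",
    0xF4: "hlt",
}

def disassemble(code: bytes) -> list[str]:
    """Return one 'offset: mnemonic' line per input byte."""
    lines = []
    for offset, byte in enumerate(code):
        mnemonic = ONE_BYTE_OPCODES.get(byte, f"db 0x{byte:02x}")
        lines.append(f"{offset:04x}: {mnemonic}")
    return lines

for line in disassemble(bytes([0x90, 0xC3, 0x42])):
    print(line)
```

Scaling this dictionary up to multi-byte instructions, operands, and addressing modes is the hard part of the AISI task.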

Simulated network attacks

GPT-5.5 also matched the performance of Mythos Preview in 'The Last Ones' (TLO), a test range designed to simulate a 32-step data extraction attack against a corporate network. GPT-5.5 succeeded in three out of ten attempts, while Mythos Preview succeeded in two of ten.

No previously tested AI model has managed to succeed in this specific test even once.

However, the model failed to breach the 'Cooling Tower' simulation, which tests an AI's ability to disrupt control software for a power plant. This failure matches the performance of all other AI models tested by the institute to date.

The results suggest that the cybersecurity capabilities seen in Anthropic's Mythos Preview might be a 'byproduct of more general improvements' in large language models rather than a specific breakthrough for a single model, according to the report.
