An experimental attempt to use Anthropic's Claude AI to pilot a flight simulator resulted in two crashes during a simulated flight from Hainan to Qionghai Bo'ao, according to a report from so.long.thanks.fish.
The test involved instructing the AI model to interface with the X-Plane 12 API to fly a Cessna 172. While the model successfully generated Python scripts to manage takeoff and flight controls, it struggled with real-time synchronization and latency issues.
The first crash occurred shortly after takeoff. According to the flight log, the AI's flight controller applied excessive elevator gain without proper damping, causing a sharp pitch-over and roll that forced a reset to the runway.
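The damping problem can be illustrated with a toy simulation. The sketch below is not the experimenter's controller and does not use the X-Plane API. It assumes a deliberately simplified model in which elevator deflection commands pitch acceleration directly, and every name and constant in it (ELEVATOR_AUTHORITY, simulate_pitch, the gains) is hypothetical. It compares a proportional-only controller with one that adds a damping term on pitch rate.

```python
# Toy pitch model, NOT real Cessna 172 dynamics: elevator deflection in
# [-1, 1] is assumed to produce pitch acceleration directly.
ELEVATOR_AUTHORITY = 20.0  # deg/s^2 per unit of elevator (assumed value)


def simulate_pitch(kp, kd, target_deg=5.0, dt=0.05, steps=400):
    """Return (final_pitch, peak_pitch) for the given controller gains.

    kp scales the pitch error; kd scales the pitch rate (the damping term).
    """
    pitch = 0.0  # degrees
    rate = 0.0   # degrees per second
    peak = 0.0
    for _ in range(steps):
        error = target_deg - pitch
        elevator = kp * error - kd * rate
        elevator = max(-1.0, min(1.0, elevator))  # clamp to control limits
        rate += ELEVATOR_AUTHORITY * elevator * dt
        pitch += rate * dt
        peak = max(peak, pitch)
    return pitch, peak


if __name__ == "__main__":
    _, peak_undamped = simulate_pitch(kp=0.1, kd=0.0)
    final_damped, peak_damped = simulate_pitch(kp=0.1, kd=0.15)
    print(f"P only: peak pitch {peak_undamped:.1f} deg")
    print(f"P + D:  peak pitch {peak_damped:.1f} deg, final {final_damped:.1f} deg")
```

In this toy model, the proportional-only controller overshoots the 5-degree target by a wide margin and keeps oscillating, while the damped version settles on the target. Gains for a real aircraft would need tuning against the simulator itself.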
In a second attempt, the model achieved stable flight and even successfully navigated a downwind leg. However, a second crash occurred during the final approach when a gap in the AI's processing loop left the aircraft without active control.
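One common safeguard against this kind of control gap is a watchdog that falls back to a safe default when commands stop arriving. The sketch below is an assumption about how such a guard could look, not something described in the report. ControlWatchdog, NEUTRAL, and the 0.5-second timeout are hypothetical names and values, and the code does not talk to X-Plane.

```python
import time

# Hypothetical "safe" command: all control surfaces centered.
NEUTRAL = {"elevator": 0.0, "aileron": 0.0, "rudder": 0.0}


class ControlWatchdog:
    """Hands out the latest command, or NEUTRAL if it has gone stale."""

    def __init__(self, timeout_s=0.5, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the class can be tested
        self._last_update = clock()
        self._command = dict(NEUTRAL)

    def update(self, command):
        """Record a fresh command from the (possibly slow) planning loop."""
        self._command = dict(command)
        self._last_update = self._clock()

    def current(self):
        """Return the command the actuator loop should apply right now."""
        if self._clock() - self._last_update > self.timeout_s:
            return dict(NEUTRAL)
        return dict(self._command)


if __name__ == "__main__":
    watchdog = ControlWatchdog(timeout_s=0.5)
    watchdog.update({"elevator": 0.2, "aileron": -0.1, "rudder": 0.0})
    print(watchdog.current())  # fresh command is passed through
    time.sleep(0.6)            # simulate a gap in the processing loop
    print(watchdog.current())  # stale command is replaced with NEUTRAL
```

Whether centering the controls would have saved this particular approach is unknown. A real setup might instead hold the last trimmed attitude or hand off to the simulator's autopilot.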
A benchmark for reasoning
The experimenter noted that the primary challenge was the delay between the screenshots the AI received and the data from the API. This latency made it difficult for the model to adjust course quickly enough during critical maneuvers.
Beyond the technical failures, the experiment serves as a test of the model's planning capabilities. The experimenter observed that the AI wrote code for takeoff before developing any instructions for steering or landing.
"I figure this is some kind of AGI benchmark for models thinking ahead and planning what tools to develop and how to use them _before take off_," the experimenter wrote.
The session concluded with a recorded tally of two crashes and one stable flight, underscoring how difficult it remains for large language models to manage high-frequency, real-time physics environments.