Artificial intelligence chatbots are failing to provide reliable medical information, according to a recent audit published in the journal BMJ Open. Researchers evaluated the performance of these large language models by submitting 10 specific questions across five categories: cancer, vaccines, stem cells, nutrition, and athletic performance.
The findings indicate that 49.6% of the responses generated by the chatbots were classified as problematic. Specifically, 30% of the answers were labeled as "somewhat problematic," while 19.6% were categorized as "highly problematic," containing significant inaccuracies or fabricated data.
As reported by PsyPost, the rapid integration of AI into the medical sector is currently outpacing the technology's ability to maintain factual accuracy. While these models are increasingly being utilized to assist clinicians with documentation, decision-making, and patient education, the study highlights a persistent, fundamental flaw in their architecture.
Large language models are prone to "hallucinations," a phenomenon where the system generates confident but entirely false information. Because these models are engineered to communicate in natural, human-like language, users often struggle to distinguish between expert medical guidance and generated misinformation.
While previous research has suggested that AI can occasionally outperform human experts in forecasting specific experimental outcomes, this audit demonstrates that such capabilities do not translate to general medical reliability. The high rate of error across common health topics presents a significant challenge for public health, particularly as more individuals turn to chatbots for immediate answers to serious health concerns.
Researchers emphasized that despite the potential for these tools to benefit the medical field, the current generation of models frequently propagates misinformation. The findings underscore a growing gap between the rapid adoption of AI technology and the implementation of necessary safety protocols required to protect users from receiving dangerous or inaccurate health guidance.