The UK government’s plan to leverage a National Data Library (NDL) to support AI development is facing significant implementation challenges, according to new research from the Open Data Institute (ODI).
Confirmed in the 2024 Autumn Budget with a £100 million funding commitment, the NDL aims to consolidate public sector data to drive AI-powered innovation for researchers and businesses. However, after building an 'NDL-Lite' prototype, the ODI found that the quality of current public data falls well short of the standards AI applications require.
Data Usability and the Risk of AI 'Workarounds'
ODI researchers processed 38GB of data from six public sector bodies, integrating more than 100,000 files. The experiment revealed that many datasets on platforms such as data.gov.uk are mislabeled, out of date, or missing metadata. For instance, some datasets labeled 'crime' turned out to be incompatible local statistical reports, making them unusable for cross-regional AI analysis.
More concerning is the lack of updates to core datasets. The ODI noted that a key Home Office crime dataset has not been updated since 2018 and is inaccessible via the Office for National Statistics (ONS) API. When AI agents cannot retrieve authoritative data from official channels, they often fall back on news reports or commercial sources, where accuracy is far from guaranteed.
Professor Elena Simperl, Director of Research at the ODI, stated that the study highlights the stark gap between the volume of public data and its actual usability. In an interview, she warned that if official data fails to provide the necessary support, AI agents will simply bypass official channels in favor of alternative information sources.
While the Department for Science, Innovation and Technology (DSIT) claims to have completed a large-scale discovery phase aimed at paving the way for systemic public sector reform, the ODI’s findings suggest that government investment alone is not enough. Cleaning and standardizing existing data must be the top priority.
Current test results suggest that unless the government significantly improves the accuracy and structure of its data, the NDL project risks becoming a 'data graveyard' that AI systems cannot effectively utilize.