xiand.ai
Apr 17, 2026 · Updated 04:58 AM UTC

Major news outlets block Wayback Machine to prevent AI training

Twenty-three prominent news organizations, including The New York Times and USA Today, have begun blocking the Internet Archive's web crawler to protect content from AI scrapers.

Alex Chen


Major news organizations are actively blocking the Internet Archive’s Wayback Machine to prevent their content from being used to train artificial intelligence models.

An analysis by Originality AI found that 23 major news sites now block 'ia_archiverbot,' the Wayback Machine’s web crawler. High-profile sites blocking the crawler include The New York Times and Reddit.
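For readers who want to see what such a block can look like in practice, here is a minimal sketch in Python. It assumes the block is expressed through a site's robots.txt file, which the analysis cited above does not confirm for any particular outlet. The sample rules and the `is_blocked` helper are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, written for illustration only.
# They are not copied from any publisher's site.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if the given robots.txt rules disallow user_agent from url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked(SAMPLE_ROBOTS_TXT, "ia_archiverbot"))  # True
    print(is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot"))    # False
```

A robots.txt rule is only a request that well-behaved crawlers honor. A site can also refuse a crawler at the server level, which a check like this would not detect.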

USA Today Co. has implemented blocks across its network of more than 200 media outlets. The Guardian is also restricting access: it still allows crawling but filters archived content from public view, leaving researchers at digital dead ends.

Publishers cite copyright and competition concerns

Publishers claim these measures are necessary to stop AI companies from scraping archives to build competing products. The New York Times stated that archived content is being used "to directly compete with us," though the company did not provide specific evidence of copyright violations.

USA Today Co. describes its actions as routine bot prevention. However, blocking the crawler removes a primary tool that journalists use to verify past reporting and track editorial changes.

For three decades, the Wayback Machine has preserved over a trillion web pages. The current wave of blocking threatens the long-term accessibility of the public web as publishers prioritize protecting intellectual property from large language model developers.
