It just got easier to make an archival copy of that website you’re digging into

Aug 13, 2025 - 23:00

It’s a tough time to be interested in archiving the web — arguably the most transient medium for news since the birth of radio. On Monday, Reddit announced it would begin blocking the Internet Archive’s Wayback Machine from crawling the site to make an archival copy. The Wayback Machine is the closest thing we have — as journalists, researchers, and human beings — to a permanent record of the web, at a time when up to two-thirds of all links posted online since 2013 no longer work. We’ve known for decades that cool URLs don’t change, but the churn of publishers, domain names, content management systems, and URL shorteners has made link rot an epidemic — and made resources like the Wayback Machine essential.

Reddit’s reasoning isn’t really about anything the Internet Archive has done. It’s that Reddit has decided tech companies can no longer crawl its site for free, whether to index it for search or to train their large language models; now those companies must pay for the privilege. And since the Wayback Machine contains what amounts to an imperfect copy of Reddit, apparently one or more AI labs had decided to simply crawl that version of the site instead of the real thing. So Reddit is blocking the Internet Archive. Worryingly, it’s not hard to imagine other publishers following suit with the same reasoning, which could radically reduce our knowledge of the web that was.

But the Internet Archive isn’t the only way to keep and research a copy of a website. Today Bellingcat, the OSINT-fueled news organization, announced a new version of its tool Auto Archiver, which it describes as “a tool aimed at preserving online digital content before it can be modified, deleted or taken down” — a critical need for any reporting on our lives online.

From Bellingcat’s announcement:

“Publicly launched in 2022, it has preserved over 150,000 web pages and social media posts to date. The Auto Archiver has been used by Bellingcat’s journalists to preserve information on dozens of fast-moving events such as the Jan. 6 riots — when we first used the tool internally — as well as to gather digital evidence for our Justice and Accountability project and to monitor Civilian Harm in Ukraine.

“The Auto Archiver has also been adopted by both large newsrooms and NGOs. It has been used by individual researchers, journalists, activists, archivists, academics and developers as well. With interest in the tool strong, we have worked hard to add to and improve it over time. But we have used the past few months to take a step back and to build a new and more robust ecosystem to further help individual organisations and researchers use and benefit from it.”

The new features aim, in part, at making the tool more useful for journalists who don’t live on the command line: a “user-friendly interface,” better setups for newsroom teams, easier configuration, more documentation.
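For readers who do live on the command line, here is a minimal sketch of what “preserving a page before it can be modified or deleted” can look like in practice. This is not Bellingcat’s Auto Archiver and makes no claims about its internals; it is an illustrative Python script (assuming the third-party requests library) that fetches a URL, writes a timestamped local copy alongside a SHA-256 hash for later integrity checks, and optionally asks the Wayback Machine’s public “Save Page Now” endpoint to take its own snapshot.

```python
"""Illustrative sketch only: fetch a page, keep a timestamped, hashed local
copy, and optionally request a Wayback Machine snapshot. Not the Auto Archiver."""
import hashlib
import pathlib
from datetime import datetime, timezone

import requests  # third-party: pip install requests


def preserve(url: str, out_dir: str = "archive") -> pathlib.Path:
    """Fetch `url`, write a timestamped copy to disk, and return its path."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    digest = hashlib.sha256(response.content).hexdigest()

    folder = pathlib.Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{timestamp}_{digest[:12]}.html"
    path.write_bytes(response.content)

    # Record the hash alongside the copy so later tampering is detectable.
    path.with_suffix(".sha256").write_text(f"{digest}  {url}\n")
    return path


def request_wayback_snapshot(url: str) -> None:
    """Best-effort request to the Internet Archive's Save Page Now endpoint."""
    requests.get(f"https://web.archive.org/save/{url}", timeout=60)


if __name__ == "__main__":
    saved = preserve("https://example.com")
    print(f"Local copy written to {saved}")
    request_wayback_snapshot("https://example.com")
```

A real archiving workflow adds much more than this — video and social media extraction, screenshots, durable storage, and spreadsheet-driven batching — which is exactly the gap tools like the Auto Archiver are built to fill.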

If you work in a newsroom or research team and want a demo, or help deploying the Auto Archiver internally, you can reach Bellingcat at contact-tech@bellingcat.com with the subject “Auto Archiver at [my team/organisation]” and say more about your organisation and archiving needs. As the team puts it, building a greater adoption base is the best way to ensure the future of the tool and its versatility.

Photo of Stuttgart Library by Niklas Ohlrogge (niamoh.de).
