News orgs block the Internet Archive's crawlers—and undermine the truth
Nina Jankowicz at The Wayfinder:
The first time I learned about the Internet Archive and its Wayback Machine, a library of over one trillion snapshots of webpages since the earliest days of the net, was in my high school AP US Government class. It was the early 2000s, and Mr. Fenster, our teacher, was showing us how to access a slice of history. Though I’ve long forgotten the exact website we were looking at (an early version of nytimes.com, maybe?), I remember being bemused at how the internet used to look. Today, I rely on the Internet Archive for my research. I have a Wayback Machine sticker on my laptop. The Archive has helped me understand the impact malicious lies might have on my safety and allowed me to track changing narratives from both state-backed actors and independent grifters. It has tried to equalize access to information. But today, the Internet Archive’s access to 23 major news sites, and by extension the world’s, is under threat. It’s not because those outlets are worried about paywall hopping. Their decision to block the Archive is motivated by something much dumber: the AI industry’s insatiable appetite for information.
What is the Internet Archive?
The Internet Archive is best known for the Wayback Machine, which allows anyone to take a snapshot of what a webpage looked like at a given time, preserving that moment for the future. If a news outlet issues a correction or a politician makes a surreptitious edit to their website, the Archive provides a definitive record of a website at a specific moment, so long as someone—or an automated crawler—archives it. The Archive’s founder, Brewster Kahle, says his goal in founding the Archive in 1996 was “to be a record of what happened so that people can’t rewrite history.” That goal has become a pressing need under the second Trump Administration, which has worked to suppress inconvenient facts and censor information it doesn’t like.
Less well known are the Archive’s libraries of 2 million books and 3 million hours of television, including an archive of state-run propaganda from the likes of Russia... and corporate propaganda from the likes of Fox News. The latter helped me understand the extent to which Fox had lied about me in 2022-2023. Since I don’t watch much TV, let alone keep Fox on all day and night, I needed a way to understand how often Fox had talked about me. The Archive provided it. By my accounting, I was among the top 20 Democrats that Fox discussed, even months after I left government service. That’s a personal example, but the professional list is long and far-reaching. I’ve used the Archive to access content that has been taken down. Kat Tenbarge writes, “I’ve used the Wayback Machine to visit archived webpages from more than a decade ago, some of which helped corroborate serious allegations. I use the Wayback Machine regularly to figure out how many followers someone had at a given time, to resurface since-deleted material, and to find context for historical posts.” In my research, I regularly archive public-interest links myself in order to preserve access to them and create a historical record. I’ve been burned too many times by dead links in other people’s work, and that’s not their fault; Kahle points out that the life of the average webpage is only 100 days.
The Internet Archive and Wayback Machine are under serious threat of censorship by news organizations because of AI.










