Culture

Wayback Machine and Cloudflare team up to archive the Internet

Two of the most extensive cataloguers of the Web will now work in tandem to make sure your digital embarrassments live on forever.


Last week, the Internet Archive announced in a blog post a new partnership with the web-infrastructure and security company Cloudflare to further expand its already massive back-catalog of public digital spaces. Now, any website employing Cloudflare's Always Online service will be automatically archived via the Archive's Wayback Machine, ensuring even more ways for the Internet to haunt you in the years to come.

Always Online, now always automatic — Always Online works by creating static copies of clients' sites that can be served if their servers happen to go offline for any reason. As of last Friday, anytime a customer enters a URL into the Always Online service, that address is also forwarded to the Wayback Machine for archiving. "The Internet Archive’s Wayback Machine has an impressive infrastructure that can archive the web at scale. By working together, we can take another step toward making the Internet more resilient by stopping server issues for our customers and in turn from interrupting businesses and users online," Cloudflare CEO Matthew Prince said in the Internet Archive's post.

Previously, the Wayback Machine "crawled" lists of millions of websites, cataloged user submissions via its "Save Page Now" feature, and picked up pages "added to Wikipedia articles, referenced in Tweets, and based on a number of other 'signals' and sources, such [as] multiple feeds of 'news' stories." Those methods were pretty damn effective on their own: over the past twenty years, the Wayback Machine has archived over 468 billion webpages, with around 1 billion new URLs added every day. The addition of Cloudflare's sources is almost certain to dramatically increase this rate, including instances of URLs being entered for the first time ever (appropriately deemed a "First Archive" event by the Internet Archive).
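For the curious, here is a rough sketch of how you might check whether a page already has a snapshot, which is roughly the question a "First Archive" event answers. It assumes the Wayback Machine's public availability endpoint and its JSON response shape; the function names are made up for illustration, and none of this describes how Cloudflare's integration works internally.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str) -> str:
    """Build the availability request URL for one page (hypothetical helper)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot's URL, or None if nothing is archived yet."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str) -> Optional[str]:
    """Ask the Wayback Machine about page_url (performs a network request)."""
    with urlopen(availability_query(page_url), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com")` returns a snapshot address if one exists and `None` otherwise; a `None` here would suggest the page has not been archived before.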

Archival context is still key — Despite the grim knowledge that any dumb thing you post online now has an even better chance of becoming digitally immortal, the partnership between the two organizations marks an important moment in public information sharing and access. Cloudflare is one of the world's leading digital infrastructure companies and is currently used by over 11 million websites, so its Always Online feature will quickly ramp up the number of pages archived within the Wayback Machine project.

Of course, bulk archival without context can be incredibly problematic, as recent COVID-19 conspiracy theories have shown us. And then there's that whole issue of intellectual property rights to remember...