Those of us who woke up bright and early on the East Coast today found an internet with far fewer sites to browse than usual. Major news outlets like The New York Times and The Guardian were completely inaccessible beginning around 5:30 a.m. ET, as were many other popular sites like Reddit, Amazon, and Twitch. This wasn't your typical one-site outage.
For an hour or so, the internet was much quieter than usual, except on Twitter, where, as you might expect, everyone was freaking out a bit about the sudden disappearance of large portions of the web. News outlets were forced to take extreme measures to report that their own sites were down; The Verge, for example, created a Google Doc to keep its readers updated.
We now have (half) an answer as to why this happened: Fastly, a cloud computing services company you've probably never heard of, had some sort of service configuration issue that caused the widespread outage. Just one glitch, and half the internet (an exaggeration that doesn't feel like one) was taken out in a single blow.
You’re telling me one company did this? — Yep, Fastly’s little fiasco really did impair all these big sites. That might seem like a lot of power for one company to hold; that’s because it is.
Fastly is a content delivery network (CDN), which basically means it runs a global network of servers that sits between websites and their visitors and helps keep the internet moving at the breakneck pace we're used to. One of a CDN's most important functions is speeding up the web by caching copies of webpages on servers that are physically closer to you, so your request doesn't have to travel all the way to the website's own servers.
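For readers curious about the mechanics, here is a toy Python sketch of the caching idea described above. It is only an illustration, not how Fastly actually works: the `EdgeServer` class, the `nearest_pop` helper, and the city coordinates are all hypothetical, and choosing a server by straight-line distance is a simplification (real CDNs typically steer traffic using DNS or anycast routing and measured network latency rather than pure geography).

```python
import math
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    """A hypothetical CDN point of presence (POP) with a tiny in-memory cache."""
    name: str
    lat: float
    lon: float
    cache: dict = field(default_factory=dict)

    def get(self, url, fetch_from_origin):
        # Cache hit: answer from the nearby edge without contacting the origin.
        if url in self.cache:
            return self.cache[url], "HIT"
        # Cache miss: fetch once from the origin, keep a copy for later visitors.
        body = fetch_from_origin(url)
        self.cache[url] = body
        return body, "MISS"


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_pop(pops, user_lat, user_lon):
    """Simplification: route the user to the geographically closest POP."""
    return min(pops, key=lambda p: distance_km(p.lat, p.lon, user_lat, user_lon))


def fake_origin(url):
    # Stand-in for the website's own (origin) server.
    return f"<html>contents of {url}</html>"


if __name__ == "__main__":
    pops = [EdgeServer("New York", 40.71, -74.01), EdgeServer("London", 51.51, -0.13)]
    pop = nearest_pop(pops, 42.36, -71.06)  # a reader in Boston
    print(pop.name, pop.get("/index.html", fake_origin)[1])  # New York MISS
    print(pop.name, pop.get("/index.html", fake_origin)[1])  # New York HIT
```

The first request for a page is a cache miss that has to go back to the site's origin server; every later request is a hit served straight from the nearby edge. That same design is why trouble inside the CDN layer can knock many unrelated sites offline at once: their visitors' requests all pass through it.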
Fastly updated its status page at 5:58 a.m. ET with a simple message: "We're currently investigating potential impact to performance with our CDN services." The company tweeted just after 7 a.m. that it had identified a service configuration that "triggered disruptions" across its POPs (points of presence), the locations around the world where its servers sit.
Big, big yikes. All told, Fastly restored its services quickly enough, and most affected sites were back online, albeit loading more slowly than usual, about an hour after the outage was first reported. At the time of this writing, Fastly hasn't really communicated what, exactly, caused this widespread issue. The company's last status posting simply states that Fastly "has observed recovery of all services and has resolved this incident."
Commercially, Fastly will need to be far more transparent about this incident if it wants to keep some of the world's largest websites as customers. Whatever unhappy accident led to this outage needs to be explained publicly if the company hopes to restore trust.
And then there are the broader implications of an incident like this one. We rely on the internet for just about everything these days, so watching a single configuration problem take down a huge number of those services is nothing short of terrifying.
Ensuring this doesn’t happen again in the future is on Fastly, for the most part. But it would also probably help if the power to support all these major sites didn’t fall on just one company — but that’s a longer conversation for another day, and the danger of an increasingly centralized internet.