
Why It Took Cloudflare a Week to Publicize Its Massive Data Leak

Change your passwords.

On Thursday night, Cloudflare publicized a major bug that had existed for several months in its content delivery network, the service that controls how web pages are loaded for users based on their location. The bug allowed private information like passwords, emails, and private messages to be scraped and cached by search engines like Google.

Essentially, sensitive data from websites like Fitbit, OkCupid, and Uber was leaked onto the public web and cached by search engines.

It’s being called Cloudbleed, and you should change your passwords right now. Here’s why it took a week for the public to find out about it, from Cloudflare CTO John Graham-Cumming:

Google (and other search engines) had cached some of the leaked memory through their normal crawling and caching processes. We wanted to ensure that this memory was scrubbed from search engine caches before the public disclosure of the problem so that third-parties would not be able to go hunting for sensitive information.

In a blog post publicizing the problem (“Incident report on memory leak caused by Cloudflare parser bug”), Graham-Cumming explains that Google alerted the company to the problem on February 17, but that the company wanted to erase as much of the private data as possible from search engine caches before going public.

“Our natural inclination was to get news of the bug out as quickly as possible, but we felt we had a duty of care to ensure that search engine caches were scrubbed before a public announcement,” Graham-Cumming writes.

Tavis Ormandy of Google’s Project Zero discovered the problem, and according to his original comments, “[personally identifiable information] was actively being downloaded by crawlers and users during normal usage, they just didn’t understand what they were seeing.” The information being leaked just looked like a mess of characters to the average user, but it quickly became clear to Ormandy what was happening.

“It took every ounce of strength not to call this issue ‘cloudbleed,’” he writes in the post.

How Did the Cloudflare Cloudbleed Leak Happen?

In his blog post, Graham-Cumming explains that the leak can be traced back to a single character in Cloudflare’s code. That’s right. One character. One line of the code used == instead of >=, which allowed an HTML parser, written in C code generated by a tool called Ragel, to read past the end of a buffer. “The Ragel code we wrote contained a bug that caused the pointer to jump over the end of the buffer and past the ability of an equality check to spot the buffer overrun,” writes Graham-Cumming. The check itself sat in code that Ragel generated automatically, and Graham-Cumming admits that Cloudflare wasn’t using Ragel correctly.
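To see why an equality check can miss an overrun, here is a minimal sketch in C. It is not Cloudflare’s or Ragel’s code: the function scan, the CHECK_EQ and CHECK_GE options, and the rule that a '<' byte moves the cursor forward by two are all invented for illustration. The limit parameter exists only so the demo never reads outside its own array.

```c
#include <stddef.h>

/* Which end-of-buffer test the scanner uses. */
enum end_check { CHECK_EQ, CHECK_GE };

/*
 * Hypothetical scanner (illustration only, not Cloudflare's code).
 * Copies bytes from mem into out, starting at offset 0.
 *   pe    - offset where the request's own buffer ends
 *   limit - size of the whole backing array; a safety net for this demo
 * A '<' byte advances the cursor by 2 instead of 1, so the cursor can
 * step over pe without ever being equal to it.
 * out must have room for limit + 1 bytes. Returns the bytes copied.
 */
size_t scan(const char *mem, size_t pe, size_t limit,
            enum end_check check, char *out)
{
    size_t p = 0;
    size_t n = 0;

    while (p < limit) {
        /* The one-character difference: == versus >= */
        int at_end = (check == CHECK_EQ) ? (p == pe) : (p >= pe);
        if (at_end)
            break;

        out[n++] = mem[p];
        p += (mem[p] == '<') ? 2 : 1;
    }
    out[n] = '\0';
    return n;
}
```

Suppose the array holds "ab<.SECRET" and only the first three bytes belong to the current request (pe is 3). The cursor goes 0, 1, 2, then jumps to 4. With CHECK_EQ the test "p == pe" is never true, so the scanner keeps copying and out ends up as "ab<SECRET", which includes the neighboring data. With CHECK_GE the scanner stops as soon as the cursor is at or past pe, and out is just "ab<".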

This code had been present and running at Cloudflare for years, writes Graham-Cumming, but a recent code change involving cf-html exposed the bug and caused it to start leaking personal information.

Here’s the full, unofficial list of websites affected by Cloudbleed.

How to Protect Yourself from Cloudbleed

Some 5.5 million websites use Cloudflare’s services to deliver their pages. So while it would be unusual for your password-protected information to surface in Google search results (the company reports leakage in about 1 in every 3,300,000 HTTP requests at the peak), it is not impossible. “The greatest period of impact was from February 13 and February 18,” the company announced.

Security researcher Ryan Lackey, aka octal, offers some reassurance though: “There is always the potential someone malicious discovered this vulnerability independently and before Tavis, and may have been actively exploiting it, but there is no evidence to support this theory.”

Lackey says site operators should require their users to change their passwords, but that comes with drawbacks like lost customer trust or account deletion.

For users, changing your password is key. Just don’t make it one of these terrible passwords.