July 3, 2019 | 11:09
Load-management and security firm Cloudflare has apologised for an outage that took down a surprisingly large chunk of the web for around half an hour yesterday, placing the blame on a botched firewall update.
Founded in 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn, Cloudflare offers services that aim to improve the performance and security of a variety of web services: a high-performance content delivery network, load-balanced caching system, protection against distributed denial of service (DDoS) attacks, partial encryption for sites that would otherwise be unable to support it, and a web application firewall (WAF) designed to detect attacks and block them.
Sadly, this last feature proved troublesome yesterday afternoon when all of Cloudflare's customers' websites - which include some of the biggest sites on the web - began displaying HTTP 502 Bad Gateway errors. The outage, which lasted around half an hour, was similar in nature to the result of a BGP misroute last month, though more widespread - and it was entirely down to Cloudflare's own firewall system.
'We experienced a global service disruption that affected most Cloudflare traffic for 27 minutes,' the company explains in an email to customers. 'The issue was triggered by a bug in a software deploy [sic] of the Cloudflare Web Application Firewall (WAF) which resulted in a CPU usage spike globally, and 502 errors for our customers. To restore global traffic we temporarily disabled certain WAF capabilities, removed the underlying software bug, then verified and re-enabled all WAF services.
'We're deeply sorry about how this disruption has impacted your services. Our engineering teams continue to investigate this issue and we will be sharing detailed incident report(s) on the Cloudflare blog.'
Cloudflare has confirmed that it plans to improve its software testing and deployment processes in the wake of the outage.