We know what caused the recent massive Google Cloud outage – and it’s a bit embarrassing
- Google Cloud’s API service was to blame for widespread outage
- Most regions were back online in 40 minutes, but some took even longer
- The company has promised to protect against future outages and improve communication
Following Google Cloud’s recent widespread outage, which took sites like Spotify, Cloudflare and Discord offline, the company released its detailed report sharing exactly why it failed customers.
The company says the root cause was a code issue in Service Control – part of the company’s API management and policy checking system.
Specifically, an invalid automated quota update and a lack of proper error handling triggered a global crash loop, with 503 errors seen across not only Google Cloud services, but also services using its APIs.
Google Cloud outage caused by API issue
The outage affected Google Cloud infrastructure as well as popular Google Workspace apps like Drive, Docs, Gmail and Calendar. Third-party sites that rely on Google Cloud’s APIs were also affected, including music streaming platform Spotify, which boasts 678 million users, and some Cloudflare services.
“On May 29, 2025, a new feature was added to Service Control for additional quota policy checks,” the company wrote in its incident report. “The issue with this change was that it did not have appropriate error handling nor was it feature flag protected.”
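To illustrate what "appropriate error handling" and feature flag protection could look like in practice, here is a minimal Python sketch. It is not Google's code: the `QuotaPolicy` type, the `check_quota` function, the `flag_enabled` parameter and the choice to allow requests when policy data is bad are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    # Hypothetical policy record. A missing limit stands in for the kind of
    # invalid quota update the article describes.
    name: str
    limit: Optional[int]


def check_quota(policy: QuotaPolicy, usage: int, flag_enabled: bool) -> bool:
    """Return True if the request should be allowed."""
    if not flag_enabled:
        # Feature flag off: the new check is skipped entirely, so it can be
        # disabled quickly if it misbehaves.
        return True
    try:
        if policy.limit is None:
            raise ValueError(f"policy {policy.name!r} has no limit")
        return usage <= policy.limit
    except ValueError:
        # Assumed design choice: allow the request on bad policy data
        # instead of letting the error crash the serving process.
        return True
```

With the flag on, a valid policy is enforced as normal, while a policy with a missing limit is handled gracefully rather than raising an unhandled error. With the flag off, the new code path never runs.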
Google Cloud boasted that its Site Reliability Engineering team started triaging the incident within two minutes and identified the root cause within 10 minutes. “The red-button [to disable the serving path] was ready to roll out ~25 minutes from the start of the incident,” Google said, with the rollout complete within 40 minutes.
Although smaller regions recovered relatively quickly, larger regions like us-central1 took longer to come back online – around two hours and 40 minutes in the case of this particular region.
In its mini incident report, issued on the day of the outage, Google Cloud promised to “do better.” Its more detailed report promises the usual responses going forward, such as improving static analysis and testing practices, and auditing and modularizing Service Control’s architecture to contain future incidents. The company has also pledged to “improve [its] external communications” to better inform customers, and to ensure that its communications infrastructure remains online even during such outages in the future.