Major Microsoft Azure outage was caused by a simple typo


A Microsoft Azure DevOps outage in the South Brazil Region, which lasted more than 10 hours, was caused by a typo in the code that led to 17 production databases being deleted.
Having apologized to affected customers, Microsoft has now published a full post-mortem detailing the investigation, from when the outage was first noticed at 12:10 UTC on May 24 until it was resolved at 22:31 UTC the same day.
Microsoft principal software engineering manager Eric Mattingly shared details of the code base upgrade that formed part of Sprint 222. The pull request contained a typo hidden in the snapshot deletion job, which caused the job to delete the Azure SQL Server rather than an individual Azure SQL Database.
Coding error
Mattingly explained that “when the job deleted the Azure SQL Server, it also deleted all seventeen production databases for the scale unit,” though he confirmed that no data was lost as a result of the accidental deletion.
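To show why a single wrong identifier was so destructive, here is a minimal, hypothetical Python model of a parent/child resource hierarchy. It is not Microsoft's code: the `ResourceTree` class, the cleanup functions and the resource IDs are all invented for illustration. The only behaviour taken from the post-mortem is that deleting a server also deletes every database on it.

```python
class ResourceTree:
    """Toy parent/child resource hierarchy (hypothetical, not an Azure API)."""

    def __init__(self):
        self.parent = {}  # resource ID -> parent ID (None for a root)

    def add(self, resource_id, parent_id=None):
        self.parent[resource_id] = parent_id

    def _is_under(self, resource_id, ancestor_id):
        # Walk up the parent chain looking for ancestor_id.
        node = self.parent.get(resource_id)
        while node is not None:
            if node == ancestor_id:
                return True
            node = self.parent.get(node)
        return False

    def delete(self, resource_id):
        """Delete a resource and everything beneath it; return what was removed."""
        doomed = [r for r in self.parent
                  if r == resource_id or self._is_under(r, resource_id)]
        for r in doomed:
            del self.parent[r]
        return doomed


def build_scale_unit():
    """One server holding 17 production databases and one snapshot database."""
    tree = ResourceTree()
    tree.add("sql-server")
    for i in range(17):
        tree.add(f"sql-server/prod-db-{i}", "sql-server")
    tree.add("sql-server/snapshot-db", "sql-server")
    return tree


def cleanup_snapshot(tree, server_id, snapshot_id):
    # Intended behaviour: remove only the snapshot database.
    return tree.delete(snapshot_id)


def cleanup_snapshot_with_typo(tree, server_id, snapshot_id):
    # One wrong identifier: the parent server is passed instead of the
    # snapshot, so the delete cascades to every database on that server.
    return tree.delete(server_id)


ok = cleanup_snapshot(build_scale_unit(), "sql-server", "sql-server/snapshot-db")
bad = cleanup_snapshot_with_typo(build_scale_unit(), "sql-server", "sql-server/snapshot-db")
print(len(ok), len(bad))  # prints "1 19"
```

Both functions take the same arguments, which is the point: the buggy version looks almost identical to the correct one in review, yet it removes the server, all 17 production databases and the snapshot.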
The outage was detected within 20 minutes, at which point the company’s on-call engineers began working on it. However, according to the event log, the root cause was not identified until 16:04, almost four hours after the outage began.
Microsoft attributed the fix taking more than ten hours to the fact that customers cannot restore Azure SQL Servers themselves, as well as to backup redundancy complications and a “complex set of issues with [its] web servers.”
Having learned from its mistake, Microsoft has now promised to roll out Azure Resource Manager Locks to its key resources to prevent future accidental deletion.
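Azure Resource Manager locks come in two levels, CanNotDelete and ReadOnly, and a lock applied at a parent scope is inherited by the resources beneath it. The sketch below is a hypothetical toy model of delete-lock behaviour, not Azure's implementation or API: the class, its methods and the resource IDs are invented. As a simplifying assumption of this model, a lock anywhere in the subtree a delete would cascade to also blocks that delete. Locking the production databases lets the legitimate snapshot cleanup succeed while refusing a delete aimed at the whole server.

```python
class LockedResourceTree:
    """Toy resource hierarchy with delete locks (hypothetical, not an Azure API)."""

    def __init__(self):
        self.parent = {}     # resource ID -> parent ID (None for a root)
        self.locked = set()  # IDs carrying a delete lock

    def add(self, resource_id, parent_id=None):
        self.parent[resource_id] = parent_id

    def lock(self, resource_id):
        self.locked.add(resource_id)

    def _ancestors(self, resource_id):
        node = self.parent.get(resource_id)
        while node is not None:
            yield node
            node = self.parent.get(node)

    def _subtree(self, resource_id):
        """The resource itself plus everything beneath it."""
        return [r for r in self.parent
                if r == resource_id or resource_id in self._ancestors(r)]

    def delete(self, resource_id):
        subtree = self._subtree(resource_id)
        # Refuse if a lock sits on the target, on anything the delete would
        # cascade to, or on an ancestor (locks are inherited downwards).
        if self.locked & set(subtree) or self.locked & set(self._ancestors(resource_id)):
            raise PermissionError(f"delete of {resource_id} blocked by a lock")
        for r in subtree:
            del self.parent[r]
        return subtree


tree = LockedResourceTree()
tree.add("sql-server")
for i in range(17):
    tree.add(f"sql-server/prod-db-{i}", "sql-server")
    tree.lock(f"sql-server/prod-db-{i}")
tree.add("sql-server/snapshot-db", "sql-server")

print(tree.delete("sql-server/snapshot-db"))  # allowed: nothing locked there
try:
    tree.delete("sql-server")  # would cascade to the locked databases
except PermissionError as err:
    print(err)
```

Locking the databases rather than the server is a deliberate choice in this sketch: because locks are inherited downwards, a lock on the server would also block the routine snapshot deletion.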
Despite the same-day fix, customers in the region were left without access to some services for several hours, a reminder of how easily things can go wrong and of the importance of backup plans that reduce reliance on a single service provider, including for cloud storage and other off-premises infrastructure.