CrowdStrike meets Murphy's Law: Anything that can go wrong will

And boy, did last Friday's Windows fiasco ever prove that yet again


Opinion CrowdStrike's recent Windows debacle will surely earn a prominent place in the annals of epic tech failures. On July 19, the cybersecurity giant accomplished what legions of hackers could only dream of – bringing millions of Windows systems worldwide to their knees with a single botched update.

As a veteran tech journalist, I've seen my fair share of software snafus. Heck, I went hand-to-hand with the grandpa of all network blow-ups – the Morris Worm – in 1988 when I was a sysadmin. Even so, I can't help but marvel at the sheer scale and impact of this blunder. CrowdStrike, a company valued at over $70 billion and trusted by countless organizations to protect their digital assets, inadvertently became the source of one of the largest IT outages in history.

The fallout from this debacle was staggering – thousands of flights canceled, healthcare services disrupted, and 911 systems knocked offline. It's a stark reminder of how deeply intertwined our digital infrastructure has become and how vulnerable it can be to a single point of failure.

Let's break down the cascade of errors that led to this fiasco.

In the beginning, Microsoft enabled CrowdStrike's Falcon security software to run in ring zero, the Windows kernel's most privileged level. Any problem at this low level is likely to cause a Blue Screen of Death (BSOD). Meanwhile, Microsoft reportedly wants to blame the European Commission – no, really – for requiring it to grant third-party software vendors this level of access.

You know, I think that with all of Microsoft's developers and lawyers, the company could come up with a better, legal way to avoid this kind of foul-up while still letting software companies compete on equal terms. It's not rocket science.

Microsoft doesn't want any of the blame, but it deserves some of it. For far too long, we've placed too many vital IT eggs in the Windows basket. When that basket falls, so does much of the economy.

Returning to CrowdStrike, the company claims a "logic error" in a routine sensor configuration update caused the meltdown. But for a company of CrowdStrike's caliber, such a fundamental mistake is inexcusable. This wasn't some obscure edge case – it was a critical failure in its core functionality.

It wasn't even a code problem. This wasn't a software update per se. The villain of this piece was a Falcon configuration file called a channel file. One simple file, which should have contained nothing more than data to update a security setting, set off a cascade of one BSOD after another.

How did such a catastrophic bug pass quality assurance? CrowdStrike admitted: "Due to a bug in the Content Validator, one of the two Template Instances passed validation despite containing problematic content data [and the instances] were deployed into production." When your software has deep hooks into millions of Windows systems, your testing should be bulletproof. Clearly, CrowdStrike's testing protocols need a massive overhaul.
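To make the idea concrete, here is a minimal Python sketch of a pre-release content check. CrowdStrike hasn't published its Content Validator, so the file layout, the `EXPECTED_FIELDS` value of 20, and the function names below are all hypothetical: each template instance is assumed to be a list of string fields that the sensor reads by position, and a single malformed instance (or an empty file) blocks the entire release.

```python
from dataclasses import dataclass

# Hypothetical: assume the sensor supplies this many input values, so every
# template instance must define exactly this many fields.
EXPECTED_FIELDS = 20


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_instance(index: int, fields: object) -> list[str]:
    """Return the problems found in one template instance (empty if none)."""
    if not isinstance(fields, list):
        return [f"instance {index}: expected a list, got {type(fields).__name__}"]
    errors = []
    if len(fields) != EXPECTED_FIELDS:
        errors.append(
            f"instance {index}: expected {EXPECTED_FIELDS} fields, got {len(fields)}"
        )
    for pos, value in enumerate(fields):
        if not isinstance(value, str):
            errors.append(f"instance {index}, field {pos}: expected str")
    return errors


def validate_channel_content(instances: list) -> ValidationResult:
    """Check every instance; one bad instance fails the whole file."""
    if not instances:
        return ValidationResult(ok=False, errors=["file contains no template instances"])
    errors: list[str] = []
    for i, inst in enumerate(instances):
        errors.extend(validate_instance(i, inst))
    return ValidationResult(ok=not errors, errors=errors)
```

For example, `validate_channel_content([["a"] * 20, ["b"] * 21]).ok` is `False`, because the second instance has one field too many. Of course, a validator is just more code, and as CrowdStrike found out, it can have bugs of its own. A check like this should complement, not replace, loading the content on real test machines.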

We also now know, as security expert Kevin Beaumont pointed out on Mastodon: "The key takeaway – channel updates are currently deployed globally, instantly." I always send major patches to all my customers simultaneously and wait to see what happens next. Doesn't everyone? Who are these people, and why does anyone let them do security work?

There's a simple concept called canary testing. You may have heard of it. Like the proverbial canary in a coal mine, you first test whether a new space – or program – is safe by trying it on a canary – or a small group of users – and then, if all's well, let everyone else in.
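Here's roughly what that looks like, as a minimal Python sketch of a ringed (canary) rollout. The function name `staged_rollout`, the ring sizes of 1, 10, 50, and 100 percent, and the zero-tolerance failure threshold are illustrative assumptions, not any vendor's actual policy; `deploy` and `is_healthy` stand in for whatever push mechanism and crash telemetry a real system would use. A real rollout would also wait a soak period before checking health, which this sketch skips.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
    max_failure_rate: float = 0.0,
) -> tuple[bool, int]:
    """Push an update to widening rings of hosts, halting if a ring looks sick.

    Returns (finished, hosts_updated). Assumes `stages` increases and ends at 1.0.
    """
    updated = 0
    for fraction in stages:
        # Cumulative target for this stage: at least one host, never more than all.
        target = min(len(hosts), max(1, int(len(hosts) * fraction)))
        ring = hosts[updated:target]
        for host in ring:
            deploy(host)
        updated = max(updated, target)
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            return False, updated  # stop here; later rings never get the update
    return True, updated
```

With 1,000 machines and an update that crashes every one of them, this stops after the first ring of 10 machines instead of taking down all 1,000.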

Let's not forget that CrowdStrike's initial response was slow and inadequate. Users were left scrambling for answers while critical infrastructure faltered. Even today, almost a week later, I still have friends having trouble with their Delta flights.

This serves as a sobering wake-up call for the rest of us in the tech industry. As we rush to secure our systems against external threats, we must not overlook the potential for self-inflicted wounds. Rigorous testing, fail-safe mechanisms, and a healthy dose of humility are essential when dealing with critical systems.
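As one example of a fail-safe, here's a minimal Python sketch of the common last-known-good pattern: new content replaces the active configuration only if it parses and passes a basic sanity check; otherwise the previous configuration stays in place. The JSON format, the required `rules` key, and the `ContentStore` class are hypothetical, and this is not how Falcon loads channel files. In kernel code a bad memory read can't simply be caught like a Python exception, so a kernel-level product would need an equivalent safeguard outside the failing code path, such as reverting to previous content after repeated crashes.

```python
import json


class ContentStore:
    """Holds the active configuration and keeps the last known-good one on failure."""

    def __init__(self, initial: dict) -> None:
        self.active = initial
        self.rejected = 0

    def try_update(self, raw: str) -> bool:
        """Swap in new content only if it parses and passes a sanity check."""
        try:
            candidate = json.loads(raw)
            if not isinstance(candidate, dict) or "rules" not in candidate:
                raise ValueError("content is missing a 'rules' entry")
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            self.rejected += 1
            return False  # keep running on the previous, known-good content
        self.active = candidate
        return True
```

For instance, after `store = ContentStore({"rules": []})`, calling `store.try_update("not json")` returns `False` and leaves `store.active` untouched.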

In the end, CrowdStrike's Windows fiasco is a textbook example of Murphy's Law in action – anything that can go wrong will go wrong. It's a painful lesson but one that we would all do well to learn from. After all, in cybersecurity, your next big threat might just be an update away. ®
