CrowdStrike's Epic Fail

CrowdStrike BSOD at LGA.jpg
By Smishra1 - Own work, CC BY-SA 4.0

Evening folks!

Woke up Friday to find that CrowdStrike, the cybersecurity giant tasked with keeping us safe, had ironically become the source of a tech nightmare. They pushed a faulty update that sent roughly 8.5 million Windows systems (Microsoft's estimate) into a boot loop. The very entity hired to protect our systems ended up causing chaos.

Imagine this: Your PC starts, hits the blue screen of death, and then loops back to try again – rinse and repeat. No simple fix unless you know the intricate steps to reboot or repair it manually. This incident was initially branded as the “Microsoft global IT outage,” but this time, Microsoft wasn’t to blame.

From my perspective, this is the biggest IT and cyber mishap we’ve witnessed, even surpassing notorious incidents like NotPetya or WannaCry. It’s like your fire extinguisher starting a fire.

CrowdStrike isn’t the first to mess up, and they won’t be the last. But crashing so many systems so quickly? That’s unprecedented for a security vendor.

To their credit, they pulled the faulty update in under ninety minutes – but the damage was already done. They switched to damage control mode and, to be fair, managed it quite well. Let’s hope they don’t blame some poor developer for this fiasco.

This incident highlights serious issues in the cybersecurity sector.

The media’s response was disappointing. They initially reported it as a Microsoft global IT outage. My first thought? Ransomware. After some digging, it became clear there were two separate incidents:

  1. Microsoft Azure had an outage earlier in the day, which was resolved quickly.
  2. CrowdStrike’s update fiasco took out a significant chunk of their customers’ systems.

The media conflated these two events, creating a confused narrative. They weren’t connected.

CrowdStrike BSOD at LGA.jpg
By alliance / Anadolu | Selcuk Acar

Eventually, it became clear that the major impact was due to the CrowdStrike update, not Microsoft. Some outlets corrected their reports, but many continued to mislabel it as a Microsoft issue.

Microsoft didn’t help matters. It took them hours to clarify that it was a third-party issue. They waited too long to name CrowdStrike. Here’s a tip for Microsoft: If you’re getting blamed for something that isn’t your fault, quickly clarify, even if it implicates a partner.

But the real culprits here are the media. They lacked the knowledge or sources to report accurately on this. I found myself pointing people to the BBC, who were actually trying to get to the bottom of it.

Zero Trust in Cybersecurity Vendors?

This might be controversial, but we put too much trust in cybersecurity experts, and there’s a glaring lack of transparency and accountability. The threat of ransomware has made the security sector indispensable, giving them near-universal admin access to PCs worldwide.

Organizations have rushed to install Endpoint Detection and Response (EDR) agents – enhanced antivirus software – everywhere. These agents auto-update, often in uncontrolled ways, to stay ahead of threats. While this has generally been beneficial, it also introduces risks.

The security industry is too cozy with governments, and regulatory standards often mandate these EDR agents. Most of these standards are influenced by a few cybersecurity vendors whispering in lawmakers’ ears.

The problem? Transparency and control. Almost every major EDR vendor has kernel access in Windows – essentially ‘god mode.’ They obscure their software to prevent analysis, partly to deter criminals but also to avoid scrutiny. Updates are pushed constantly, often multiple times a day, with zero visibility or accountability.

Some vendors, including CrowdStrike, push updates that let them run detection code inside the kernel in unsafe ways, which can trigger blue screens. The ‘R’ in EDR stands for Response, meaning they can react to cyber incidents in real time. Most EDR vendors have moved to cloud solutions, which gives them full control over the setup and, by extension, gives the same control to any group that compromises these vendors.

Essentially, we’re handing the keys to our digital kingdoms to a few private companies with no external oversight. This always felt dodgy; now, it feels outright dangerous.

Boise Airport 2024 CrowdStrike Issue.jpeg
By Smishra1 - Own work, CC BY-SA 4.0

So, what should customers demand from their vendors?

  1. Transparency in Endpoint Security Updates: Clear information on how updates are tested and a robust rollback plan.
  2. Risk Disclosure with Windows Kernel: Vendors must disclose any risky interactions with the Windows kernel and aim for safer drivers.
  3. Security Transparency: Full transparency regarding their own security incidents.
  4. National Security Law Disclosure: Vendors must disclose if they are subject to national security laws that could grant access to customer data.
  5. Incident Reporting: Commit to publishing a transparent report after any bad update.

These steps would help customers make informed decisions about their risk levels, and vendors should be committed to them.
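
To make the first demand a bit more concrete, here's a rough sketch of what a staged rollout with an automatic halt and rollback could look like. It's illustrative Python, not any vendor's actual pipeline: the ring names, the `healthy_fraction` callback and the 99% threshold are all assumptions of mine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    completed_rings: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None
    rolled_back: bool = False


def staged_rollout(
    rings: List[str],
    healthy_fraction: Callable[[str], float],
    threshold: float = 0.99,
) -> RolloutResult:
    """Push an update ring by ring and stop as soon as one ring looks unhealthy.

    `healthy_fraction(ring)` is a hypothetical probe returning the share of
    machines in that ring still reporting in after receiving the update.
    """
    result = RolloutResult()
    for ring in rings:
        if healthy_fraction(ring) < threshold:
            # Stop here: later rings never receive the update,
            # and the affected ring is flagged for rollback.
            result.halted_at = ring
            result.rolled_back = True
            return result
        result.completed_rings.append(ring)
    return result


if __name__ == "__main__":
    # Made-up health numbers: the canary ring falls over after the update.
    health = {"internal": 1.0, "canary": 0.4, "broad": 1.0}
    print(staged_rollout(["internal", "canary", "broad"], health.__getitem__))
```

The point of the sketch is the ordering: a bad update should hit a small ring first and never reach the broad one. Whether a given vendor does anything like this is exactly what customers currently can't see.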

Judging EDR Vendor Performance

People often judge EDR vendors based on detection performance and CPU impact. I think we need a category for stability, which should be part of independent testing.

Take the CrowdStrike issue: It can still be replicated by anyone who places a broken .sys channel file in the CrowdStrike system folder. The agent only checks the first few bytes of the file. If the rest of the channel file is invalid, the machine blue screens and fails to boot. This is arguably a security vulnerability and should have been identified through independent testing. However, it wasn’t, because no one is scrutinizing this aspect, and vendors actively prevent the results of this kind of testing from being made public.
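
For illustration, here's the difference between checking only the first few bytes of a file and validating the whole thing before using it. The format below (a `CHNL` marker, a length and a CRC32) is entirely made up for this sketch; I'm not describing CrowdStrike's real channel file layout or its driver code.

```python
import struct
import zlib

MAGIC = b"CHNL"  # hypothetical 4-byte marker, not CrowdStrike's real format
HEADER = struct.Struct("<4sII")  # magic, payload length, CRC32 of payload


def pack_channel_file(payload: bytes) -> bytes:
    """Build a file in the hypothetical format (only needed for the demo)."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def magic_only_check(blob: bytes) -> bool:
    """The weak check: look at the first few bytes and nothing else."""
    return blob[:4] == MAGIC


def full_check(blob: bytes) -> bool:
    """Validate the marker, the declared length and the payload checksum."""
    if len(blob) < HEADER.size:
        return False
    magic, length, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:]
    if magic != MAGIC or length == 0 or length != len(payload):
        return False
    return zlib.crc32(payload) == crc


def load_or_keep_previous(blob: bytes, previous: bytes) -> bytes:
    """Fall back to the last known-good content instead of falling over."""
    return blob if full_check(blob) else previous


if __name__ == "__main__":
    good = pack_channel_file(b"detection rules")
    broken = MAGIC + b"\x00" * 36  # right marker, garbage after it
    print(magic_only_check(broken), full_check(broken))
    print(load_or_keep_previous(broken, good) == good)
```

A file with the right marker and garbage behind it sails through the weak check and fails the full one. An independent stability test could throw exactly this kind of malformed input at an agent and see whether the machine survives.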

Transparency and Trust in Vendors

I want to be clear that I’m not attacking all vendors, not even CrowdStrike. The point is that, as a customer, I lack the visibility and data to trust vendors regarding resilience. This is a significant problem.

I know from personal experience that some vendors are better than others in this area, but that knowledge isn’t scalable beyond my own experience.

Customer Rights and Responsibilities

Customers have both the right and responsibility to push cybersecurity vendors who make endpoint products to be more transparent and responsible. Cybersecurity vendors are in a unique position of access and have a responsibility they are not yet fully meeting. Endpoint protection products should not be judged solely on detection percentages. Businesses need availability – often more than security. This need for availability has been overshadowed over time.

I already know board members who are pointing out that organizations purchased CrowdStrike at great expense to avoid disruptive cyber events – only to now face the significant cost of a major recovery effort to get their businesses back online.

This is a widespread issue within the cybersecurity industry. The CrowdStrike incident has set everyone back. Which other businesses are one bad cyber update away from losing control of their operations? It’s unclear. How is that acceptable?

History shows that the cyber and IT industries have short memories when it comes to these incidents. In my career, I’ve twice seen organizations I worked for severely impacted by security vendors pushing bad updates, a different vendor each time. Those were extremely challenging days. Hopefully, this incident will prompt customers to push back and demand better standards.

I also hope vendors will take the lead in addressing these issues. Both CrowdStrike and Microsoft have an opportunity to jointly develop measures to prevent this kind of incident from happening again.

Microsoft has previously tried to address issues with kernel drivers and system stability but faced regulatory and competition concerns. These discussions need to be revisited, as a small number of cybersecurity companies now effectively operate in “God Mode” over the world’s economy. The risks have increased over time, and there needs to be a way to enforce less risky behavior across all vendors, including Microsoft’s security solutions.

The cybersecurity industry is still young compared with how far it has left to go, and right now we’re just a bunch of tech enthusiasts with too much power.

People Love Conspiracy Theories More Than the Boring Truth

In recent discussions about this incident, I’ve encountered tweets with millions of views, far more than any of the factual reporting got, making absurd claims such as:

  • CrowdStrike retaliated against Microsoft for laying off some of their Diversity, Equity, and Inclusion staff.
  • CrowdStrike covered up a Donald Trump assassination attempt by wiping their customers’ PCs.
  • CrowdStrike, being a Ukrainian company, sought revenge over the Clintons by disrupting their customers.

All of this is complete and utter nonsense.

This misinformation has even spread to LinkedIn, where I’ve seen cybersecurity professionals, with their employers’ names visible on their profiles, copying and pasting this nonsense. It’s astonishingly stupid.

What really happened is quite simple: CrowdStrike, which has incredible access to millions of PCs, made a mistake – likely during testing.

This mistake is not the fault of a single analyst. It highlights a series of failings within CrowdStrike, for which senior management should take responsibility. It also points to some structural problems in the cybersecurity industry.

In Summary

We, as customers, should demand more transparency from our cybersecurity vendors who sell us endpoint tools.

I’ve seen numerous LinkedIn posts from individuals at cybersecurity vendors (not CrowdStrike) claiming it is the customer’s fault for not having disaster recovery plans in place. This is obviously nonsense, and it doesn’t hold up given how deeply these tools are integrated into the systems they protect.

This situation also underscores a broader issue. Some individuals in the cybersecurity vendor industry seem to view their customers as incompetent. They see themselves as the true wizards with all the threat intelligence, believing that EDR performance is solely about detection, and assuming that we will continue to buy and accept whatever they offer.

Maybe we aren’t incompetent, and maybe we won’t keep buying.


