The July 2024 Outage
On July 19th, 2024, a company that most people had sensibly never heard of managed to knock out a massive chunk of routine operations at institutions worldwide. I've been tracking infrastructure risk in financial markets for a while, and what jumped out at me about this one is how disproportionate it was - a single firm, one bad update, and suddenly the banking sector is scrambling. Several of the largest U.S. banks publicly confirmed they were hit. One major institution reportedly sent its tellers and bankers home for the day, just idled the entire branch network. And it wasn't just the big names. Large regionals, community banks, institutions across the whole size spectrum got caught up in this.
So how was this even possible? Understanding it requires pulling apart the intersection of banking regulation, endpoint security architecture, and some genuinely unfortunate software design decisions.
There's also something worth paying attention to in how financial systems reconstitute themselves through less formal credit sources when the primary plumbing breaks. That part surprised me, frankly, and it offers real lessons about systemic resilience that don't show up in the usual post-mortems.
Technical Background: Kernelspace vs. Userspace
Most operating systems draw a hard line between the "kernel" - that is, the core software supplied by the OS manufacturer - and everything else running on the machine. That everything-else zone is called "userspace," and it's where nearly all programs live.
Here is why this matters. Programs in userspace are relatively boxed in. They can do their thing, but they can't reach down and touch the hardware directly. Kernelspace programs? Totally different story. They get direct access to the underlying hardware. And when something goes wrong in kernel code, it doesn't just crash one application. It takes the whole machine down with it. That distinction is the entire ballgame for understanding what happened next.
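To make the isolation concrete, here is a minimal C sketch of the userspace side of that line. It is illustrative only, not tied to any vendor's code:

```c
#include <stdio.h>

/* A userspace process dereferences an invalid pointer. The OS catches
 * the fault and terminates only this process; everything else on the
 * machine keeps running. The identical mistake in kernel code has no
 * such safety net: the whole system halts (on Windows, a BSOD). */
int main(void) {
    int *p = NULL;
    printf("about to dereference a NULL pointer...\n");
    return *p; /* the OS contains the damage to this one process */
}
```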
What is Endpoint Monitoring?
CrowdStrike Falcon is endpoint monitoring software. If you haven't dealt with this category before, think of it this way: large enterprises run tens or hundreds of thousands of devices - laptops, servers, workstations - and those devices are basically illegible to the organization that owns them. No single person, no team of people, truly understands what is happening on all of them at any given moment. The gap between "we own these machines" and "we know what's going on inside them" is enormous, and it fluctuates constantly.
Endpoint monitoring promises to close that gap. It gives security teams visibility again, with continuously updated threat intelligence from the provider. Economies of scale, centralized dashboards, the whole pitch.
What kinds of things can go wrong on an endpoint? Physical theft, for one. Or an employee downloads some unauthorized software and the machine quietly joins a botnet being run out of - well, wherever the adversary happens to be sitting.
In theory, organizations monitor all their computers on an ongoing basis. Security teams respond to alerts generated by the endpoint solution. Some alerts merit a deeper look. Some require immediate action. The conversations range from "hey, who installed this?" all the way to serious incident response - novel viruses hitting multiple machines, subnet isolation, forensics to figure out whether data has been exfiltrated. That full spectrum.
The Configuration Bug
Here's where it gets painful. Falcon didn't ship a software bug in the traditional sense. CrowdStrike pushed out a configuration update - not new code, just a bit of data meant to update the conditions Falcon scans for. In modern development practice, new software (hopefully) goes through extensive testing and staged release procedures. But this was just data. Or it was supposed to be.
Due to an error at CrowdStrike, that data caused existing, already-reviewed Falcon code to fail catastrophically. And because the failure happened in kernelspace at a particularly vulnerable moment during the boot sequence, Windows systems experienced total failure. The user-visible symptom? Blue Screen of Death. Across 8.5 million machines.
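For intuition on how pure data can crash reviewed code, here is a schematic C sketch. It is not CrowdStrike's actual code - the array size and field index are invented for illustration, loosely following public post-incident write-ups describing an out-of-bounds read triggered by a new configuration file:

```c
#include <stdio.h>

#define SUPPLIED_FIELDS 20  /* fields the reviewed, tested code fills in */

static const char *fields[SUPPLIED_FIELDS];

/* This code path is unchanged and passed every test written against
 * the old configs. But the index comes from pushed config data, and
 * nothing here checks it against SUPPLIED_FIELDS. */
static const char *read_field(unsigned index_from_config) {
    return fields[index_from_config]; /* no bounds check */
}

int main(void) {
    /* A hypothetical new config references a 21st field (index 20),
     * reading past the end of the array. In userspace that is, at
     * worst, one crashed process; in kernelspace during boot, it is
     * a machine that cannot start. */
    const char *value = read_field(20);
    printf("field value: %p\n", (const void *)value);
    return 0;
}
```

The point of the sketch: the code never changed, so no code review or regression suite caught anything. Only the data changed.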
Configuration Bugs in Context
Configuration bugs cause a disturbingly large share of engineering outages. I've seen the data on this, and it is not a small percentage. But this particular config bug was special (and not in a good way) because it hit widely distributed software running in kernelspace, deployed almost universally across the workforce machines of institutions that society literally depends on. Banks, airlines, hospitals. The blast radius was unlike anything in recent memory.
"Blast radius" - how far afield from the broken system will users feel the pain. The Falcon misconfiguration ranks among bugs with the broadest direct blast radius in recent history. That's not a typo or exaggeration.
And here's the cruel irony: fixing it was complicated by the fact that many of the people who needed to fix it couldn't access their work systems. Because those systems had Blue Screen of Death'd. You can't remotely patch a machine that won't boot.
Why Was Coverage Universal?
The vulnerable software sat on essentially every machine in affected institutions. All of them. But why would anyone install one piece of security software on literally every device? Because that is the entire point of endpoint monitoring. Someone's job - their actual, day-to-day responsibility - is hunting down devices that aren't being monitored and bringing them into compliance.
Why optimize that aggressively for coverage? Partly for genuinely good security reasons. But a major driver (and this is the part that gets uncomfortable) is that small-c compliance is necessary for large-C Compliance. Regulators effectively demand it.
Why Kernelspace?
Falcon runs in kernelspace instead of userspace for a fairly straightforward reason: the most direct way to monitor what other programs are doing is to simply ignore the security guarantees that operating systems give programs in userspace. Looking at another program's memory is generally considered somewhere between rude and actively prevented by serious engineering safeguards. But endpoint monitoring software treats other programs on the device as potentially hostile - running at the adversary's direction. Their comfort level with being inspected is, to put it mildly, a distant secondary consideration.
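You can feel that boundary from userspace. Here is a minimal Linux-flavored sketch (the affected machines ran Windows, where the analogous gate is the privilege check on opening another process's memory): an unprivileged process asking the kernel for another process's memory is simply refused.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Ask the kernel for read access to PID 1's memory via the
 * kernel-mediated /proc interface. Run unprivileged, this fails:
 * the kernel enforces the isolation userspace programs rely on.
 * Kernel code faces no such gate, which is exactly why endpoint
 * monitors want to live there. */
int main(void) {
    FILE *f = fopen("/proc/1/mem", "rb");
    if (f == NULL) {
        printf("open /proc/1/mem failed: %s\n", strerror(errno));
        return 1;
    }
    fclose(f);
    printf("opened (are we running privileged?)\n");
    return 0;
}
```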
The Microsoft-EU Situation
There is another reason Falcon ended up in kernelspace, and it is weirdly political. An understanding with the European Commission prevented Microsoft from firmly demoting third-party security software down to userspace. The logic: Microsoft both writes security software and controls Windows, so it always has the option of running its own tools in kernelspace while forcing competitors into the less privileged sandbox. The European Commission has pushed back against this characterization, noting regulatory complexity around the issue. But the practical outcome? Third-party security vendors like CrowdStrike got to keep their kernelspace access. Which, on July 19th, turned out to matter quite a lot.
Banking Regulation and Software Purchases
Now, it would be an overstatement to say the U.S. federal government commanded financial institutions to install CrowdStrike Falcon and thereby embed a vulnerability into the kernels of every employee's computer. That is not how banking regulation works.
Life is more subtle.
The Regulatory Framework
The United States has a frankly bewildering number of banking regulators. These regulators have desires that rhyme heavily with each other, so they've banded into clubs to share resources. Makes sense - why duplicate effort on the basics when you could spend limited budgets on things where regulators actually have individualized opinions?
One such club is the Federal Financial Institutions Examination Council (FFIEC). They wrote the FFIEC Information Technology Examination Handbook's Information Security Booklet. It is, as you might imagine, riveting reading.
The typical consumer of this document is probably not a Linux kernel programmer who dreams in C and has an instinctive feel for the kernelspace-userspace boundary. That's an unreasonable expectation for a banking supervisor. They work for a regulator, not a software company, doing important supervisory work rather than systems engineering.
The Risk Analysis Process
The ITEH isn't super prescriptive about exactly what controls an institution must have. This is common in regulation - set outcomes, not methods. So to facilitate conversations with examiners, institutions conduct risk analyses. Or more accurately, they pay consulting firms to conduct risk analyses. And in the production function that is scaled consultancies, this means junior employees open template documents and add important client-specific context like, well, names and logos.
Those documents reference the ITEH heavily. They exist to guide conversations with examiners toward areas of maximum mutual interest. Nobody is trying to surprise anyone.
Consultants, when conducting these mandatory analyses, produce shopping lists. Endpoint monitoring is one item on those lists. Why? Ask consultants and they'll bill you for the answer. But the likely driver is Section II.C.12 Malware Mitigation.
Not Hugely Prescriptive, But...
Does the FFIEC have a hugely prescriptive view of what institutions should do for malware monitoring? Well, no:
"Management should implement defense-in-depth to protect, detect, and respond to malware. The institution can use many tools to block malware before it enters the environment and to detect it and respond if it is not blocked. Methods or systems that management should consider include..." followed by 12 bullet points varying in specificity from whitelisting allowed programs to port monitoring to user education.
Twelve bullet points. Broad enough to drive a truck through. But here is what happens in practice. Consultants advise that you want a very responsive answer to II.C.12 in your reports, and since your institution probably does not have Google's ability to fill entire floors with people doing industry-leading security research, you should just buy something that says "Yeah We Do That."
CrowdStrike's Market Position
CrowdStrike's sales reps will happily tell you they do that. Their web presence is the output of deterministic processes co-owned by Marketing and Sales departments at B2B software companies - the kind that produce industry-specific "sales enablement" collateral. They will even send you documents that align closely with risk assessment requirements, specifying which exact objectives and controls purchasing their product solves for. It's a well-oiled machine.
CrowdStrike wasn't strictly the only vendor that could have been installed on every computer to make regulators happy. But due to the vagaries of how enterprise software sales teams operate, they secured significant market share in government-adjacent industries. Partly because they aggressively pursued writing the kind of documents you need when the people reading project plans hold national security briefs. That's a real competitive moat, and it turns out it's also a systemic risk factor nobody was really accounting for.
Money as Critical Infrastructure
Money is core societal infrastructure. Same category as the power grid, same category as transportation. And it would be extremely damaging if hackers working for a foreign government could simply turn money off. Think about that for a second. More damaging than a conventional missile being fired at random into a major city - and the range of available responses might actually be more constrained.
And so the situation arose where what amounts to an advanced persistent threat was effectively invited into kernelspace. On purpose. Across an entire industry.
Security Tools as Vulnerabilities
Security professionals understand something that sounds paradoxical to everyone else: security tools themselves introduce security vulnerabilities. Part of the worry is monocultures - if everyone runs the same software, a weakness in that software becomes a weakness in everything. Part of it is that security tools (and security personnel) frequently carry more privileges than anything else on the network, which makes them high-value targets for adversaries. This observation is fractal in systems engineering: at every level of abstraction, if the control plane gets compromised, the battle is lost.
CrowdStrike maintains they don't believe a bad actor intentionally tried to bring down global financial infrastructure by weaponizing their product. No, CrowdStrike did that themselves. Accidentally. Of their own volition. But this demonstrates the problem with uncomfortable clarity: if a junior employee tripping over a metaphorical power cord at one company can bring down computers worldwide, adversaries have a whole menu of options for achieving directionally similar aims by attacking directionally similar power cords. And they are thinking about it.
When Money Stops Working
Reports of the outage spread through social media first, the way these things always do now. Bank branches cited "the Microsoft systems issue" when customers showed up trying to withdraw cash from teller windows. That framing tells you something - even the bankers didn't fully understand what was happening to them.
Cash-Dependent Populations
A lot of economic activity still runs on cash. For complex social and economic reasons (some of which nobody wants to examine too closely), engaging with various contractors and service providers often requires frequent, sizable cash payments.
This created genuine emergencies. Many contractors are small businesses. Many small businesses are thinly capitalized. And many employees of those businesses are extremely dependent on receiving compensation exactly on payday - not after, not "soon," on the day. While plenty of people were basically unaffected because their money kept working through mobile apps, through Venmo and Cash App, through credit cards, the cash-dependent population got enormous wrenches thrown into their plans. For some of them this meant missing rent, or not being able to buy groceries that weekend.
Infrastructure Failure Impacts
Reports indicated that attempting to withdraw cash at three financial institutions in different weight classes proved impossible at all three. Every single one, down because of Falcon.
At one institution, tellers were unavailable but ATMs still worked. Small victory - except many customers tried to pull out more cash than they ever had before (makes sense, if you can't trust the system you grab what you can). And then the fraud detection systems kicked in. Normally, no big deal: systems flag potentially fraudulent behavior, customers unflag themselves by responding to instant communications from the bank. Quick verification, move on. Except - and I wish I were making this up - the subdomain that communication directed them to ran on servers apparently protected by CrowdStrike Falcon. So the system designed to verify you're not a criminal couldn't verify anything, because it was also bricked.
Not every institution went dark. Some banks around various cities actually ran out of physical cash at certain branches, because all Friday demand for cash was being serviced by the handful of institutions still operational instead of being spread across the whole system.
Shadow Information Networks
What always happens during widespread infrastructure failures happened here too. Shadow economies of information trading popped up almost instantly, redirecting relatively sophisticated people to places that could still service them. This happens through offline social networks (the way it has since time immemorial) and online social networks (since those were invented). The offline version is probably more impactful for most people, but the online version is more legible and measurable. So naturally, banking regulators tend to focus disproportionately on the technology aspects of these phenomena. You can see where that reflex might not always serve us well.
Historical Precedent: Ireland 1970
There's a historical parallel that I keep coming back to when I think about what happened in July.
In 1970, the Irish banking sector went through a widespread and sustained strike. Six months. Workers couldn't cash paychecks because the tellers refused to work. So pub operators stepped in, cashing checks from the till, trusting that eventually checks drawn on accounts of local employers would be good funds again. Just trusting. Based on knowing the person handing them a check.
Some publicans even cashed personal checks, backed by what you might call the swift and terrible justice of the credit reporting bureau "We Control Whether You Can Ever Enjoy A Pint With Your Friends Again." It kept physical notes circulating in the economy. Remarkable, when you think about it.
Alternative Financial Networks
During the CrowdStrike outage, similar informal networks surfaced. Churches, much like bars (a sentence I did not expect to write today), take most of their weekly income through electronic payments but still do substantial cash management through the workweek heading into the weekend. When organizations needed to work around broken financial infrastructure to get people their wages, institutions with established trust relationships and moral imperatives around fair payment turned out to be valuable. Communities are more resourceful than their infrastructure assumes.
Financial infrastructure normally functions to abstract away personal ties - replacing favor-swapping with legibly-priced, broadly-offered services. But when that infrastructure breaks, the old networks based on reputation and mutual obligation can provide backup systems. They're slower, they're messier, they don't scale the same way. But they work.
Thankfully, while this outage was surprisingly deep and broad, banks were mostly back to normal by Monday.
Conclusion
What the CrowdStrike incident really reveals is a set of troubling dynamics sitting at the intersection of banking regulation, enterprise software sales, and critical infrastructure protection. Well-intentioned regulatory frameworks built to protect financial systems inadvertently created monocultures that are vulnerable to single points of failure. Nobody planned this. It emerged from a thousand rational individual decisions that added up to something deeply irrational at the system level.
Regulatory compliance processes can drive adoption of specific technology solutions across entire industries. When regulators effectively require endpoint monitoring and a vendor provides a turnkey solution that checks the compliance boxes, rational actors adopt it. Broadly. And that creates systemic risk: what helps an individual institution satisfy its regulator may simultaneously increase the entire sector's vulnerability. The thing designed to make the system safer made the system fragile in a new way.
Security tools themselves introduce security vulnerabilities. Especially when they operate in privileged contexts like kernelspace. Especially when they enjoy near-universal adoption. The very characteristics that make endpoint monitoring effective - broad deployment, deep system access, continuous operation - are exactly what makes a configuration error catastrophic in scope. The features are the bugs.
And then there's the resilience of informal financial networks during the outage, which offers a genuinely interesting contrast. Modern infrastructure normally abstracts away personal relationships, replacing them with scalable formal systems. But those informal networks - pubs in 1970s Ireland, churches in 2024 America - proved they could reconstitute basic financial functions through reputation-based trust when the formal systems went dark. These backup systems are worth understanding, not as curiosities, but as genuine infrastructure.
Regulatory frameworks going forward should be thinking about sector-wide resilience, not just whether any individual institution has its boxes checked. Avoiding monocultures, maintaining heterogeneous security solutions, preserving backup systems (even the informal, unglamorous ones) - all of this contributes to systemic robustness. The goal should be compliance requirements that actually improve security without inadvertently creating new single points of failure that can take down entire industries in a single morning.
Key Takeaways
- A single configuration bug in CrowdStrike Falcon running in kernelspace brought down banking infrastructure worldwide
- Banking regulations effectively drove widespread adoption of endpoint monitoring solutions like Falcon
- Security tools operating in privileged contexts introduce their own security vulnerabilities
- Monocultures in security solutions create systemic risk - what helps individual firms may harm the sector
- Informal financial networks based on reputation and trust provided backup systems during the outage
- Regulatory frameworks should consider sector-wide resilience, not just individual institution security
Bellwether Research, December 7, 2024