Four lessons Microsoft customers can learn from the CrowdStrike meltdown

Mary Jo Foley is the Editor in Chief at Directions on Microsoft. Before joining Directions, Mary Jo worked as a technology journalist for 40+ years and has focused on...

It wasn’t a massive global cyberattack, but it definitely felt like one. Late last week, a faulty update from security vendor CrowdStrike hit an estimated 8.5 million Windows devices worldwide, according to Microsoft. Airlines, hospitals, banks, trains, broadcast stations, retail shops, and more got stuck in a blue-screen reboot loop, and the cleanup likely won’t be finished for weeks or even months.

Whether or not you’re part of the clean-up crew, there are some important lessons to be learned from the CrowdStrike meltdown.

“Customers should take steps to minimize risk of endpoint failure due to potential bad updates or other incidents,” said Directions on Microsoft analyst Jim Gaynor. “However, failure is inevitable, so customers should also optimize their endpoint recovery processes to minimize impact of failure. And the resources spent on each should be based on rigorous risk assessment, and not reactionary decisions made out of fear.”

Four ways enterprise customers can take action

The Directions on Microsoft analyst team put our heads together to come up with some actionable items based on the CrowdStrike incident. Here are our top four:

1. Stage updates. Microsoft isn’t the only company that pushes out bad patches and updates; its partners do, too. That’s why it’s key to use deployment rings: apply updates to a small group of devices first, confirm that everything works, and only then roll them out everywhere.

If you do stage updates, “don’t set auto-update and forget,” cautioned Directions analyst Michael Cherry. “I was a CrowdStrike customer and the last time I worked with their product they used to release their updates to Falcon in a preview mode, but I never bothered to look at them, like most people, because CrowdStrike had never had a problem. Until they do. You also need to know how to pause updates when you see things going bad.”
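The ring idea, including the ability to pause, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not how CrowdStrike, Microsoft, or any management tool actually implements rings: the `Ring` class, the `staged_rollout` function, the health-check callback, and the failure threshold are all hypothetical. It shows the two points above: update a small group first, and stop before the larger rings if devices in the current ring start failing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ring:
    """Hypothetical deployment ring: a named group of devices updated together."""
    name: str
    devices: list[str]


def staged_rollout(
    rings: list[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> tuple[list[str], Optional[str]]:
    """Apply an update ring by ring, pausing when a ring looks unhealthy.

    Returns (names of rings that passed, name of the ring where the
    rollout paused, or None if every ring passed).
    """
    passed: list[str] = []
    for ring in rings:
        failures = 0
        for device in ring.devices:
            deploy(device)  # hypothetical hook into a deployment tool
            if not is_healthy(device):  # hypothetical post-update health check
                failures += 1
        failure_rate = failures / len(ring.devices) if ring.devices else 0.0
        if failure_rate > max_failure_rate:
            # Pause: later (larger) rings never receive the update.
            return passed, ring.name
        passed.append(ring.name)
    return passed, None


if __name__ == "__main__":
    rings = [
        Ring("pilot", ["it-01", "it-02"]),
        Ring("early", [f"dept-{i:02d}" for i in range(5)]),
        Ring("broad", [f"pc-{i:03d}" for i in range(20)]),
    ]
    deployed: list[str] = []
    # Simulate a bad update: every device that receives it fails its health check.
    passed, paused_at = staged_rollout(rings, deployed.append, lambda d: False)
    print(passed, paused_at, len(deployed))  # [] pilot 2
```

In the simulated bad update, the rollout pauses in the pilot ring, so only the two pilot devices receive it and the early and broad rings are never touched. In practice, the callbacks would be replaced with whatever deployment and monitoring tooling an organization actually uses.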

2. Make sure your systems hygiene processes are in order, so when (not if) another similar outage happens, you’ll be ready to take action.

“Ensure you have easy access to BitLocker keys. Confirm recovery partitions. Establish backups and, in the case of VMs, have snapshots for fallback. Test your imaging/deployment processes and ensure they’re current, documented, and streamlined. For distributed companies, have dedicated regional points of contact. The list goes on,” said Directions’ Gaynor.

BitLocker is especially key here. “Companies need to treat their BitLocker recovery keys like a part of their recovery plan, and if they can’t roll out or repair systems en masse, using BitLocker recovery properly, you’re in for a painful, one-by-one repair process,” Directions analyst Wes Miller noted.

3. Don’t overuse Windows in embedded, Long-Term Servicing Channel (LTSC), and infrastructure scenarios. Does your check-in kiosk really need to run Windows? Does your ad billboard? (Looking at you, Times Square!) When an embedded version of Windows goes south, IT will likely have to fix devices manually. And while restore points exist for many cloud services and virtual machines, they aren’t always, or even often, available for servers and embedded devices.

4. Remember: You’re at the mercy of your most rapid-fire vendor. You may have sound testing policies and procedures, but do all of your vendors, or all of Microsoft’s ISV partners? You may have little or no say over which vendors Microsoft partners with, but at least you’ll be more alert to potential issues.

Counting on the CrowdStrike incident to prompt Microsoft to change its policy that allows certain software vendors access to the Windows kernel is not a good bet. Microsoft execs have said they legally cannot close off Windows the way Apple does, due to a 2009 agreement with the European Commission that requires Microsoft to give security software companies the same access to Windows that Microsoft itself gets.

Update (July 26): A July 25 Microsoft blog post titled “Windows resiliency: Best practices and the path forward” raises the possibility that Microsoft may, at some point, stop allowing third parties to have Windows kernel access. From that post:

“This incident shows clearly that Windows must prioritize change and innovation in the area of end-to-end resilience. These improvements must go hand in hand with ongoing improvements in security and be in close cooperation with our many partners, who also care deeply about the security of the Windows ecosystem.

“Examples of innovation include the recently announced VBS enclaves, which provide an isolated compute environment that does not require kernel mode drivers to be tamper resistant, and the Microsoft Azure Attestation service, which can help determine boot path security posture. These examples use modern Zero Trust approaches and show what can be done to encourage development practices that do not rely on kernel access. We will continue to develop these capabilities, harden our platform, and do even more to improve the resiliency of the Windows ecosystem, working openly and collaboratively with the broad security community.”

Related Resources

Microsoft: What we’re doing about CrowdStrike

CrowdStrike’s Remediation and Guidance Hub

Microsoft Recovery Tool for Helping Remediate CrowdStrike Issue

Microsoft: Recovery options for Azure VMs affected by CrowdStrike’s Falcon agent

Microsoft’s Customers Must Monitor Its Security Shortcomings (Directions members only)

by
Mary Jo Foley
