Lessons Learned from the CrowdStrike Tech Outage

July 22, 2024 at 3:00 PM

The recent global outage caused by CrowdStrike's faulty update has highlighted crucial lessons for organizations worldwide. By examining the factors that led to this disruption and understanding how to mitigate similar risks, organizations can enhance their resilience and ensure continuity in the face of unexpected challenges.

Who is CrowdStrike?

CrowdStrike is a global cybersecurity company known for its threat intelligence and endpoint protection solutions. Founded in 2011, it protects numerous large enterprises and public sector organizations. CrowdStrike's flagship product, Falcon, is a cloud-native endpoint protection platform that combines next-generation antivirus, endpoint detection and response (EDR), and a 24/7 managed threat hunting service called Falcon OverWatch. The company has led several high-profile cybersecurity investigations, including the Sony Pictures hack and the Democratic National Committee breach, which established it as a prominent player in the cybersecurity space.

The CrowdStrike Outage: What Happened?

On July 19, 2024, a defective content update for CrowdStrike's Falcon sensor triggered a global IT crisis. The update, a sensor configuration file (Channel File 291) intended to improve detection of a newly observed threat technique on Windows, instead caused widespread system crashes and the infamous Blue Screen of Death (BSOD) on millions of devices worldwide. The issue affected Windows hosts running Falcon sensor version 7.11 and above; Mac and Linux hosts were not affected.

CrowdStrike reverted the faulty update within 79 minutes, but by then the damage was done: machines that had already received it crashed and, in many cases, were stuck in reboot loops. Critical systems in banks, airports, healthcare facilities, and other sectors were disrupted, leading to missed flights, closed call centers, and canceled surgeries. Microsoft estimated that approximately 8.5 million Windows devices were affected. Recovery was labor-intensive for IT administrators because the fix often had to be applied machine by machine, and devices protected by BitLocker also required their recovery keys. Available tools included Microsoft's newly released USB recovery tool, which offered both a Windows PE-based and a Safe Mode-based recovery option.
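The widely published manual workaround was to boot each affected host into Safe Mode or the Windows Recovery Environment and delete the faulty channel file matching `C-00000291*.sys` from the CrowdStrike drivers directory. The Python sketch below illustrates only that file-matching step. It is hypothetical tooling, not a CrowdStrike or Microsoft utility; in practice administrators ran the equivalent commands from a recovery command prompt, because affected machines could not boot normally. The `dry_run` default is an added assumption so nothing is deleted unless explicitly requested.

```python
from pathlib import Path

# Pattern of the faulty channel file named in the publicly documented workaround.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir: Path, dry_run: bool = True) -> list[Path]:
    """Find (and optionally delete) channel files matching FAULTY_PATTERN.

    Hypothetical helper for illustration only. On a real host the directory
    would be %WINDIR%\\System32\\drivers\\CrowdStrike, reached from Safe Mode
    or WinRE. Returns the matching paths, sorted for stable output.
    """
    matches = sorted(driver_dir.glob(FAULTY_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()  # delete only the files that matched the pattern
    return matches


if __name__ == "__main__":
    import tempfile

    # Simulate the drivers directory in a temp folder instead of touching a real system.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        for name in ("C-00000291-00000000-00000032.sys", "C-00000290-00000000-00000001.sys"):
            (d / name).write_text("stub")
        found = remove_faulty_channel_files(d, dry_run=False)
        print("removed:", [p.name for p in found])
        print("remaining:", sorted(p.name for p in d.iterdir()))
```

Matching on a narrow filename pattern, rather than clearing the whole directory, keeps the rest of the sensor's files intact so the agent can resume once the host boots.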

Preparing for the Next “CrowdStrike” Outage

The CrowdStrike incident underscores the importance of proactive vendor and third-party management. Organizations must adopt comprehensive strategies to mitigate risks associated with their supply chain and vendor relationships. Here is how you can achieve that:

Conduct Thorough Business Impact Analysis (BIA)

A BIA identifies critical business functions and the impact of their disruption. It exposes potential single points of failure so organizations can develop strategies to mitigate them. The CrowdStrike incident is a stark reminder of why a thorough BIA matters: a detailed analysis would have highlighted the dependency on an EDR agent installed on nearly every endpoint, and the catastrophic impact if that agent itself failed, prompting the development of robust contingency plans. Reassess your BIAs in light of the incident. Ensure they cover all significant risks, including those posed by critical third-party solutions, identify any new single points of failure, and develop strategies to address them.
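One way to make the single-point-of-failure check concrete is to record, for each critical business function, the systems and vendors it cannot run without, then flag dependencies shared by several critical functions. The function names, dependencies, and the `min_functions` threshold below are hypothetical examples; a real BIA would also weigh recovery time objectives and available workarounds.

```python
from collections import defaultdict

# Hypothetical BIA inventory: critical business function -> dependencies it cannot run without.
CRITICAL_FUNCTIONS = {
    "payment processing": {"windows endpoints", "edr agent", "payments gateway"},
    "customer call center": {"windows endpoints", "edr agent", "telephony saas"},
    "patient scheduling": {"windows endpoints", "edr agent", "ehr system"},
}


def shared_dependencies(
    functions: dict[str, set[str]], min_functions: int
) -> dict[str, list[str]]:
    """Map each dependency to the critical functions it supports, keeping only
    dependencies whose failure would disrupt at least `min_functions` of them."""
    impact: dict[str, list[str]] = defaultdict(list)
    for function, deps in functions.items():
        for dep in deps:
            impact[dep].append(function)
    return {
        dep: sorted(funcs)
        for dep, funcs in sorted(impact.items())
        if len(funcs) >= min_functions
    }


if __name__ == "__main__":
    for dep, funcs in shared_dependencies(CRITICAL_FUNCTIONS, min_functions=2).items():
        print(f"{dep}: would disrupt {len(funcs)} critical functions -> {funcs}")
```

In this toy inventory the EDR agent surfaces alongside the Windows endpoints themselves, which is exactly the kind of dependency that is easy to overlook because it is usually treated as protection rather than as a risk.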

Develop and Maintain Comprehensive Playbooks

Having detailed playbooks for various disaster scenarios is crucial. While the specific scenario of an EDR solution crashing systems may not come up in regular tabletop exercises, it is essential to consider and plan for a broad range of contingencies. Playbooks should include steps for immediate response, containment, and recovery. They should be regularly updated and tested to ensure they remain effective. Revise your incident response and business continuity playbooks to incorporate lessons learned from the CrowdStrike incident.
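Playbooks are easier to keep current when they are stored as structured data that can be checked automatically. The sketch below assumes a hypothetical format in which each playbook lists its steps per phase plus the date it was last tested. It flags playbooks that lack steps for any of the three phases (immediate response, containment, recovery) or have not been tested within an assumed 180-day window; both the format and the window are illustrative choices, not a standard.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

REQUIRED_PHASES = ("immediate response", "containment", "recovery")
MAX_TEST_AGE = timedelta(days=180)  # assumed review cadence; set per your own policy


@dataclass
class Playbook:
    name: str
    last_tested: date
    steps: dict[str, list[str]] = field(default_factory=dict)  # phase -> ordered steps


def playbook_gaps(playbook: Playbook, today: date) -> list[str]:
    """Return human-readable problems; an empty list means the playbook passes."""
    gaps = [
        f"missing steps for phase: {phase}"
        for phase in REQUIRED_PHASES
        if not playbook.steps.get(phase)
    ]
    if today - playbook.last_tested > MAX_TEST_AGE:
        gaps.append(f"not tested since {playbook.last_tested.isoformat()}")
    return gaps


if __name__ == "__main__":
    edr_outage = Playbook(
        name="EDR agent crashes endpoints",
        last_tested=date(2024, 1, 15),
        steps={
            "immediate response": ["Identify affected hosts", "Notify stakeholders"],
            "recovery": ["Boot affected hosts into Safe Mode", "Apply vendor-documented fix"],
        },
    )
    for gap in playbook_gaps(edr_outage, today=date(2024, 7, 22)):
        print(f"{edr_outage.name}: {gap}")
```

Running a check like this on a schedule turns "regularly updated and tested" into something that produces a visible finding when it slips.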

Carry Out Regular Tabletop Exercises

Regularly conducting tabletop exercises helps prepare for unexpected incidents. These exercises should simulate various scenarios, including those that seem unlikely, such as the failure of critical cybersecurity solutions. The recent incident with CrowdStrike demonstrates that even leading cybersecurity solutions can fail, and organizations must be prepared to respond swiftly and effectively. Conduct tabletop exercises based on this scenario to test your response plans and identify any gaps or weaknesses. Ensure all stakeholders are familiar with their roles and responsibilities during an incident.

Strengthen Vendor Management Practices

Enhance your vendor management practices by conducting regular audits and assessments of your vendors’ security posture. Ensure they have robust incident response and business continuity plans in place. Establish clear communication channels for timely updates during incidents.
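Regular audits are easier to sustain on a cadence tied to how critical each vendor is. The sketch below assumes hypothetical criticality tiers and review intervals (for example, annual reviews for critical vendors) and lists vendors whose security assessment is overdue or who have not provided evidence of a tested continuity plan. The tiers, intervals, and vendor records are placeholders for your own policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed review intervals per criticality tier, most critical first; adjust to your own policy.
REVIEW_INTERVAL = {
    "critical": timedelta(days=365),
    "high": timedelta(days=730),
    "standard": timedelta(days=1095),
}


@dataclass
class Vendor:
    name: str
    tier: str  # one of the REVIEW_INTERVAL keys
    last_assessed: date
    has_tested_bcp: bool  # vendor has shared evidence of a tested continuity plan


def vendors_needing_review(vendors: list[Vendor], today: date) -> list[str]:
    """Return names of vendors whose assessment is overdue or who lack a tested
    continuity plan, ordered by tier (most critical first), then by name."""
    tier_order = list(REVIEW_INTERVAL)
    flagged = [
        v for v in vendors
        if today - v.last_assessed > REVIEW_INTERVAL[v.tier] or not v.has_tested_bcp
    ]
    flagged.sort(key=lambda v: (tier_order.index(v.tier), v.name))
    return [v.name for v in flagged]


if __name__ == "__main__":
    vendors = [
        Vendor("Endpoint security vendor", "critical", date(2023, 5, 1), True),
        Vendor("Payroll SaaS", "high", date(2023, 9, 1), False),
        Vendor("Office supplies", "standard", date(2022, 1, 10), True),
    ]
    print(vendors_needing_review(vendors, today=date(2024, 7, 22)))
```

Sorting by tier puts the vendors whose failure would hurt most at the top of the review queue.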

Mitigating the Risk You Pose to Your Customers

As you work to improve your own preparedness, it is crucial to ensure that your actions do not negatively impact your customers. Here are some best practices to follow:

Transparent Communication

Maintain transparency with your customers regarding your security measures and any incidents that may affect them. Open and honest communication builds trust and helps manage customer expectations during crises.

Continuous Monitoring and Improvement

Continuously monitor your systems and processes to identify and address potential vulnerabilities. Regularly review and update your security measures to stay ahead of emerging threats. Demonstrating a commitment to continuous improvement reassures customers that their security is a top priority.

Rigorous Patch Vetting Process

One of the critical lessons from the CrowdStrike incident is the importance of thoroughly vetting every update before deployment, including content and configuration updates, not only code patches. Implement a rigorous patch management process that includes extensive testing in a controlled environment to identify potential issues before they impact your customers. Use a phased rollout approach, starting with a small subset of systems to monitor for any adverse effects before a broader deployment. Additionally, ensure there are robust rollback procedures in place should any issues arise. By prioritizing meticulous update vetting, you can significantly reduce the risk of disruptive incidents and maintain the trust of your customers.
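The phased rollout with rollback can be sketched as a loop over deployment rings: push the update to one ring, check its health, and halt and roll back if the failure rate crosses a threshold. In this sketch, `deploy`, `is_healthy`, and `rollback` are hypothetical callbacks standing in for your own deployment tooling, and the ring sizes and 5% threshold are assumed values, not recommendations from any vendor.

```python
from typing import Callable, Sequence

Host = str


def phased_rollout(
    rings: Sequence[Sequence[Host]],
    deploy: Callable[[Host], None],
    is_healthy: Callable[[Host], bool],
    rollback: Callable[[Host], None],
    max_failure_rate: float = 0.05,
) -> bool:
    """Deploy ring by ring; if any ring exceeds max_failure_rate, roll back every
    host updated so far (newest first) and stop. Returns True only if all rings pass."""
    updated: list[Host] = []
    for ring_number, ring in enumerate(rings, start=1):
        for host in ring:
            deploy(host)
            updated.append(host)
        failures = sum(1 for host in ring if not is_healthy(host))
        rate = failures / len(ring) if ring else 0.0
        if rate > max_failure_rate:
            print(f"ring {ring_number}: {failures}/{len(ring)} unhealthy, rolling back")
            for host in reversed(updated):
                rollback(host)
            return False
        print(f"ring {ring_number}: healthy, continuing")
    return True


if __name__ == "__main__":
    # Simulated fleet: a small canary ring, then progressively larger rings.
    rings = [
        ["canary-1", "canary-2"],
        [f"pilot-{i}" for i in range(10)],
        [f"prod-{i}" for i in range(100)],
    ]
    state: dict[Host, str] = {}
    crashes_on_update = {"pilot-3", "pilot-7"}  # pretend the update breaks these hosts

    def deploy(host: Host) -> None:
        state[host] = "new"

    def rollback(host: Host) -> None:
        state[host] = "old"

    ok = phased_rollout(rings, deploy, lambda h: h not in crashes_on_update, rollback)
    print("rollout succeeded" if ok else "rollout halted")
    print(sum(1 for v in state.values() if v == "new"), "hosts left on the new version")
```

In the simulation the bad update stops at the pilot ring and the 100 production hosts are never touched. A real pipeline would also wait for a soak period before checking health. The ring structure matters more than the rollback step, because, as the CrowdStrike incident showed, hosts that crash may not be reachable for an automated rollback at all; limiting the blast radius is the primary safeguard.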

Partner with Experts

Consider partnering with experts in incident response, business continuity planning, and vendor management to enhance your preparedness. Companies like Compass IT Compliance offer specialized services to help organizations develop robust plans and strategies.

Conclusion

The CrowdStrike incident serves as a wake-up call for organizations to be proactive in managing their vendors and third-party solutions. Conducting thorough business impact analyses, developing comprehensive playbooks, and regularly testing your response plans are critical steps in ensuring resilience against unexpected disruptions. Use this incident as a learning opportunity to strengthen your preparedness and avoid becoming a point of failure to your own customers.

At Compass IT Compliance, we specialize in comprehensive incident response and business continuity planning services tailored to meet the unique needs of your organization. Our expertise extends to developing robust vendor onboarding and risk assessment programs, ensuring your third-party relationships are secure and resilient. We conduct thorough business impact analyses to identify potential vulnerabilities and create effective strategies for mitigating risks. Our dedicated team of experts will work with you to design and implement customized solutions that safeguard your operations and ensure business continuity amidst unexpected challenges. Contact us today to discover how our services can enhance your preparedness and protect your organization from future disruptions.
