CrowdStrike outage highlights critical need for disaster planning
The recent global outage triggered by a faulty software update from cybersecurity firm CrowdStrike has resonated widely in the IT and business communities.
The incident, described as both a technical and business problem, underscores the importance of disaster preparedness and business continuity planning.
Shane Maher, Managing Director of managed services and cybersecurity provider Intelliworx, highlighted the significance of handling such situations adeptly. "This shows why disaster preparedness is so important. And it's not just about security, it's more about disaster recovery and handling the situation. There are so many people affected by this outage. It's not just a technical problem, it's a business problem," Maher stated. He further emphasised that businesses must have contingency plans for these events and communicate transparently with their customers and stakeholders.
Forrester analysts have detailed the technical challenges and procedural responses surrounding the CrowdStrike incident. Andras Cser, a Vice President and Principal Analyst at Forrester, explained why recovery is so laborious. "Recovery options for affected machines are manual and thus limited: administrators must attach a physical keyboard to each affected system, boot into Safe Mode, remove the compromised CrowdStrike update, and then reboot," Cser said. He noted that remediation is further complicated because some administrators cannot access their BitLocker hard drive encryption keys, and he advised administrators to follow CrowdStrike's guidance through official channels.
Manual intervention may be required on hundreds or even thousands of affected machines, according to Allie Mellen, another Principal Analyst at Forrester. The reliability of cybersecurity tools is paramount. "An incident like this questions that reliability. This will undoubtedly raise questions and concerns from executives about how to ensure the reliability of enterprise systems, especially with technology as integrated into day-to-day operations as cybersecurity software," Mellen remarked. She stressed the importance of giving teams the resources they need to resolve issues efficiently.
Forrester also outlined a series of best practices to mitigate the impact of such incidents and bolster future resilience. As immediate actions, the firm recommends empowering authorised system administrators to rectify problems swiftly, maintaining clear communication both internally and externally, and adhering to vendor guidance. It also stresses the importance of taking care of the team members tasked with addressing the crisis.
Once the immediate issue has been resolved, Forrester advises technology and security leaders to implement infrastructure automation to manage software rollouts better, refresh and rehearse IT outage response plans, and secure comprehensive written warranties from security vendors regarding their quality assurance processes and threat detection capabilities.
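One common form of rollout automation is a staged, or ring-based, deployment, in which an update reaches a small group of machines first and is halted if those machines become unhealthy. The following is a minimal sketch of that idea, not a description of any vendor's or Forrester's process. The names `Ring`, `RolloutResult`, `staged_rollout`, `deploy`, `is_healthy` and `max_failure_rate` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Ring:
    """A group of hosts that receives an update at the same stage."""
    name: str
    hosts: Sequence[str]


@dataclass
class RolloutResult:
    completed: List[str]      # rings that passed the health check
    halted_at: Optional[str]  # ring where the rollout stopped, if any


def staged_rollout(
    rings: Sequence[Ring],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy ring by ring and stop as soon as one ring looks unhealthy.

    `deploy` and `is_healthy` are placeholders (assumptions) for whatever
    deployment tool and health probe an organisation actually uses.
    """
    completed: List[str] = []
    for ring in rings:
        # Push the update to every host in the current ring.
        for host in ring.hosts:
            deploy(host)

        # Measure the share of hosts that fail the health check afterwards.
        unhealthy = [h for h in ring.hosts if not is_healthy(h)]
        failure_rate = len(unhealthy) / len(ring.hosts) if ring.hosts else 0.0

        # Halt before later (usually larger) rings are touched.
        if failure_rate > max_failure_rate:
            return RolloutResult(completed, halted_at=ring.name)
        completed.append(ring.name)

    return RolloutResult(completed, halted_at=None)
```

In this sketch a faulty update that breaks a host in an early ring never reaches the later rings, which limits how many machines would need manual recovery. In practice the health check would also need a waiting period and a rollback step, both of which are left out here.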
Looking further ahead, Forrester suggests reevaluating third-party risk strategies and making use of contract provisions as risk mitigation tools. These measures are aimed at enhancing an organisation's ability to withstand and recover from similar disruptions in the future.
The CrowdStrike incident is a timely reminder of how fragile digital infrastructure can be and how profoundly a technical fault can disrupt business operations. As businesses rely ever more heavily on digital tools and platforms, robust disaster preparedness and recovery plans become more critical. The insights from Intelliworx and Forrester show that the challenge is both technical and organisational, and that meeting it requires planning, communication and vendor accountability together.