CrowdStrike says IT problems will take time to fix
The boss of cyber-security firm CrowdStrike has admitted it could be "some time" before all systems are back up and running after an update from the company triggered a global IT outage.
Experts are warning that it could take days for big organisations to get back to normal.
Although there is now a software fix for the issue, the manual process required will take a huge amount of work, they said.
The global outage has led to thousands of flights being cancelled, while banking, healthcare and shops have all been affected.
The outage began when an update from CrowdStrike caused Microsoft Windows systems to "blue screen" and crash.
The faulty software was sent out automatically to the firm's customers overnight, which is why so many were affected when they came into work on Friday morning.
It meant their computers could not be restarted.
Writing on X, CrowdStrike chief executive George Kurtz said: "The issue has been identified, isolated and a fix has been deployed."
In an interview on NBC's Today Show in the US, Mr Kurtz said the company was "deeply sorry for the impact that we've caused to customers".
"Many of the customers are rebooting the system and it's coming up and it'll be operational," he said, but added: "It could be some time for some systems that won't automatically recover."
The fix will not be automatic, but what the industry calls a "fingers on keyboards" solution.
Researcher Kevin Beaumont said: “As systems no longer start, impacted systems will need to be started in ‘Safe Mode’ to remove the faulty update.
"This is incredibly time consuming and will take organisations days to do at scale."
Technical staff will need to go and reboot each and every computer affected, which could be a monumental task.
CrowdStrike is one of the biggest and most trusted brands in cyber-security.
It has about 24,000 customers around the world and protects potentially hundreds of thousands of computers.
In a message sent to clients on Friday, Mr Kurtz said the outage was not a security incident or cyber-attack but had been caused by a defect in a "content update".
"As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again," Mr Kurtz wrote.
The description of the problem as a "content update" suggests the overnight update was supposed to be small - not a major refresh of the cyber-security software.
It could have been something as innocuous as the changing of a font or logo on the software design.
That could explain why the software was not checked as rigorously as a major update would have been. But it also poses the question: how could a small update do so much damage?
One struggling IT manager said the fix is quick once an IT person is at the machine, but the problem is getting staff to the machines.
The person, who wished to remain anonymous, is responsible for 4,000 computers in an education company and said his team were working flat out.
“We have managed to fix all of our servers using the command prompt as a workaround, but for many of our PCs, it's not easy to do manually as we are spread out across five sites. Any PCs that are left switched on overnight are affected and we're rebuilding them,” he said.
IT experts say this manual process will be particularly hard in large organisations that have thousands of computers and whose IT teams may be under-resourced.
Small and medium-sized businesses that have no dedicated IT team, or that outsource their IT support, might also struggle.
Larger, better-resourced companies, like American Airlines, appear to be fixing the problems rapidly.
Many organisations in the US might be less affected, because computers that had not yet been switched on can be started up to download the corrected software instead of the bad version. But that might still involve some manual work.
Mr Beaumont said that one of the world’s "highest impact IT incidents" was "caused by a cyber-security vendor".
Ironically, customers were affected precisely because they followed the usual advice issued by cyber-security experts: install security updates when you receive them.
While some security companies have accidentally sent out a dodgy software update in the past, we’ve never seen one on this scale or this damaging.
While this incident has caused widespread disruption, the WannaCry cyber-attack in May 2017 was potentially worse.
That was a malicious cyber-attack that affected an old version of Microsoft Windows and spread automatically to any computer that had the old and unprotected Windows software.
It affected an estimated 300,000 computers in 150 different countries.
It hit the NHS for days, affecting doctors' surgeries and hospitals around the country.
In that case it was an attack, thought to have been carried out by North Korea, that got out of hand.
The NotPetya attack a month after that was eerily similar in method and damage.
By contrast, the outages on Friday are the result of a mistake, not an attack.