How Did Chinese Airlines Survive the Global IT Outage?

The global Microsoft outage caused by a faulty update grounded most U.S. flights and hit major U.S. airlines hard. China, meanwhile, remained unaffected, with only some foreign carriers’ flights delayed by disruptions overseas. A Chinese civil aviation practitioner and commentator argues that the reasons are straightforward.
August 12, 2024
Top picks selected by the China Academy's editorial team from Chinese media, translated and edited to provide better insights into contemporary China.
A Chinese civil aviation practitioner and industry commentator.

On July 19, 2024, numerous computers worldwide crashed to the blue screen of death (BSOD) and failed to connect to system servers. The usual remedy of restarting proved ineffective, as the blue screen persisted even after multiple reboots.

The incident paralyzed systems around the world, with North America hit particularly hard. Everyday services were disrupted: flights were canceled, 911 hotlines could not be reached, hotels were unable to check in guests, surgeries were canceled, and stores could not operate. All this chaos was caused by CrowdStrike, a cybersecurity company previously little known to the general public that has now become famous.

The root cause of this global incident was a faulty CrowdStrike software update that crashed the Windows systems it was installed on. As a leading cybersecurity and endpoint protection company, CrowdStrike has its Falcon platform running on Windows machines at many companies and on many cloud servers.

Consequently, the update caused widespread blue screens of death not only on PCs but also on cloud servers, including those on Microsoft’s Azure, amplifying the incident’s public impact. The aviation industry was among the worst affected.

The U.S. was hit hardest by this incident. Major U.S. airlines, including Delta, American, and United, requested ground stops for all their flights, and the FAA instructed air traffic controllers to inform pilots about the communication issues the airlines were facing. Other carriers such as JetBlue, Frontier, and Spirit were also severely affected, canceling flights en masse because key systems were down.

Atlanta Airport, the busiest airport in the U.S. and Delta’s hub, was hit hardest, with more than 500 flights canceled, most of them Delta’s. Chicago O’Hare Airport canceled nearly 200 flights, and New York LaGuardia Airport saw one-third of its flights canceled. European airports also experienced significant disruptions, with 40% of flights at Amsterdam Airport delayed and one-third of flights at Berlin Airport canceled.

At the Atlanta airport, travelers formed long lines at customer service

Lack of Emergency Response

The most surprising aspect of this incident was U.S. airlines’ response: immediately grounding all flights. This was bewildering given how important these operational control systems are, not just to the airlines but also as part of the nation’s critical transportation infrastructure.

Aviation operation control systems are expected to be highly reliable and resilient, preventing severe disruptions from system failures. The International Civil Aviation Organization (ICAO) mandates specific backup and redundancy requirements for these systems to avoid catastrophic consequences.

However, this incident revealed that US airlines either lacked disaster recovery plans or did not have automatic failover capabilities for critical systems. Even if they had backups, these backups might also have been affected by the incident because they were running on Windows systems impacted by the update. It’s like putting all your eggs in one truck to avoid putting them all in one basket, and then having the truck flip over and catch on fire.
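
The eggs-in-one-truck problem is what reliability engineers call a common-mode failure: a backup only helps if it does not share the cause that took down the primary. The minimal Python sketch below illustrates this under simple, hypothetical assumptions. The `System` record, the platform labels, and the rule that a faulty update crashes every machine on one platform are all invented for illustration and do not model any airline’s actual setup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class System:
    name: str
    platform: str        # hypothetical label, e.g. "windows" or "linux"
    healthy: bool = True


def apply_faulty_update(systems: list[System], affected_platform: str) -> None:
    """Simulate a bad update that crashes every system on one platform."""
    for system in systems:
        if system.platform == affected_platform:
            system.healthy = False


def failover(systems: list[System]) -> Optional[System]:
    """Return the first healthy system in priority order, or None if all are down."""
    for system in systems:
        if system.healthy:
            return system
    return None


# Hypothetical deployments: a backup on the same platform vs. a diverse backup.
same_platform = [System("primary", "windows"), System("backup", "windows")]
diverse = [System("primary", "windows"), System("backup", "linux")]

for label, deployment in [("same platform", same_platform), ("diverse", diverse)]:
    apply_faulty_update(deployment, "windows")
    active = failover(deployment)
    print(f"{label}: {active.name if active else 'no system available'}")
```

With the backup on the same platform, failover finds nothing healthy. Only the deployment whose backup runs on a different platform keeps a system available.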

Airlines normally have emergency plans to keep minimal operations running when systems are degraded or completely unavailable. Although load control is now done through information systems, every load controller still retains the skill of preparing load sheets by hand. If the load control system fails, controllers can print the blank load sheet for the aircraft type from the corresponding PDF document and calculate the aircraft’s takeoff data manually. This is a fundamental skill, practiced weekly so that it can be done swiftly when needed.

A load and trim sheet for manual operation
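
The core arithmetic behind a manual load sheet is a weight-and-balance calculation. Each load item’s weight is multiplied by its arm (its distance from a reference datum) to give a moment. The total moment divided by the total weight gives the center of gravity (CG), which is then expressed as a percentage of the mean aerodynamic chord (MAC) and checked against limits. The Python sketch below covers only takeoff weight and CG, not the full set of takeoff data. Every number in it (arms, LEMAC, MAC length, MTOW, CG limits) is hypothetical. Real trim sheets are specific to the aircraft type and often work in index units rather than raw moments.

```python
from dataclasses import dataclass


@dataclass
class LoadItem:
    name: str
    weight_kg: float
    arm_m: float  # distance from the reference datum (hypothetical values below)

    @property
    def moment(self) -> float:
        return self.weight_kg * self.arm_m


def takeoff_weight_and_cg(items: list[LoadItem], lemac_m: float, mac_m: float):
    """Return (total weight, CG in metres from datum, CG as % of MAC)."""
    total_weight = sum(item.weight_kg for item in items)
    total_moment = sum(item.moment for item in items)
    cg_m = total_moment / total_weight
    cg_pct_mac = (cg_m - lemac_m) / mac_m * 100.0
    return total_weight, cg_m, cg_pct_mac


# Hypothetical figures for an unspecified narrow-body aircraft.
LEMAC_M, MAC_M = 16.0, 4.0     # leading edge of MAC and MAC length, in metres
MTOW_KG = 73_500               # maximum takeoff weight
CG_LIMITS_PCT = (10.0, 35.0)   # allowed CG range in % MAC

items = [
    LoadItem("basic operating weight", 42_000, 17.0),
    LoadItem("passengers", 12_000, 16.5),
    LoadItem("forward hold", 2_000, 10.0),
    LoadItem("aft hold", 1_500, 24.0),
    LoadItem("takeoff fuel", 10_000, 17.5),
]

tow, cg_m, cg_pct = takeoff_weight_and_cg(items, LEMAC_M, MAC_M)
within_limits = tow <= MTOW_KG and CG_LIMITS_PCT[0] <= cg_pct <= CG_LIMITS_PCT[1]
print(f"TOW {tow:.0f} kg, CG {cg_m:.2f} m = {cg_pct:.1f}% MAC, within limits: {within_limits}")
# prints: TOW 67500 kg, CG 16.93 m = 23.3% MAC, within limits: True
```

On paper, a load controller performs the same sums row by row on the printed sheet.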

Other related departments are almost obsessive about emergency drills. For instance, ground-service departments in China regularly get calls from check-in teams asking them to create virtual flights for drills. These drills simulate a crash of the operations system, and staff must check in passengers and issue boarding passes in local mode, even writing boarding passes by hand when printing is not possible.
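
The “local mode” idea can be sketched as a check-in desk that falls back to its own counter when the central system is unreachable and queues those records for upload once the system recovers. This is a minimal Python sketch under stated assumptions. `CheckInDesk`, `CentralSystemDown`, the `BN` boarding-number format, and the flight number `XX123` are all hypothetical, and no real TravelSky or airline interface is modeled. The central call is only a stub.

```python
from dataclasses import dataclass, field


class CentralSystemDown(Exception):
    """Raised when the (hypothetical) central departure control system is unreachable."""


@dataclass
class CheckInDesk:
    flight: str
    central_online: bool = True
    next_number: int = 1                                   # next boarding number to issue
    pending_sync: list[str] = field(default_factory=list)  # records issued in local mode

    def _issue(self, passenger: str, suffix: str = "") -> str:
        record = f"{self.flight} {passenger} BN{self.next_number:03d}{suffix}"
        self.next_number += 1
        return record

    def _central_check_in(self, passenger: str) -> str:
        # Stub standing in for a call to the central system; fails while it is down.
        if not self.central_online:
            raise CentralSystemDown(self.flight)
        return self._issue(passenger)

    def check_in(self, passenger: str) -> str:
        try:
            return self._central_check_in(passenger)
        except CentralSystemDown:
            # Local mode: keep checking in from the desk's own counter and
            # remember each record so it can be uploaded later.
            record = self._issue(passenger, " (LOCAL)")
            self.pending_sync.append(record)
            return record

    def sync(self) -> list[str]:
        """Hand over locally issued records once the central system is back."""
        if not self.central_online:
            return []
        uploaded, self.pending_sync = self.pending_sync, []
        return uploaded


desk = CheckInDesk("XX123")          # hypothetical flight number
print(desk.check_in("ZHANG/SAN"))    # XX123 ZHANG/SAN BN001
desk.central_online = False          # drill: the central system "crashes"
print(desk.check_in("LI/SI"))        # XX123 LI/SI BN002 (LOCAL)
desk.central_online = True
print(desk.sync())                   # ['XX123 LI/SI BN002 (LOCAL)']
```

The sketch assumes a single desk. With several desks working offline, each would need its own pre-agreed block of boarding numbers to avoid duplicates.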

Therefore, seeing US airlines’ complete operational paralysis during the incident due to the failure of their check-in and load control systems leaves people puzzled, asking: Don’t you practice manual operations regularly? Don’t you have emergency plans? Do you not conduct emergency drills for your emergency plans? Don’t you have backup systems?

Why China Was Unaffected

China’s civil aviation operations remained unaffected by this global incident, with only some foreign carriers’ flights (such as those of American Airlines and United Airlines) delayed by disruptions overseas. The reasons are straightforward:

First, the issue affected only Windows systems that had CrowdStrike’s security software installed and had received the faulty update, which trapped them in an endless blue-screen reboot loop. Chinese airlines generally don’t use this security software and are cautious about system updates, often sticking with older Windows versions they consider more stable.

Second, most Chinese airlines use the TravelSky system, which runs on Linux and does not rely on Microsoft’s Azure cloud or Amazon’s AWS, so it avoided the widespread collapse caused by the faulty update.

The TravelSky system is used by most Chinese airlines

Because they are crucial to China’s civil aviation operations, the computer systems and networks operated by TravelSky are classified as “critical information infrastructure” and form one of the eight key systems regulated by the State Council. Apart from a few airlines such as Spring Airlines, most Chinese airlines use the TravelSky system. Its security and stability are of great national importance, so it is strictly regulated to keep it reliable.

Of course, this doesn’t mean that the TravelSky system is immune to problems. On August 25, 2020, an anomaly in the TravelSky departure system caused check-in problems at some airports. According to reports, it began at 10:32 AM, and everything was back to normal by 11:07 AM. Because the disruption lasted only about 35 minutes, its overall effect was minimal, and operations remained stable.

Although the TravelSky system’s decades-old command interface is often criticized, stability is paramount for critical information infrastructure. Having a fully autonomous information system and operating environment spared China from this global incident and from the embarrassment faced by U.S. airlines. The incident has made people more aware that, in an era when critical information systems have become essential infrastructure, achieving complete autonomy and control over them is extremely important. This applies not only to information systems but also to operating systems. As cybersecurity threats grow more severe, this necessity is beyond doubt. It is not just a technological choice but a strategic need for national security and industrial development.
