The power outage at Delta Air Lines this week took down computers and stranded thousands of travelers from New York to Tokyo. It was just the latest in a series of computer-related issues that have disrupted other U.S. airlines.
Delta said it canceled about 900 flights Monday out of a total of 6,000 worldwide. Nearly 800 were canceled Tuesday and hundreds more were delayed. By Wednesday most flights had resumed, although Delta said it expected about 100 more cancellations.
Delta is investigating the outage, which hit the airline’s computer systems early Monday in Atlanta, Georgia. The company said that after power was lost, “some critical systems and network equipment” didn’t switch to backup as they were supposed to.
In a video message posted online, Delta CEO Ed Bastian apologized to passengers for the shutdown. "I’m sorry that it happened and I don’t have the final analysis (on) what caused the outage."
Bastian noted that over the last three years, Delta had spent "hundreds of millions of dollars" on technology upgrades. He said the upgrades included backup system improvements intended to prevent what happened this week.
Three weeks ago, Southwest Airlines experienced a similar outage and the company canceled 2,300 flights over four days. The airline said in that incident, a computer network router failed at its Dallas data center and backup systems did not switch on.
JetBlue computers went out several times this year due to power and other issues. The outages caused widespread flight delays and forced employees to check in passengers without computers.
The operations center that lost power monitors all of Delta’s worldwide flights. It also keeps track of individual planes, crew and passengers, and operates sales and ticketing.
Dr. Ahmed Abdelghany is an Associate Professor of Operations Management at Embry-Riddle Aeronautical University in Daytona Beach, Florida.
He said computer outages are significant because they affect all airline operations and require a complete “reset” of the system.
"As an airline, you are required to rebuild as if you are rebuilding your schedule of the flights, the crew and the ground personnel. And you have to do that in no time."
Abdelghany, who worked for several years as a senior technology analyst at United Airlines, said the fact that Delta’s backup systems failed is a big concern.
"Of course it is a big problem if your backup system also didn’t work. This is like an unforgivable mistake. You cannot afford that you say I have a backup system and then when you need it, it doesn’t work."
Martin Libicki is a professor at the U.S. Naval Academy and an adjunct professor at the Pardee RAND Graduate School. He specializes in information technology and security issues. He said one of the main reasons backup systems fail is that they are not regularly tested.
"And the reason they don’t work is because people think, well they’re backups, they’ll be there when I need it. And they don’t test it, and when they need it, it doesn’t come on. If you are going to have a reliable backup system, you have to go to it all the time, just to make sure that it runs all the time."
Libicki said one possible cause of backup failures in both the Delta and Southwest outages could have been a malware infection. This is what investigators determined caused a huge power outage in the Northeastern United States in 2003. The malware infection in that case affected a power company’s outage warning system.
"That creates a possibility that there is a piece of malware out there - which somehow interferes with the transition from backup to main system - that prevented in both cases backup from coming to the rescue when the main system went down."
Professor Abdelghany said Delta did a good job informing its customers about the outage and trying to accommodate their needs. But he thinks all the airlines should do more to explain what steps they are taking to avoid future outages.
"We need to hear more about what they are doing to prevent those incidents from happening in the first place."
I’m Bryan Lynn.
Words in This Story
disrupt – v. to interrupt by causing a disturbance
switch – v. to make a change from one thing to another
router – n. a device that forwards data to the proper parts of a computer network
monitor – v. to watch, observe, listen to, or check (something) for a special purpose over a period of time
significant – adj. large enough to be noticed or have an effect
schedule – n. a list of the times when buses, trains, airplanes, etc., leave or arrive
personnel – n. the people who work for a particular company or organization
analyst – n. a person who studies something; an expert
afford – v. to be able to do (something) without having problems or being seriously harmed
adjunct – adj. added to a teaching staff for only a short time or in a lower position than other staff
regularly – adv. at the same time every day, week, month, etc. : on a regular basis
reliable – adj. able to be trusted to do or provide what is needed : able to be relied on
malware – n. software that is intended to damage or disable computers and computer systems
determine – v. to officially decide (something) especially because of evidence or facts : to establish (something) exactly or with authority
transition – n. a change from one state or condition to another
accommodate – v. to provide what is needed or wanted for (someone or something)