American Eagle Outfitters learns a painful service provider lesson
A recent eight-day outage at online retailer American Eagle Outfitters showed that even the best disaster recovery plan isn't foolproof -- particularly when it's outsourced.
By Lucas Mearian
August 30, 2010 — Computerworld — As American Eagle Outfitters learned in July, even if you do everything right to ensure you have disaster recovery and business continuity plans in place, Murphy's Law sometimes takes over. And problems can be compounded if you rely on an outsourcer for disaster recovery services.
The multibillion-dollar clothing retailer suffered an eight-day Web site outage because its Oracle backup utility failed -- and then an IBM disaster recovery site wasn't up and running as it should have been, according to a report from StorefrontBacktalk.com.
IBM did not respond to requests for comment on the outage. American Eagle did not dispute StorefrontBacktalk.com's basic account of what happened, though a spokeswoman said a few details about the incident were incorrect.
According to Evan Schuman from StorefrontBacktalk.com, which monitors retail Web sites, the outage began with a series of server failures.
Schuman, who said he spoke with an unnamed IT source at American Eagle, said a storage drive failed at an IBM off-site hosting facility. That failure was followed by a secondary backup disk drive failure. Once the drives were replaced, the company attempted a restore of about 400GB of data from backup, but the Oracle backup utility failed, possibly as a result of data corruption. Finally, American Eagle Outfitters attempted to restore its data from its disaster recovery site, only to discover that the site wasn't ready and that the logs could not be brought up and running.
"I know they were supposed to have completed it with Oracle Data Guard, but apparently it must have fallen off the priority list in the past few months," the source told Schuman.
In an e-mail response to questions from Computerworld, a spokeswoman for American Eagle Outfitters said StorefrontBacktalk.com was "off track" in saying the retailer should have directed Web traffic to its mobile Web site. That's because the mobile site was also down.
"Second, despite the slant of some reports, we worked closely with IBM in the spirit of partnership to resolve the issue as quickly as possible for our customers," she said.
The outage raises a question for companies wondering about their disaster recovery plans: Should IT staffers be assigned to periodically audit a service provider and perform recovery drills? Experts say yes.
"You should never give up ownership or responsibility or governance of what's going on with your data to your service provider partners," said Roberta Witty, an analyst at research firm Gartner Inc. "It's still your data. It requires a fair amount of due diligence to make sure third-party service providers have all the processes and procedures in place to ensure they meet your recovery needs."