Tom Olzak

Good Planning Requires Follow-up

In Backup, Business Continuity, Cloud Computing, Disaster Recovery, Security Management, Vendor Management on August 31, 2010 at 07:29

Many organizations still believe that a strong business continuity plan, complete with a solid contract with a third-party recovery partner, is enough to protect them from the inevitable. As American Eagle Outfitters discovered, it is not.

According to Evan Schuman of StorefrontBacktalk.com, which monitors retail Web sites, the outage began with a series of server failures.

Schuman, who said he spoke with an unnamed IT source at American Eagle, said a storage drive failed at an IBM off-site hosting facility. That failure was followed by the failure of a secondary backup disk drive. Once the drives were replaced, the company attempted to restore about 400GB of data from backup, but the Oracle backup utility failed, possibly because of data corruption. Finally, American Eagle Outfitters attempted to restore its data from its disaster recovery site, only to discover that the site wasn’t ready and the logs could not be brought up and running.

“I know they were supposed to have completed it with Oracle Data Guard, but apparently it must have fallen off the priority list in the past few months,” the source told Schuman.

via American Eagle Outfitters learns a painful service provider lesson – CSO Online – Security and Risk.

These events led to an eight-day outage of the company’s customer-facing website. There are at least two lessons here for all of us.

First, when did American Eagle last ask IBM to describe its processes for handling unusual outages? How often did it review IBM’s processes for testing incident response? This is as much American Eagle’s responsibility as it is the off-site vendor’s.

Second, when were backups last tested? What does the contract require for backup testing, and how is compliance validated?

Based on this article, there is no evidence that American Eagle was intentionally negligent. The company simply made the mistake of assuming: specifically, assuming that its service providers were practicing due diligence. When we use cloud or any other third-party services, we still have a responsibility to inspect what we expect.
