Tom Olzak

Archive for the ‘Disaster Recovery’ Category

The Internet is Broken, Part III: Response

In Application Security, Business Continuity, Disaster Recovery, Hacking, Log Management, malware, NetFlow, Network Security, Policies and Processes, Risk Management, Security Management, SIEM on January 20, 2013 at 23:12

This is the final post in a series about the broken Internet.  In the first, we looked at SIEM.  Last week, we explored the value of NetFlow analysis.  This week, we close with an overview of incident response.

When evaluating risk, I like to use the following formula as a reference:

Basic Risk Formula

Probability of occurrence, broken into threats x vulnerabilities, helps us determine how likely it is that a specific threat might reach our information resources.  Business impact is a measure of the negative effects if a threat is able to exploit a vulnerability.  The product of Probability of Occurrence and Business Impact is mitigated by the reasonable and appropriate use of administrative, technical, and physical controls.  One such control is a documented and practiced incident response plan.
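
To make the relationship between these terms concrete, here is a minimal Python sketch.  It is only an illustration: the 0.0 to 1.0 scales, the `control_effectiveness` parameter, and the idea that controls remove a fraction of the risk are my assumptions for this example, not part of the formula itself.

```python
def risk_score(threat: float, vulnerability: float,
               business_impact: float, control_effectiveness: float) -> float:
    """Return residual risk under an assumed, simplified scoring model.

    threat, vulnerability, control_effectiveness: assumed 0.0-1.0 scales.
    business_impact: estimated loss, in any consistent unit (e.g. dollars).
    """
    for name, value in (("threat", threat),
                        ("vulnerability", vulnerability),
                        ("control_effectiveness", control_effectiveness)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # Probability of occurrence is broken into threats x vulnerabilities.
    probability_of_occurrence = threat * vulnerability
    inherent_risk = probability_of_occurrence * business_impact

    # Assumption: controls remove a fraction of the inherent risk.
    return inherent_risk * (1.0 - control_effectiveness)


# Hypothetical inputs: a likely threat, a moderate vulnerability,
# a $100,000 impact, and controls judged to be 50% effective.
print(risk_score(0.5, 0.5, 100_000, 0.5))
```

The absolute number matters less than the comparison: scoring the same process with and without a practiced incident response plan shows how much a single control is expected to buy.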

The purpose of incident response is to mitigate business impact when we detect an exploited vulnerability.  The steps in this process are shown in the following graphic.  Following the detection of an incident (using SIEM, NetFlow, or some other monitoring control), the first step is to contain it before it can spread or cause more business impact.  Containment is easier in a segmented network; segments under attack are quickly segregated from the rest of the network and isolated from external attackers.

Response Process

Following containment, the nature of the attack is assessed.  Skipping this step can result in incorrectly identifying the threat, the threat agent, the attack vector, or the target.  Getting any of these wrong can make the remaining steps less effective.

Once we understand the who, what, when, where, how, and why of an attack, we can eradicate it.  Eradication often takes the form of applying a patch, running updated anti-malware, or reconfiguring systems or the network.  When we’re certain the threat agent is neutralized, we recover all business processes.

Business process restoration requires a documented and up-to-date business continuity/disaster recovery plan.  Some incidents might require server rebuilds.  Business impact increases with the time required to restore business operations.  Without the right documentation, the restoration time can easily exceed the maximum tolerable downtime: the time a process can be down without causing irreparable harm to the business.

Finally, we perform root cause analysis.  This involves two assessments.  One determines what was supposed to happen during incident response, what actually happened, and how we can improve.  The second assessment targets the attack itself.  We must understand what broken control or process allowed the threat agent to get as far as it did into our network.  Both assessments result in an action plan for remediation and improvement.
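
The order of the steps matters, so it can help to encode it.  The sketch below is one hypothetical way to do that in Python; the `Phase` names are taken from the steps described in this post, and the `advance` helper is an invented name for illustration, not part of any incident response tool.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Response phases, in the order described in this post."""
    DETECT = 1
    CONTAIN = 2
    ASSESS = 3
    ERADICATE = 4
    RECOVER = 5
    ROOT_CAUSE_ANALYSIS = 6


def advance(current: Phase, requested: Phase) -> Phase:
    """Move to `requested` only if it is the immediate next phase."""
    if requested != current + 1:
        raise ValueError(
            f"cannot move from {current.name} to {requested.name}; "
            "phases must be completed in order"
        )
    return requested


phase = Phase.CONTAIN
phase = advance(phase, Phase.ASSESS)       # allowed: assessment follows containment
try:
    advance(Phase.CONTAIN, Phase.ERADICATE)  # rejected: assessment was skipped
except ValueError as error:
    print(error)
```

A check like this could sit in a ticketing workflow so that an incident cannot be marked as eradicated before someone records the assessment.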

The Internet is broken.  We must assume that one or more devices on our network is compromised.  Can you detect anomalous behavior and effectively react to it when the inevitable attack happens?

Give business continuity a chance…

In Business Continuity, Computers and Internet, Disaster Recovery, Risk Management on October 16, 2010 at 11:25

Business continuity is the practice of understanding critical business processes and ensuring their availability.  Disaster recovery is a component of business continuity.

Understanding business processes includes answering the following questions:

  1. What are the manual tasks that support the process?
  2. What are the human and technical resources necessary to enable the process?
  3. What other processes feed data to or receive data from this process?
  4. Is it reasonable and appropriate to build redundancy into the system?
  5. What is the maximum tolerable downtime of the process (how long can the process be broken without causing irreparable harm to the business)?
  6. Based on current capabilities, what is the recovery time if one or more of the components is broken or missing (including processes that feed this process)?
  7. Based on current capabilities, what is the recovery time following a catastrophic event (disaster recovery)?

It takes a group representing a cross-section of the organization to answer these questions.  Note that the planning is around processes, not systems.  Processes are enabled by systems and manual tasks.  For example, questions 4, 6, and 7 should include manual workarounds if automated tasks fail.  (A process is something like processing payroll with expected outcomes including checks for employees, tax payments, etc.)

Once the questions are initially answered, a remediation action plan is created to mitigate risk (shorten recovery time).  Risk mitigation takes two forms: interim and long-term.  Interim mitigation includes workarounds to enable critical outcomes while recovery tasks are performed.

When the action plan is complete, the team should once again answer questions 6 and 7.  If recovery times are not shorter than maximum tolerable downtime, additional remediation steps should be identified.  This cycle repeats until maximum tolerable downtime exceeds recovery time.
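
The comparison that drives this cycle is simple enough to sketch in Python.  Everything here is hypothetical: the `ProcessAssessment` class, its field names, and the sample hours are assumptions for illustration, with the two recovery fields standing in for the answers to questions 6 and 7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProcessAssessment:
    name: str
    max_tolerable_downtime_hours: float   # question 5
    component_recovery_hours: float       # question 6
    disaster_recovery_hours: float        # question 7

    def needs_remediation(self) -> bool:
        """True while either recovery time is not shorter than the MTD."""
        worst_case = max(self.component_recovery_hours,
                         self.disaster_recovery_hours)
        return worst_case >= self.max_tolerable_downtime_hours


def processes_needing_remediation(
        assessments: List[ProcessAssessment]) -> List[str]:
    """Return the names of processes that need another remediation pass."""
    return [a.name for a in assessments if a.needs_remediation()]


# Hypothetical answers gathered by the cross-functional team.
assessments = [
    ProcessAssessment("payroll", 72, 24, 96),
    ProcessAssessment("order entry", 8, 4, 6),
]
print(processes_needing_remediation(assessments))
```

The team would rerun the check after each round of remediation and stop when the list comes back empty.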

Good Planning Requires Follow-up

In Backup, Business Continuity, Cloud Computing, Disaster Recovery, Security Management, Vendor Management on August 31, 2010 at 07:29

Many organizations still believe that having a great business continuity plan, complete with a solid contract with a third-party recovery partner, is enough to protect them from the inevitable.  As American Eagle Outfitters found, however, this is not enough.

According to Evan Schuman from StorefrontBacktalk.com, which monitors retail Web sites, the outage began with series of server failures.

Schuman, who said he spoke with an unnamed IT source at American Eagle, said a storage drive failed at an IBM off-site hosting facility. That failure was followed by a secondary backup disk drive failure. Once the drives were replaced, the company attempted a restore of about 400GB of data from backup, but the Oracle backup utility failed, possibly as a result of data corruption. Finally, American Eagle Outfitters attempted to restore its data from its disaster recovery site, only to discover the site wasn’t ready and could not get the logs up and running.

“I know they were supposed to have completed it with Oracle Data Guard, but apparently it must have fallen off the priority list in the past few months,” the source told Schuman.

via American Eagle Outfitters learns a painful service provider lesson – CSO Online – Security and Risk.

This is the description of events leading to an eight-day outage for the company’s customer-facing website.  There are one or two lessons for all of us here.

First, when was the last time American Eagle asked IBM for its processes for dealing with unusual outages?  How often did they review IBM’s processes for testing incident response?  This is as much American Eagle’s responsibility as it is the off-site vendor’s.

Second, when was the last time backup tests were performed?  What are the requirements for this in the contract and how is compliance validated?

Based on this article, there’s no evidence that American Eagle was intentionally negligent.  They simply made the mistake of assuming; that is, assuming their service providers were practicing due diligence.  When using cloud or any other third-party services, we still have a responsibility to inspect what we expect.
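
One way to inspect what we expect is to run a test restore and compare the result with the source.  The Python sketch below shows the idea at the file level, using SHA-256 hashes.  The function names and directory layout are assumptions for illustration; a database such as the one in this story would need the vendor's own restore and validation tools instead.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> List[str]:
    """Compare a test restore against its source.

    Returns a list of problems; an empty list means every source file
    was restored with identical content.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(source_file) != file_digest(restored_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

Calling `verify_restore(Path("/data"), Path("/restore-test"))` (both paths hypothetical) after a scheduled test restore gives a simple pass/fail record that can be requested from, or demonstrated to, a service provider.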

Review of the ioSafe Solo Backup/DR Drive

In Backup, Business Continuity, Data Security, Disaster Recovery, Physical Security, Risk Management on July 4, 2009 at 17:56

I don’t get excited about technology very much anymore.  After almost 30 years in this business, I’ve become rather jaded to most emerging technology.  So I have one thing to say about the ioSafe Solo drive—WOW!!

I received an evaluation unit from ioSafe a couple of days ago.  It came in a plain white box, but it weighed quite a bit.  Big piece of iron I have to spend an afternoon configuring, I thought.  So I waited until the weekend.  Opening the box, I found the drive unit, a USB cable (which closely resembles the cable I use on my USB printer), and a power cable.  The drive unit is about the size of a toaster.  But unlike my toaster, it weighs about 15 pounds.

The manual wasn’t much.  Since I was connecting the drive to my laptop running Windows XP SP2, the installation instructions pretty much consisted of: 1) plug the drive into an outlet, 2) plug the USB cable into the drive and into the computer, and 3) turn on the drive.  This was good.  I like simple.

I followed the directions, and 20 seconds after I turned on the drive I had a new 500 GB drive connected and ready for action.  According to the manual, Apple computer users will have to do some formatting work before they can use the unit.

Now you might be asking, “so what?”  Well, there is more to this drive than meets the eye.  Within 5 minutes of unpacking the gear, I had a backup drive which provides the following:

  • Fire protection for temperatures reaching 1550 degrees Fahrenheit for 30 minutes (tested per the ASTM E119 protocol)
  • Water protection, tested for immersion up to 10 feet for 72 hours
  • FloSafe air cooling: forced-air cooling through plastic vents that melt shut to protect the unit when the ambient temperature reaches 200 degrees Fahrenheit
  • Metal case which can be easily bolted to the floor or secured with a cable lock
  • A three-year warranty and ioSafe’s data recovery services for one year

Additional features include 7200 rpm drives and USB 1.0 and 2.0 support, with data transfer rates up to 480 Mb/s.

I was pretty interested in this drive by this time.  It’s a perfect backup solution for my home office and the restaurant we own.  So I looked up the price.  I was not disappointed.  The ioSafe Solo can be ordered with one of three data capacities, as listed below:

  • 500 GB at $149
  • 1 TB at $229
  • 1.5 TB at $299

You can upgrade the data recovery service from one year to up to five years, adding up to $100 to each of the prices listed.  These are retail prices.  A quick look at Amazon.com shows discounted pricing.  If you are an Amazon Prime customer with free shipping, you can also save the $25 or so it takes to get it to your door.

So my Solo unit sits next to my laptop, quietly protecting my data.  Quiet is relative, but it emits a very, very low hum which is almost undetectable in a quiet room and completely inaudible when listening to Slacker.com.  It looks pretty good, too, with blue lights on the front indicating a power-on state.

This is an excellent drive at an affordable price.  If you currently pay monthly fees to support over-the-Web backups, if you still use backup tapes, or if you have simply decided it’s too much trouble to look for and implement the right backup solution, you should definitely take a look at the ioSafe Solo.  I highly recommend it.

AVSIM: Real world example of the value of offsite backups

In Backup, Disaster Recovery, Hacking on May 18, 2009 at 08:00

The owners of AVSIM, an important resource for Microsoft Flight Simulator users, worked for 13 years to build a well-respected site.  Using two servers, they conscientiously backed up one to the other, confident they were protected.  That confidence was shattered this month when a hacker destroyed the site, including both servers.  Since no offsite backup–or even an off-server backup–was available, it was impossible to recover.

There is a lesson here for all organizations.  If you have a server or other storage containing critical business information, make sure it is backed up to an offsite location.  Even if the probability is low that fire, tornadoes, hurricanes, or a host of other natural threats will take out your facility, there is always the hacker community, which is constantly looking for a new challenge.

We always talk about the importance of offsite backups, but sometimes it takes an actual example to make managers sign a check.  Maybe that is the proverbial silver lining in this story.
