February 17, 2009

What Should I Document in my Disaster Recovery Plan?

Posted by Brian Bowman


What do you document? Some people document everything. I have seen Disaster Recovery (DR) plans that are 250 pages long. Just reading such a plan, let alone recovering from it, would take 3 or 4 hours before the implementation even starts. Documentation does not have to be huge and hard to do. The most critical thing is to start with a basic plan and then update it as you work through the process.

First and foremost, the most important part of the documentation for a DR plan is step-by-step instructions on what to do and how to do it. In the best of circumstances, you will have your top database administrator (or your only DBA) there to do the process. In the worst of circumstances, you will have “Sam the Security Guard” doing the recovery. This is important to take into account when documenting the plan. If you have nothing else, get this done! I visited an insurance company in Connecticut to talk about DR. Their base-level documentation fit on one page of paper.

All of the information needed to recover the database and application should be explained in such a way that anyone in the company could perform the recovery.  This makes the process much more challenging.

Next you need to document which applications are critical to the business running and which ones can wait.  I will go into more detail on this in a future blog.

Your documentation should be printed out and stored safely AWAY from the computer room and the production building. I have visited customers in the past that had well-documented, thorough, “pretty” plans that never made it off the PC in the computer room.

Other things that you should document in your plan include:

  - Internal contact information (including cell and home phone numbers) for everyone that could, would, or should be involved in the recovery effort.
  - Copies of contracts with all of your 1st, 2nd and 3rd level vendors.  For example, what is your relationship with your telephone company?  What is their guarantee to you for availability (this is your Service Level Agreement or SLA with them)?  What is your hardware DR plan?  Are your software support and DR plans the same as those for your hardware?
  - External contact information – is there someone specific at your DR site that you need to contact?  Do you just call the front desk and ask for Joe?  What if Joe is not there?  What is the process to get the ball rolling?
  - Specifications for all the critical resources – If you are failing over to a data center that is shared with others, the last thing you want is to find a generic system waiting for you that does not meet the OS/Java/HW requirements for the application to run.
  - Who pushes the button?  Who makes the decision to go to backup or failover to the DR site?  Document this and have a chain of command for who is in charge if all of the executives are out of town.  The Incident Command System (ICS) from FEMA is an excellent example of how to design your DR organization.  More importantly, get the executive team to commit – in writing – that the people that have been identified have the authority to make the decision to fail over.

I once participated in a half-day DR drill at a DR conference. If you ever get the chance to do this, you should. If it is run well, it is very insightful and fun. You will also learn a lot about people’s ability to NOT make a decision. The room was filled with 70 Business Continuity planners. There were 7 tables representing different functions within the fictitious company (in this case a chemical manufacturing plant). Each function had 10 people who had to act as one. The hardest part of the exercise was getting the Executive table to declare a disaster and start the process. Their inability to make the decision cost the company almost a day of time. These were people who should have known better! When they were given the responsibility of declaring a disaster, they were dumbstruck with fear. The moral of the story: put someone in charge who can pull the trigger if needed.

There are many other things that could go into a plan. This is by no means the complete list. For more information and ideas on what to document, Google the idea. Here are some links I have found insightful (or that, at a minimum, will get you thinking) about what you should have in your plan.

Disaster Resource Nuggets 

Welcome - Bureau of Emergency Management, Division of Emergency Services, Communication and Management (You will have a local version of this)

Data center disaster recovery planning software

National and State Disaster Preparedness Resources and Tools

The BCI Good Practice Guidelines


When do you update your documentation? You do this every time something changes in your systems. This sounds daunting, but if your application can’t recover because the plan is missing a small, new part of the application, then you don’t have a DR plan.

I look forward to hearing what your documentation for your DR plan looks like!   I’m always looking for plans and different ideas.  I will share what I receive from customers as well.

In my next blog I will talk about who is in charge. 

Until then, if you can’t document it, you can’t measure it.  If you can’t measure it then you don’t have it! 

Brian B

February 05, 2009

Is Disaster Recovery Really Important to Small-Medium Businesses?

Posted by Brian Bowman

In my last blog I talked about how Disaster Recovery (DR) crosses all vertical markets and is important no matter what your company does. In this blog I will “cut the pie” a different way and talk about the size of your company and whether or not Disaster Recovery and Business Continuity are important to your business, regardless of its size.

Business Continuity (BC) has been at the forefront of most enterprise companies for a long time. They have the ability to allocate resources to the issues involved with creating and maintaining a complete Business Continuity Plan (BCP). Many enterprises have a complete department that focuses on all aspects of business continuity. These can be everything from personnel issues and HR announcements to public relations.

A newer trend in “C” level management has been to create a new position often called the Chief Risk Officer (CRO). Although Chief Risk Officers have existed for a long time in the financial market, they are new to other verticals (such as manufacturing).

Enterprise companies have also assigned Business Continuity and Disaster Recovery to specific individuals OUTSIDE of Information Technology to handle the bigger picture.  These individuals usually have education and training specifically focused on Business Continuity.

This is a luxury that enterprise companies have – the ability to support a resource to own and manage the Business Continuity and Disaster Recovery plans for the company.  Just because the Small-Medium Business can’t fund this type of resource doesn’t make them immune to disasters.

Small-Medium Businesses typically don’t have the luxury (and resources) to be able to create and maintain a Business Continuity Department. This doesn’t mean that Business Continuity is less important. In fact, it is more important to a business of this size. A smaller company does not have the resources to weather an event or disaster as well as most enterprises can.

An event that causes an outage for a Small-Medium Business will have a much larger impact on sales than it would for most enterprise customers. Take the recent disaster that happened in southern NH as an example. Nashua is home to 175,000 residents and more than 700 local businesses.

Southern NH experienced an ice storm in December 2008 that left over 200,000 homes without power for up to 12 days. In this scenario, if a small local company ($10 million annual revenue) loses a day’s worth of business, that roughly equates to $33,333 ($10 million / 300 days). A week’s worth of income is $233,331 (or 2.3% of their annual income). This means that the whole company will be out of work for a week. An enterprise will have additional sites outside of the region that will be able to cover the outage. One day’s loss to an enterprise company ($1 billion annual revenue) would be $3.3 million, but this will be reduced by the regional diversity of the company. Allowing for that, the 7-day outage would roughly represent 0.46% of their income.
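The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function name `outage_loss` and the `affected_share` parameter are made up for this sketch, and the 20% share used for the enterprise is an assumption (the post does not say how much of the enterprise is affected), chosen only because it lands close to the 0.46% figure.

```python
# Divisor used in the post ($10 million / 300 days).
DAYS_PER_YEAR = 300


def outage_loss(annual_revenue, outage_days, affected_share=1.0):
    """Return (lost revenue, lost fraction of annual revenue) for an outage.

    affected_share is a hypothetical knob: the fraction of the company's
    revenue that is actually knocked out by the regional event.
    """
    daily_revenue = annual_revenue / DAYS_PER_YEAR
    lost = daily_revenue * outage_days * affected_share
    return lost, lost / annual_revenue


# Small local company: everything is in the affected region.
small_lost, small_share = outage_loss(10_000_000, 7)

# Enterprise: ASSUMPTION that only about one fifth of revenue is affected.
ent_lost, ent_share = outage_loss(1_000_000_000, 7, affected_share=0.2)

print(f"Small business: ${small_lost:,.0f} lost ({small_share:.2%} of annual revenue)")
print(f"Enterprise:     ${ent_lost:,.0f} lost ({ent_share:.2%} of annual revenue)")
```

The dollar loss is far larger for the enterprise, but as a share of annual revenue the small business takes the bigger hit, which is the point of the comparison.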

It is because of this that Business Continuity and Disaster Recovery are more important to the small to medium size business. The task of building a Disaster Recovery plan falls to Information Technology (IT) for many reasons. They have access to all of the departments’ critical data. They also know how the applications interact and the dependencies between them. This makes IT ideal to help build the Disaster Recovery Plan, especially in smaller organizations.

A complete Disaster Recovery Plan does not have to be an epic novel. The more complex your environment, the larger your plan will be. Having a simple, easy-to-use instruction manual is critical. Planning and testing are the keys to a successfully implemented Disaster Recovery Plan. It should be well documented and clearly identify what, where, when, why and especially who.

Progress Software recognizes that this can be a hard thing for businesses to do. This is why we have products like Name Server Load Balancing, OpenEdge Management, OpenEdge Replication, and Actional. These help you both to proactively prevent disasters and to recover and run your business despite the disaster. They don’t require weeks or months to learn and implement, and they make your life easier to manage.

Unfortunately, there is nothing we can do about the weather, yet… In my next blog I will talk about documenting your Disaster Recovery Plan. Until then, if your DR plan isn’t broken then you probably haven’t tested it lately…

Brian B