April 24, 2009

Could This Happen to You?

Posted by Brian Bowman

This was an interesting article that came across my desk from a colleague at Progress (thanks Dave C.!) regarding a disaster that has gone largely unnoticed.  I thought it was an interesting example of what you can plan for (or not plan for).

The article talks about a small town in Northern California called Morgan Hill.  The challenge is that this type of man-made disaster could happen anywhere in the US.  Building a contingency for it into your Disaster Recovery Plan is very difficult.

You could make sure you have ham radio operators on staff, but that seems a bit eccentric.  You could also buy satellite phones for key personnel, but they may not work either (and still need to be charged). 

There are times when you can’t do or plan for everything.  There are, however, things that you can do to help mitigate these types of disasters.  You will not completely eliminate their impact, but you can minimize the effect they might have on your business.

The first is to establish a good rapport with the local emergency services personnel.  These include the police and fire departments, but you should not limit yourself to those agencies.  You can also connect with your local CERT (Community Emergency Response Team) organization as well as the local Red Cross.  There are organizations like these in almost every country.  Communicating with other businesses nearby is also critical to everyone’s success.  There are often times when you can work with other businesses to share resources to ensure that both of you are successful.  This also helps you learn what Disaster Recovery plans they have in place and what risks you may have overlooked.  For example, it would be nice to know that your office is close to their chemical processing plant.

The second thing you can do is ensure that critical systems are available during this type of outage.  For example, can you run your internal network without an internet connection?  Some companies rely so heavily on the internet connection between the corporate office and the remote branches that they become crippled if that connection is not there.  Are the remote offices part of the critical business application that needs to function?

Lastly, you can choose to accept the risk that this type of disaster will cause an outage for you.  As long as you quantify the outage and understand the impact it will have on your business, that is acceptable.  This is similar to self-insuring your vehicle.

Have you seen or been a part of a local disaster that you would like to share?  I’d love to hear about it!

Until next time – Failure to plan is an option – just not a good one!

Brian B

April 01, 2009

Are you ready for your next Disaster?

Posted by Brian Bowman

When a disaster strikes are you going to be ready?  If you can’t answer this question with 100% surety then you have a challenge in front of you.  There are many different plans to prepare yourself for a disaster.  One that I have seen used both in the public sector in the United States and the private sector is the National Incident Management System (NIMS) and more specifically the Incident Command System (ICS).  These have been developed over the last 20 years to help agencies in the United States respond to a disaster.  There are similar systems available in almost every country and region around the world.

While both of these programs have been developed to respond to a public disaster – typically a natural disaster – they provide a very good foundation for the complete disaster management cycle.

ICS breaks the process down into 4 discrete components:  Mitigation, Preparedness, Response and Recovery.  On PSDN, Marv Stone and I will do a webinar / podcast on these components and how OpenEdge 10 meets these needs for your application.

Mitigation deals with identifying requirements (business and safety, for example) and minimizing or avoiding the associated risks altogether.  An important part of this process is identifying where you are vulnerable.  Once you have identified this you can start to figure out what you “really” need to worry about.  This moves right into the planning component of your Disaster Recovery plan.

Planning (the Preparedness component) addresses what options are available, choosing the appropriate option to meet your needs, implementing that option and testing the plan.  This is one of the most critical components in the process.  It also takes the most time.  It is an iterative process and will never be done…

The next component is Response.  This is execution of the plan.  The first part of this is declaring the disaster.  This would seem like a simple thing to do, but it isn’t.  I will talk about this in my next blog.  This is where your testing with your solution will pay off.  Whether it’s OpenEdge Replication, Failover Clusters, or After-Imaging, if you haven’t tested thoroughly then this could turn into a nightmare.  A critical part of this process is documentation.  Documenting what is being done, who is doing it, and when it was done is critical to the post-mortem (or Recovery) component of your environment.  If you don’t document and learn from your mistakes, you will inevitably make them again.

The last component is Recovery.  This can be interpreted in several different ways.  In the public sector it revolves around returning to life as usual (or as usual as it can possibly be).  In the private sector it involves failing back to your production environment and getting the complete business back up in business as usual mode.  This is also affected by your planning process.  Failing back should be planned and tested in the Planning component of your plan.  Documentation is also critical in this component in order to clean up and prevent additional outage time.

Marv and I will chat about these and where OpenEdge 10 plays a part in all of these components of your Disaster Recovery plan.  Be sure to listen in!

Until next time, a disaster is only a disaster if you are not ready for it.

Brian B

March 09, 2009

A Strong and Stable OpenEdge Environment

Posted by Brian Bowman

In today’s “new economy” you have to make the best of everything you have.  This is even more critical when it comes to making your environment as strong and stable as you can.  Ensuring your environment is running optimally and not speeding towards the cliff is critical to your business doing the best it can.

This is especially important when you are working on Disaster Recovery plans.  Businesses are having a hard enough time making ends meet and staying in business without the additional challenge of losing data or a critical business system.

Disaster Recovery, from my perspective, is a timeline.  Along this timeline is developing your application, testing your application, deploying your application, and managing your application.  This timeline ties very nicely into the four phases defined for Emergency Management in the National Incident Management System (NIMS).  Although this information is for the Federal Emergency Management Agency (FEMA) in the United States, there are similar links for agencies in your region and country.

The four phases of emergency management are Mitigation, Preparedness, Response, and Recovery.  Although Emergency Management deals with man-made and natural disasters, the same concepts can also be applied to application management and Disaster Recovery Planning.

Mitigation is the process of eliminating and minimizing the effects of the disaster.  This is most critical in the development and deployment section of application management. 

Preparedness is the process of building your DR plan and deciding what the contingency is if a disaster happens.  This is where a business impact analysis (BIA) determines what SHOULD be done to prepare for a disaster.

Response involves who is responsible for what in the event of a disaster striking.  In your business you should know who is going to make decisions, who is going to execute on those decisions, and who is going to DOCUMENT what is going on and what is being done.

Recovery involves returning the business to either partial or full capacity.  This may mean failing back to the production environment, like OpenEdge Replication can do.  It also involves getting your disaster recovery location up and running.  Whether this is utilizing Progress’ technology or something else, recovery requires that the business is running and doing the critical functions needed to stay in business.

It is important to understand that this process is a circular chain.  Disaster Recovery Planning is never done and is an iterative process.
 
Progress is committed to providing you with the tools to make your environment as efficient and safe as possible.  To this end, Progress will soon release OpenEdge Management 10.2A.  This release will include a new tool to help you remotely manage operations – OpenEdge Explorer (OEE). 

OpenEdge Explorer will provide an alternative to Progress Explorer as the configuration and management tool for the OpenEdge environment.  You will no longer have to install a Progress client to be able to graphically configure your OpenEdge environment.  You will also be able to monitor the current status of your systems from a browser anywhere on your intranet.

Additional free online training for Emergency Management and NIMS is available here.

This blog was an interrupt from my normal stream.  I would love to hear what you want to hear about and talk about from the Disaster Recovery world (or Emergency Management, Business Continuity, etc.).

In my next blog I will return to my normal process.

Until then, recovery from a disaster starts with the individual – are you ready?

Brian B

February 17, 2009

What Should I Document in my Disaster Recovery Plan?

Posted by Brian Bowman


What do you document?  Some people document everything.  I have seen Disaster Recovery (DR) plans that are 250 pages long.  Just reading such a plan, let alone executing it, would require 3 or 4 hours before the implementation even starts.  Documentation does not have to be huge and hard to do.  Starting with a basic plan and then updating it as you work through the process is the most critical step.

First and foremost, the most important part of the documentation for a DR plan is step by step instructions on what to do and how to do it.  In the best of circumstances, you will have your top database administrator (or your only DBA) there to do the process.  In the worst of circumstances you will have “Sam the Security Guard” doing the recovery.  This is important to take into account when documenting the plan.  If you have nothing else, get this done!!! I visited an insurance company in Connecticut to talk about DR.  Their base level documentation fit on 1 page of paper.

All of the information needed to recover the database and application should be explained in such a way that anyone in the company could perform the recovery.  This makes the process much more challenging.

Next you need to document which applications are critical to the business running and which ones can wait.  I will go into more detail on this in a future blog.

Your documentation should be printed out and stored safely AWAY from the computer room and the production building.  I have visited customers in the past that had well documented, thorough, “pretty” plans that never made it off the PC in the computer room.

Other things that you should document in your plan include:

  - Internal contact information (including cell and home phone numbers) for everyone that could, would, or should be involved in the recovery effort.
  - Copies of contracts with all of your 1st, 2nd and 3rd level vendors.  For example, what is your relationship with your telephone company?  What is their guarantee to you for availability (this is your Service Level Agreement or SLA with them)?  What is your hardware DR plan?  Are your software support and DR plan the same as those for your hardware?
  - External contact information – is there someone specific at your DR site that you need to contact?  Do you just call the front desk and ask for Joe?  What if Joe is not there?  What is the process to get the ball rolling?
  - Specifications for all the critical resources – If you are failing over to a data center that is shared with others the last thing you want is to be in a position of having a generic system waiting for you that does not fit the OS/Java/HW requirements for the application to run.
  - Who pushes the button?  Who makes the decision to go to backup or failover to the DR site?  Document this and have a chain of command for who is in charge if all of the executives are out of town.  The Incident Command System (ICS) from FEMA is an excellent example of how to design your DR organization.  More importantly, get the executive team to commit – in writing – that the people that have been identified have the authority to make the decision to fail over.
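
The chain-of-command idea in the last item can be sketched in a few lines of code.  This is only an illustration, not part of any Progress product or of ICS itself: the `Contact` record, the `who_pushes_the_button` function, and the names and phone numbers are all hypothetical, and it assumes your plan lists people in order of authority and records who has written approval to declare a disaster.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Contact:
    name: str
    role: str
    cell_phone: str
    # True only if the executive team has granted this authority in writing.
    may_declare_disaster: bool = False


def who_pushes_the_button(chain: Iterable[Contact], reachable: Set[str]) -> Optional[Contact]:
    """Walk the chain of command in order and return the first person who is
    both reachable and authorized to declare a disaster, or None."""
    for person in chain:
        if person.may_declare_disaster and person.name in reachable:
            return person
    return None


# Hypothetical chain of command, highest authority first.
chain = [
    Contact("Pat", "CEO", "555-0100", may_declare_disaster=True),
    Contact("Lee", "IT Director", "555-0101", may_declare_disaster=True),
    Contact("Sam", "Security Guard", "555-0102"),
]

# The executives are out of town; only Lee and Sam can be reached.
decider = who_pushes_the_button(chain, reachable={"Lee", "Sam"})
print(decider.name if decider else "Nobody can declare: fix the plan!")
```

If the function ever returns nobody for a realistic "who is reachable" scenario, that is a gap in the plan worth fixing before the disaster, not during it.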

I once participated in a ½ day DR drill at a DR conference.  If you ever get the chance to do this you should.   If it is run well, it is very insightful and fun.  You will also learn a lot about people’s ability to NOT make a decision.  The room was filled with 70 Business Continuity planners.  There were 7 tables representing different functions within the fictitious company (in this case a chemical manufacturing plant).  Each function had 10 people that had to act as one.  The hardest part of the exercise was getting the Executive table to declare a disaster and start the process.  Their inability to make the decision cost the company almost a day of time.  These were people who should have known better!  When they were given the responsibility of declaring a disaster they were dumbstruck with fear.  The moral of the story is to put someone in charge who can pull the trigger if needed.

There are many other things that could go into a plan.  This is, by no means, the complete list.  For more information and ideas on what to document, Google the idea.  Here are some links I have found insightful (or that will, at a minimum, get you thinking) about what you should have in your plan.

Disaster Resource Nuggets 

Welcome - Bureau of Emergency Management, Division of Emergency Services, Communication and Management (You will have a local version of this)

Data center disaster recovery planning software

National and State Disaster Preparedness Resources and Tools

The BCI Good Practice Guidelines


When do you update your documentation?  You do this every time something changes in your systems.  This sounds daunting, but if your application can’t recover because it is missing a small part of the application that is new then you don’t have a DR plan.

I look forward to hearing what your documentation for your DR plan looks like!   I’m always looking for plans and different ideas.  I will share what I receive from customers as well.

In my next blog I will talk about who is in charge. 

Until then, if you can’t document it, you can’t measure it.  If you can’t measure it then you don’t have it! 

Brian B

February 05, 2009

Is Disaster Recovery Really Important to Small-Medium Businesses?

Posted by Brian Bowman

In my last blog I talked about how Disaster Recovery (DR) crosses all vertical markets and is important no matter what your company does.  In this blog I will “cut the pie” a different way and talk about the size of your company and whether or not Disaster Recovery and Business Continuity are important to your business – regardless of the size.

Business Continuity (BC) has been at the forefront of most enterprise companies for a long time.  They have the ability to allocate resources to the issues involved with creating and maintaining a complete Business Continuity Plan (BCP).  Many enterprises have a complete department that focuses on all aspects of business continuity.  These can be everything from personnel issues and HR announcements to public relations.

A newer trend in “C” level management has been to create a new position often called the Chief Risk Officer (CRO).  Although Chief Risk Officers have existed for a long time in the financial market, they are new to other verticals (such as manufacturing).

Enterprise companies have also assigned Business Continuity and Disaster Recovery to specific individuals OUTSIDE of Information Technology to handle the bigger picture.  These individuals usually have education and training specifically focused on Business Continuity.

This is a luxury that enterprise companies have – the ability to support a resource to own and manage the Business Continuity and Disaster Recovery plans for the company.  Just because the Small-Medium Business can’t fund this type of resource doesn’t make them immune to disasters.

Small-Medium Businesses typically don’t have the luxury (and resources) to be able to create and maintain a Business Continuity Department.  This doesn’t mean that Business Continuity is less important.  In fact, it is more important to this size of business.  A smaller company does not have the resources to weather an event or disaster as well as most enterprises can.

An event that causes an outage for a Small-Medium Business will have a much larger impact on sales than it would for most enterprise customers. Take the recent disaster that happened in southern NH as an example. Nashua is home to 175,000 residents and more than 700 local businesses.

Southern NH experienced an ice storm in December 2008 that left over 200,000 homes without power for up to 12 days.  In this scenario, if a small local company ($10 million annual revenue) loses a day’s worth of business, that roughly equates to $33,333 ($10 million / 300 days).  A week’s worth of income is $233,331 (or 2.3% of their annual income), and the whole company will be out of work for that week.  An enterprise will have additional sites outside of the region that will be able to cover the outage.  One day’s loss to an enterprise company ($1 billion annual revenue) would be $3.3 million, but this will be reduced by the regional diversity of the company.  The 7 day outage would roughly represent 0.46% of their income.
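
The arithmetic above is easy to rerun with your own numbers.  Here is a rough sketch that uses the same 300-day divisor.  The `outage_loss` function and its `exposed_fraction` parameter are my own illustration, and the value of 0.2 for the enterprise (one-fifth of revenue sitting in the affected region) is purely an assumption chosen because it lands close to the 0.46% figure; the real split will depend on the company.

```python
# Back-of-the-envelope outage cost, using the figures from the post.
WORKING_DAYS_PER_YEAR = 300  # the divisor used in the post


def outage_loss(annual_revenue, outage_days, exposed_fraction=1.0):
    """Return (lost revenue, share of annual revenue) for an outage.

    exposed_fraction is the share of the business located in the affected
    region: 1.0 for a single-site company, less for a regionally diverse one.
    """
    daily_revenue = annual_revenue / WORKING_DAYS_PER_YEAR
    loss = daily_revenue * outage_days * exposed_fraction
    return loss, loss / annual_revenue


# Small local company: everything is in the affected region.
small_loss, small_share = outage_loss(10_000_000, 7)

# Enterprise: ASSUMPTION - only one-fifth of revenue is in the affected region.
ent_loss, ent_share = outage_loss(1_000_000_000, 7, exposed_fraction=0.2)

print(f"Small business: ${small_loss:,.0f} ({small_share:.2%} of annual revenue)")
print(f"Enterprise:     ${ent_loss:,.0f} ({ent_share:.2%} of annual revenue)")
```

Note that the percentage does not depend on the size of the company at all, only on the length of the outage and how much of the business is exposed.  That is the whole point: the small business has nowhere else to earn the money.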

It is because of this that Business Continuity and Disaster Recovery are more important to the small to medium size business. The task of building a Disaster Recovery plan falls to Information Technology (IT) for many reasons.  They have access to all of the departments’ critical data. They also know how the applications interact and the dependencies between them. This makes IT ideal to help build the Disaster Recovery Plan – especially in smaller organizations.

A complete Disaster Recovery Plan does not have to be an epic novel. The more complex your environment the larger your plan will be. Having a simple, easy to use instruction manual is critical. Planning and testing is the key to a successfully implemented Disaster Recovery Plan. It should be well documented and clearly identify what, where, when, why and especially who.

Progress Software recognizes that this can be a hard thing for businesses to do. This is why we have products like Name Server Load Balancing, OpenEdge Management, OpenEdge Replication, and Actional.  These help you both proactively prevent disasters and recover and run your business despite the disaster.  They don’t require weeks or months to learn and implement, and they make your life easier.

Unfortunately, there is nothing we can do about the weather, yet…

In my next blog I will talk about documenting your Disaster Recovery Plan.  Until then, if your DR plan isn’t broken then you probably haven’t tested it lately…

Brian B

January 05, 2009

The Importance of Disaster Recovery in Your Market

Posted by Brian Bowman

   Disaster Recovery (DR) has been a hot topic in many different verticals for a long time.  Progress Software deals with all of these verticals.  Data loss here means destruction of data as well as theft.  In a recent survey, 75% of respondents had experienced a hardware or software outage within the last year (Symantec DR Findings 2008).  We can categorize companies by their vertical market and by their size.  I will focus first on the business vertical aspect.

   Every vertical that you can imagine needs Disaster Recovery.  I will use some verticals as examples of ends of the Disaster Recovery maturity spectrum.   Some institutions within these verticals are farther along than others.  Unfortunately I am forced to use these generalities to be able to discuss different concepts.

   Financial verticals (stock markets, mortgage companies, banks, insurance companies, etc) place a very high value on their data.  This vertical has a head start over most other verticals due to the nature of the business and more regulation and requirements around loss of data.  For this reason, they have developed much more strict standards and requirements, both voluntary as well as regulatory, that help them build and maintain their Disaster Recovery plan.

   Other verticals tend to be a bit more relaxed when it comes to Disaster Recovery planning.  Manufacturing has lagged behind due to the nature of the business and more stress being placed on production of goods versus loss of data (and dollars).

   Retail falls between these other two vertical markets.  Plans are a bit more formal, but are not as stringent as in the financial vertical market.  Credit card information theft has brought this to the forefront, and requirements such as PCI DSS (Payment Card Industry Data Security Standard) have forced the retail sector to deal with Disaster Recovery, security, and keeping data safe.

   No matter what your vertical, if your company expects to stay in business after a disaster then you must have some type of a plan.  The plan must include the complete application recovery environment.  Just recovering the database is less than useless unless you have the application to support it.

   What is your experience in your vertical?  Do you see your industry moving towards Disaster Recovery planning being critical to the business?  I’d love to hear what you see in your business.

In my next entry I will talk about Disaster Recovery based on your company size. 

Until then, if you can’t recover then don’t let it fail.

Brian B.

Progress Software