Disaster Recovery

Every business should have a Disaster Recovery (DR) Plan. Writing one forces you to do the thinking in advance, before the business experiences a major problem.

It doesn’t need to be a huge document. For most businesses, from an IT perspective at least, a few pages of A4 will be significantly better than nothing.

What’s a Disaster?

A disaster may be any unplanned event that seriously disrupts the flow of your business. Some examples:

  • Your online accounts system, Customer Relationship Management system or document storage system is unavailable and the provider isn’t sure when it will be back (or the provider goes out of business).
  • Your main office is flooded or experiences a fire.
  • The data centre hosting your servers has a major outage.

None of these is very likely, but the impact of any of them is likely to be significant.

What’s Recovery?

Recovery, in the context of a DR plan, is the state where the business can continue its day-to-day operations. It may not be perfect – for example, it may involve temporary office accommodation – but it is survivable.

What’s a Disaster Recovery Plan?

A DR plan describes how your business goes from Disaster to Recovery.

It should be a written document, printed out. Yes, it may be maintained online, but given the situations that a DR plan addresses, there are no guarantees that it will be accessible online when needed. A printed copy should be kept by key personnel at home.

The scope of Disaster Recovery extends far beyond IT – you may no longer have any offices – and the requirements of a robust Disaster Recovery policy will vary enormously between companies. From an IT perspective, the following steps are worthy of consideration.

Step 1: Locations

What locations does the business rely on? This isn’t just the offices, warehouses, labs, etc; it’s also the data centres where your servers are hosted and possibly even employees’ homes if they work from home. For each location, assess the impact of the loss of that location to the continuity of the business.

Step 2: Business Critical Applications

What are the business critical applications? Create a list of applications together with a brief description, where the application is (ie, which of the locations identified earlier) and the priority of the application.

The priority is used to determine which of multiple unavailable applications should be recovered first. For example, the accounts system is important to a business, but most businesses could survive a week without it. The applications that deliver value to customers may be of a higher priority. The critical applications might include email, Customer Relationship Management (CRM) and the order delivery system.
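As a rough illustration of how the priority column drives recovery order, the sketch below sorts a list of applications so that the highest priority comes first. The `Application` class, the three-level priority scale and the sample entries are all assumptions made for this example, not a prescribed format.

```python
from dataclasses import dataclass

# Assumed three-level scale; a lower number means "recover sooner".
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Application:
    name: str
    description: str
    location: str
    priority: str  # one of the keys of PRIORITY_ORDER


def recovery_order(applications):
    """Return the applications sorted so the highest priority comes first."""
    # sorted() is stable, so equal priorities keep their listed order.
    return sorted(applications, key=lambda app: PRIORITY_ORDER[app.priority])


# Hypothetical entries for illustration only.
apps = [
    Application("Accounts", "Business accounts", "Head Office", "Low"),
    Application("Email", "Staff email", "Data centre", "High"),
    Application("CRM", "Customer records", "Cloud provider", "Medium"),
]

for app in recovery_order(apps):
    print(f"{app.priority:<6} {app.name} ({app.location})")
```

A spreadsheet with a sortable priority column does the same job; the point is only that the order is decided before the disaster, not during it.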

Online (“cloud”) systems should not be forgotten. If your CRM provider, accounting system or file storage is offline for an extended period, how will you use the backups that you have of that data?
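One possible answer, assuming the provider's export is a plain CSV file (as it is for the CRM in the sample plan later in this article), is that the data can be read without the provider's software at all. The sketch below uses Python's standard `csv` module; the column names and rows are hypothetical stand-ins for a real export.

```python
import csv
import io

# Stand-in for a CRM export file; the column names and rows are hypothetical.
crm_export = io.StringIO(
    "company,contact,phone\n"
    "Example Ltd,Jane Doe,01xxx xxxxxx\n"
    "Widgets plc,John Roe,01xxx xxxxxx\n"
)


def load_contacts(csv_file):
    """Read a CSV export into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


contacts = load_contacts(crm_export)
for row in contacts:
    print(row["company"], "-", row["contact"])
```

With a real backup, the in-memory file would be replaced by something like `open("crm-backup.csv", newline="")`, where the filename is again only an example. Opening the same file in a spreadsheet is an equally valid temporary measure.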

Step 3: Identify Business Critical Data

What data is critical to the business? That may include the Operations Manual, accounts data and the order book. Where are the backups for each of those kept, and what is the procedure to recover that data from the backup?

If you have your own server room, it’s not unusual to have a backup service of some kind in that room, and to ensure that backups are carried out both locally and to a remote location. At Tiger Computing, the rule is that business critical data must be backed up to at least two locations that are remote from the source of the data.
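A rule like that is easy to state and easy to drift away from, so it may be worth checking mechanically. The following is a minimal sketch, assuming each dataset is recorded with the location of its live copy and the locations of its backups, and assuming that "two locations" means two distinct locations. The dataset names and sites are hypothetical.

```python
def backup_rule_violations(datasets, minimum_remote=2):
    """Return the names of datasets with too few backups away from their source.

    `datasets` maps a dataset name to a dict with the keys "source"
    (location of the live data) and "backups" (list of backup locations).
    """
    violations = []
    for name, info in datasets.items():
        # Count distinct backup locations that differ from the source location.
        remote = {loc for loc in info["backups"] if loc != info["source"]}
        if len(remote) < minimum_remote:
            violations.append(name)
    return violations


# Hypothetical locations for illustration only.
datasets = {
    "Operations Manual": {"source": "Office", "backups": ["Site A", "Site B"]},
    "Accounts data": {"source": "Office", "backups": ["Office", "Site A"]},
    "Order book": {"source": "Site A", "backups": ["Site B", "Site B"]},
}

print(backup_rule_violations(datasets))
```

Here the accounts data fails because one of its backups sits alongside the source, and the order book fails because both of its backups are in the same place.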

Step 4: List Your Servers

For each server the business owns, record its name, function, location and the priority for restoring it. In the event of a disaster, the priority column will determine which servers need to be recovered first.

Step 5: Action List Per Location

For each of the business locations, list what steps are to be taken in the event of that location being unavailable. That will include making each of the applications identified in Step 2 as being at this location available in a different location. Where will that be? What is needed to make that application available? Does any related server identified in Step 4 need to be rebuilt, or can the application be hosted some other way temporarily?
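To show how Steps 2 and 5 fit together, here is a small sketch that takes a lost location and produces the list of applications that need a new home, highest priority first. The register format and its entries are assumptions for illustration; a real action list would also answer the "where" and "how" questions above, which no script can decide for you.

```python
# Assumed three-level scale; a lower number means "recover sooner".
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

# Hypothetical register: (application, location, priority).
applications = [
    ("Wiki", "Data centre", "Medium"),
    ("Accounts", "Head Office", "Low"),
    ("Order system", "Data centre", "High"),
]


def action_list(lost_location, applications):
    """List the applications hosted at lost_location, highest priority first."""
    affected = [app for app in applications if app[1] == lost_location]
    affected.sort(key=lambda app: PRIORITY_ORDER[app[2]])
    return [
        f"Make {name} available elsewhere (priority: {priority})"
        for name, _location, priority in affected
    ]


for step in action_list("Data centre", applications):
    print(step)
```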

Step 6: Key Contact Data

List the contact details of key personnel, including home phone number, personal mobile number, personal email addresses (the business infrastructure may not be available), together with contact details for key locations (eg, data centres).

The output should be a written, printed document that is kept at home by key business personnel. It should be reviewed and revised at least annually – you may well be surprised at how out of date it can become after just a year (confession: I was when I reviewed ours).
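If the plan's last review date is recorded somewhere, a reminder can be automated. This is only a sketch, assuming "annually" is taken as 365 days; the function name and the dates are made up for the example.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # assumed reading of "at least annually"


def review_overdue(last_reviewed, today=None):
    """Return True if more than REVIEW_INTERVAL has passed since the last review."""
    today = today or date.today()
    return today - last_reviewed > REVIEW_INTERVAL


# Hypothetical dates for illustration only.
print(review_overdue(date(2023, 1, 10), today=date(2024, 3, 1)))
print(review_overdue(date(2024, 1, 10), today=date(2024, 3, 1)))
```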

Sample Disaster Recovery Plan

Introduction

This document outlines the immediate actions that would be taken by XYZ Corporation in the event of the loss, or severe reduction in functionality, of a business location used by XYZ Corporation.

This document does not cover the full process of recovery to normal operations; rather, it details the actions necessary to:

  • get to a point whereby XYZ Corporation can fulfil its contractual obligations to its clients; and
  • ensure that the critical business functions can continue.

This document does not discuss the simultaneous non-availability of multiple locations. The locations are tens of miles apart, and thus it is unlikely that one incident would render more than one of them unusable.

Locations

There are three locations that are critical to the business, and each of these is dealt with below.

Head Office, New Town

In addition to providing work space for all employees, Head Office also hosts one server. Key employees, identified as senior technical staff and the directors of the business, regularly work from home; other employees (administrative and sales roles) would be able to work from home to help the business return to normal.

The impact of loss of this location to the functioning of the business is assessed as Medium.

Bytes Data Centre, Otherville

This is the principal data centre used by the business, and hosts servers owned and managed by the business.

The impact of loss of this location to the functioning of the business is assessed as Medium to High.

Bits Data Centre, Rivertown

This location hosts a single server, and that server provides secondary or backup functionality.

The impact of loss of this location to the functioning of the business is assessed as Low.

Other Locations

In addition to the locations identified above, some employees regularly work from home. However, there is no business necessity for them to do so, so the consequence of the loss of an employee’s home would be limited to reduced availability of that employee.

Business Critical Applications

The following applications are critical to the business and would need to be made available as soon as practical following a disaster:

Application       | Description                 | Location                        | Priority
Wiki              | Internal documentation wiki | Bytes Data Centre               | High
Nextcloud         | Distributed Data system     | Bytes Data Centre               | Medium
Staff VPN         | Allow remote working        | Bytes Data Centre               | Medium
Minute Buy Minute | Time tracking system        | Bytes Data Centre               | Medium
Piesup            | Contract database           | Bytes Data Centre               | Low
Onion             | Business accounts           | Head Office                     | Low
CRM               | CRM                         | Cloud service from Acme CRM Ltd | Medium

Business Critical Data

The following data is critical to the business and must be backed up to two locations other than the location of the server that normally holds such data. Those backups are to be monitored in the usual way.

The CRM backups are in CSV format, and may be loaded into a spreadsheet as a temporary measure.

Data      | Server | Location    | Remote backup 1 | Remote backup 2
Wiki      | eric   | BDC         | gibson          | fender
Nextcloud | eric   | BDC         | gibson          | fender
accounts  | ronnie | Head Office | gibson          | martin
piesup    | freddy | BDC         | gibson          | fender
CRM       | SaaS   | -           | gibson          | fender

Disaster Recovery Actions

Immediate Actions

These are the immediate actions to be taken in the context of the business.

  • Ensure personal safety
  • Minimise further damage
  • Notify the Managing Director
  • If available, create a ticket in the internal logging system to track progress
  • Do not talk to the Press

It is likely that a face-to-face meeting, or at least a teleconference, will be convened to discuss specific recovery actions. The remainder of this section details the approach to recovery.

Head Office

In general, the business data functions of Head Office will be implemented on servers at Bytes Data Centre. The specific servers to be used are not mandated here, although some suggestions are made, so judgement will be needed at the time.

  1. Reconfigure incoming xyz.com email to be delivered on a server at Bytes Data Centre (likely candidate: lenny).
  2. Initiate a recovery from backup of IMAP (mail) data to the server identified above.
  3. Initiate a restore from backup of the ticketing system onto an appropriate server.
  4. Ensure web and mail access are functional.
  5. Restore from backup the virtualised server that runs the accounts system.

Bytes Data Centre

The Bytes Data Centre (BDC) is the most critical location, hosting a number of XYZ Corporation servers.

In the event of the loss of BDC, remote technical staff will be required to work from Head Office to:

  • allow recovery actions to be co-ordinated
  • facilitate strategy meetings
  • remove the immediate requirement for remote working facilities as the Staff VPN will be unavailable

XYZ Corporation Servers

Name    | Function               | Priority | Notes
alpha   | Application X server   | High     |
bravo   | Wiki                   | Medium   |
charlie | Staff VPN concentrator | Medium   | Not relevant if data centre unavailable
delta   | Web server             | Medium   |
fender  | Backup server          | Medium   | All data also backed up to gibson
gibson  | Backup server          | Medium   | All data also backed up to fender
zulu    | Application Z server   | High     |

Recovery Actions

It is not the role of this document to define how the services should be reinstated, but rather to define the importance of each, as given above, together with options for their recovery. Those options include:

  • Using server oscar
  • Using server papa
  • Using Virtual Machines
  • Using rented dedicated servers

Bits Data Centre

The single server at Bits Data Centre provides the following services:

  • Secondary MX
  • Secondary DNS
  • Secondary backups
  • Secondary NTP

None of those services is business critical, but if they are unavailable, the infrastructure is no longer resilient.

In principle, the recovery process will be:

  • Acquire a replacement server (or reuse an existing spare if possible)
  • Identify a replacement data centre. Note: Do not use Bytes Data Centre.
  • Configure and install the server.

Emergency Contact Details

Fred Smith

  • 01xxx xxxxxx (home)
  • 07xx xxx xxx (mobile)
  • fred@mymail.com (private email)

Susan Jones

  • 01xxx xxxxxx (home)
  • 07xx xxx xxx (mobile)
  • sue@privmail.com (private email)

Bytes Data Centre

  • 01xxx xxx xxx (office)
  • 01xxx xxx xxx (24 hour emergency number)
  • Address: xxxxxxxx

Bits Data Centre

  • 01xxx xxx xxx (office)
  • 01xxx xxx xxx (24 hour emergency number)
  • Address: xxxxxxxx
