The five R’s of a successful DR program

June 20, 2018
Chris Munoz

A well-engineered, high-performance disaster recovery (DR) program can save your company when a major incident takes your critical IT systems offline. A DR program not only protects your private and sensitive data and keeps your organization running when it counts; it can also save your reputation with your customers, partners, and vendors.

The tricky part is ensuring your DR program comes through when a major incident happens. Replicating files, applications, and databases, and restoring them at a secondary location, seems like a straightforward process at first glance, but nothing is simple when a crisis strikes.

Disaster recovery cannot be a set-it-and-forget-it effort. Your DR program has to be a living program that adapts to your ever-changing business requirements and IT environment. Making sure it performs as expected in a crisis requires careful design, documentation, and comprehensive testing.

The process of planning, designing, documenting, and testing your program must include the five R’s of DR:

  • Restoring systems
  • Resuming business functions
  • Remediating data losses
  • Recovering primary resources
  • Returning home

Let’s walk through the role each of these pillars plays in a DR program.

1. Restoring critical IT systems

Your DR program design must cover your top-tier (most important) IT applications and infrastructure, and it must establish recovery priorities that determine which environment to restore first, based on the recovery time objective (RTO) and recovery point objective (RPO) each one demands.

All critical infrastructure and applications have a place in your DR program: physical servers, virtual machines, containers, applications, system software, databases, user profiles, security appliances, and so on. Your DR program must determine the order in which these systems come back up so that no dependency is overlooked.
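Dependency-aware restore ordering is essentially a topological-sort problem. The Python sketch below (3.9+, standard-library `graphlib`) shows one way to derive a restore sequence; the system names, the dependency map, and the numeric `tier` priorities are hypothetical examples, not part of any particular DR product. Systems are released in waves: a system becomes eligible only after everything it depends on has been restored, and within each wave the most critical tier (lowest number) goes first.

```python
from graphlib import CycleError, TopologicalSorter


def restore_order(depends_on: dict[str, set[str]], tier: dict[str, int]) -> list[str]:
    """Return a restore sequence in which every system follows its dependencies.

    depends_on maps each system to the systems it needs running first.
    tier maps systems to a priority (1 = most critical); unknown systems sort last.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    order: list[str] = []
    while sorter.is_active():
        # Every system whose dependencies are already restored, most critical first;
        # ties are broken by name so the plan is repeatable.
        ready = sorted(sorter.get_ready(), key=lambda s: (tier.get(s, 99), s))
        for system in ready:
            order.append(system)
            sorter.done(system)
    return order


# Hypothetical environment used for illustration only.
DEPENDS_ON = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage"},
    "user_profiles": {"network"},
    "erp_app": {"database", "user_profiles"},
    "ecommerce_web": {"database", "network"},
}
TIER = {"network": 0, "storage": 0, "database": 1,
        "erp_app": 1, "ecommerce_web": 1, "user_profiles": 2}

if __name__ == "__main__":
    print(restore_order(DEPENDS_ON, TIER))
```

A circular dependency (for example, an authentication service that runs on the very database it must unlock) makes `prepare()` raise `CycleError`. Surfacing that during planning is far cheaper than discovering it mid-recovery.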

You must document these systems accurately and then test them thoroughly to confirm they restore as expected in an emergency. Testing should be carefully documented and conducted at least twice a year. Robust testing means bringing up the secondary environment in a controlled manner and confirming that end users can access information the same way they normally would. This process requires thoughtful planning and execution to avoid disrupting active IT operations.

2. Resuming essential business functions

Limiting downtime is a core goal of successful DR preparedness. The longer your systems remain offline, the greater the cost to your business and your brand’s reputation. The strongest DR programs provide near real-time failover that can prevent the most damaging downtime.

As you design and test your DR program, make sure you’re accounting for business-critical platforms: eCommerce, customer relationship management, enterprise resource planning, and others that are essential to your bottom line.

And don’t forget about your vendors and partners. If you coordinate with third parties on systems like billing, accounts receivable, and logistics, collaborate with them when developing your DR program, and include them in your documentation and testing.

3. Remediating data loss

You have to accept that some data might be lost in a major incident such as a malicious attack. The key is reducing the potential damage.

Start by defining recovery time objectives, or RTOs (how quickly you need to restore systems), and recovery point objectives, or RPOs (how much data, measured in time, you can afford to lose). Then rank your data in order of importance and design your recovery system so that the crown jewels of your data survive regardless of the root cause.
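As a rough illustration of how these objectives can be checked against reality, the Python sketch below walks a ranked list of data sets and flags any whose backup interval exceeds its RPO or whose most recently tested restore time exceeds its RTO. It assumes the worst-case data loss equals the interval between recovery points (an incident striking just before the next snapshot) and ignores replication lag; the data-set names and figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DataSet:
    name: str
    rank: int                        # 1 = crown jewels
    rto: timedelta                   # longest tolerable time to restore
    rpo: timedelta                   # most data loss tolerable, measured in time
    backup_interval: timedelta       # time between recovery points
    tested_restore_time: timedelta   # measured in the latest DR test


def objective_gaps(datasets: list[DataSet]) -> list[tuple[str, str]]:
    """Return (data set, missed objective) pairs, most important data first."""
    gaps: list[tuple[str, str]] = []
    for ds in sorted(datasets, key=lambda d: d.rank):
        # Worst case: everything written since the last recovery point is lost.
        if ds.backup_interval > ds.rpo:
            gaps.append((ds.name, "RPO"))
        if ds.tested_restore_time > ds.rto:
            gaps.append((ds.name, "RTO"))
    return gaps


# Hypothetical inventory for illustration.
INVENTORY = [
    DataSet("file_shares", 3, timedelta(hours=8), timedelta(hours=24),
            timedelta(hours=24), timedelta(hours=10)),
    DataSet("orders_db", 1, timedelta(hours=1), timedelta(minutes=15),
            timedelta(minutes=5), timedelta(minutes=45)),
    DataSet("crm_db", 2, timedelta(hours=4), timedelta(hours=1),
            timedelta(hours=4), timedelta(hours=2)),
]

if __name__ == "__main__":
    for name, objective in objective_gaps(INVENTORY):
        print(f"{name} misses its {objective}")
```

Running a check like this after every DR test turns the objectives into something measurable rather than aspirational.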

Next, contain the damage. Additional testing and redundancy, combined with data-integrity checks, can reduce the risk of data loss. Finally, repair any databases and IT systems that were damaged or corrupted during the crisis.

4. Recovering your primary IT resources

Establishing your recovery priorities helps ensure that primary IT resources like databases, networks, file systems, and security protocols come back quickly and efficiently.

DR testing proves its worth when you’re recovering primary IT resources. New software versions and firmware upgrades can introduce unexpected problems during the recovery process, and regular testing surfaces those problems before a real crisis does.

Identifying your most mission-critical IT resources also helps identify non-critical resources, which helps you resist the temptation to restore everything simultaneously. When every minute counts during a crisis, you don’t want to spend it restoring resources that aren’t mission-critical.
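One way to draw the line between critical and non-critical resources without breaking dependencies is to start from the mission-critical systems and pull in everything they rely on, directly or transitively; whatever remains can be deferred until the crisis has passed. A minimal Python sketch, with a hypothetical dependency map and system names:

```python
def split_restore_scope(
    depends_on: dict[str, set[str]], critical: set[str]
) -> tuple[set[str], set[str]]:
    """Return (restore_now, defer).

    restore_now: the mission-critical systems plus every system they depend on,
    directly or transitively. defer: every other known system.
    """
    restore_now: set[str] = set()
    stack = list(critical)
    while stack:  # depth-first walk over the dependency graph
        system = stack.pop()
        if system in restore_now:
            continue
        restore_now.add(system)
        stack.extend(depends_on.get(system, set()))
    all_systems = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    return restore_now, all_systems - restore_now


# Hypothetical environment for illustration.
DEPENDS_ON = {
    "ecommerce_web": {"database", "network"},
    "database": {"storage"},
    "storage": {"network"},
    "network": set(),
    "reporting": {"database"},
    "dev_sandbox": {"network"},
}

if __name__ == "__main__":
    now, later = split_restore_scope(DEPENDS_ON, {"ecommerce_web"})
    print("restore now:", sorted(now))
    print("defer:", sorted(later))
```

Note that the shared `database` lands in the restore-now set even though `reporting`, which also uses it, is deferred: criticality is inherited by dependencies, not by dependents.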

5. Returning systems to their home state

You cannot afford to neglect this final step. Eventually, you must move out of DR mode and find your “new normal.” Many DR solutions on the market do not provide for this step in the short term. Getting back to full operational capacity quickly is critical to keeping customers happy.

Again, your DR documentation and planning play a critical role. Your program should define what success looks like based on your business objectives and provide a timeline for returning your IT operations to their home state.

Pulling it all together

Innovations in cloud technologies enable small to midsize companies to deploy DR programs that would satisfy the most demanding enterprise user.

Though the potential of advanced DR is obvious, making it work can be challenging when your IT team is already strapped for time and resources. These systems need a robust architecture, proper testing, and real-time staffing to put DR plans into action quickly and efficiently.

At CBTS, our Managed Data Protection and Disaster Recovery services can address these challenges and allow your IT team to focus on the strategic initiatives that drive your business forward.

Now that you know the five R’s of disaster recovery, find out more in our eBook: “10 pitfalls to avoid when re-inventing your disaster recovery program.”

Also read this case study: Global aerospace supplier taps CBTS to launch a robust disaster recovery plan while diagnosing issues and upgrading core infrastructure.
