Disaster Recovery for Virtualized Business Critical Applications (Part 2 of 3)

A protection group is a group of virtual machines that fail over together to the recovery site. Protection groups contain virtual machines whose data has been replicated by array-based replication or by VR. A protection group typically contains virtual machines that are related in some way, such as:

  • A three-tier application (application server, database server, Web server)
  • Virtual machines whose virtual machine disk files are part of the same datastore group


Figure 5: Protection Groups

In our use case, we have several protection groups, as shown below. The composition of the SAP Application protection group is shown in Figure 6.


Figure 6: SAP Application Protection Group

Individual protection groups for Exchange, Infrastructure, Oracle and SQL were created as shown.

Protection groups are the building blocks of recovery plans. A protection group can be included in multiple recovery plans. A recovery plan is a series of steps that recovers virtual machines in a specified order and priority.
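The relationship between protection groups and recovery plans can be sketched as a small data model. The Python below is only an illustration and not SRM's API: the class names, the VM names and the plan names are assumptions made for the example. It shows one protection group (Oracle) being reused by more than one recovery plan.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProtectionGroup:
    """A set of VMs that fail over together (VM names are hypothetical)."""
    name: str
    vms: tuple


@dataclass
class RecoveryPlan:
    """An ordered collection of protection groups."""
    name: str
    groups: list = field(default_factory=list)

    def vms(self):
        """Return every VM covered by the plan once, in group order."""
        ordered = {}
        for group in self.groups:
            for vm in group.vms:
                ordered.setdefault(vm, None)
        return list(ordered)


oracle = ProtectionGroup("Oracle", ("oracle-db01",))
sap_app = ProtectionGroup("SAP Application", ("sap-app01", "sap-app02"))

# The same Oracle group is a building block of two different plans.
sap_plan = RecoveryPlan("SAP", [oracle, sap_app])
oracle_plan = RecoveryPlan("Oracle", [oracle])

print(sap_plan.vms())     # ['oracle-db01', 'sap-app01', 'sap-app02']
print(oracle_plan.vms())  # ['oracle-db01']
```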


Figure 7: Recovery plans for different applications and for all.

Each application can have its own recovery plan and can be recovered independently of the others. Most applications depend on certain infrastructure components such as Active Directory and DNS. The infrastructure protection group therefore needs to be included in most application recovery plans to ensure that the application is usable after recovery. A recovery plan for the entire site (All) can include all applications. The recovery plan also makes it possible to prioritize applications so that they are recovered in a specified order.
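A simple pre-flight check can catch a recovery plan that omits a group it depends on. The sketch below is an illustration rather than an SRM feature: the function name and the composition of the example plans are hypothetical, and only the group names come from this use case.

```python
def plans_missing_group(plans, required_group):
    """Return the names of plans that do not include required_group.

    plans maps a plan name to the list of protection group names it contains.
    """
    return sorted(
        name for name, groups in plans.items() if required_group not in groups
    )


# Hypothetical plan compositions, for illustration only.
plans = {
    "Exchange": ["Infrastructure", "Exchange"],
    "SQL": ["SQL"],
    "All": ["Infrastructure", "Exchange", "Oracle", "SQL", "SAP Application"],
}

print(plans_missing_group(plans, "Infrastructure"))  # ['SQL']
```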

The recovery plan for the SAP application is shown below. The SAP application requires the Oracle database in addition to the application servers, so both protection groups are included in the plan.


Figure 8: Recovery Plan for the SAP Application

 IP addressing at the recovery site:

The recovery site can use the same IP addresses as the primary site or a completely different set of IP addresses. The choice depends on the customer's infrastructure.

  • Some customers use capabilities such as stretched VLANs or relocatable VLANs to keep the same IP addresses at the primary and the recovery site.
  • It is quite common for the virtual machines to have a completely different set of IP addresses at the recovery site. SRM recovery plans can automatically re-IP the virtual machines during recovery.
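The idea behind re-IP can be illustrated with a subnet-to-subnet mapping that keeps the host part of each address. This is a minimal sketch using Python's standard ipaddress module. It is not how SRM is configured, and the function name and both subnets are assumptions chosen for the example.

```python
import ipaddress


def remap_ip(address, primary_subnet, recovery_subnet):
    """Map an address to the same host offset in the recovery subnet."""
    ip = ipaddress.ip_address(address)
    src = ipaddress.ip_network(primary_subnet)
    dst = ipaddress.ip_network(recovery_subnet)
    if ip not in src:
        raise ValueError(f"{address} is not in {primary_subnet}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("subnets must be the same size")
    # Keep the host's position within the subnet, change only the network.
    offset = int(ip) - int(src.network_address)
    return str(dst.network_address + offset)


# Hypothetical subnets for the primary and recovery sites.
print(remap_ip("10.10.20.15", "10.10.20.0/24", "172.16.20.0/24"))  # 172.16.20.15
```

Keeping the host offset makes the recovery-site addresses predictable, but an explicit per-VM mapping works just as well when the two sites are numbered differently.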

The screenshot below shows an example IP mapping for the SAP Oracle VM at the recovery site.


Figure 9: IP Customization during recovery

Site Recovery Manager (SRM) helps automate the cumbersome process of disaster recovery planning, testing and recovery. It provides the workflows, runbooks and automation to make the recovery process seamless.

SRM typically addresses the following use cases:

  1. Disaster Recovery
    1. Testing
    2. Recovery from actual disasters
    3. Reprotect
  2. Planned Migration

 Disaster Recovery with SRM:

One of the major advantages of SRM is the ability to test disaster recovery of business critical applications with no impact on the running production applications. This is a big differentiator for SRM compared to DR for legacy physical environments. SRM provides the ability to test recovery in an isolated environment without any downtime, which helps tune the recovery and learn from the tests. When an actual disaster happens, there are very few surprises and the RTO is optimized. The processes for testing and for actual recovery are identical. The only difference is that in an actual disaster the recovery happens for real with the primary site down, whereas a test recovery happens in isolation with the primary site up.

The environment that was set up for business critical applications was tested with SRM. Disaster recovery tests are executed by running the recovery plans that were set up for the different applications. The screenshot below shows the testing process for the SAP recovery plan. For a real recovery, the Recovery use case is run instead of the Test use case. The recovery consists of multiple steps: some run in parallel, while others run in order of priority. This shortens the overall recovery time.


Figure 10: SAP recovery with SRM
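The mix of prioritized and parallel steps can be approximated as follows: recover one priority tier at a time, and recover the virtual machines within a tier concurrently. This is a conceptual sketch and not a description of SRM internals. The VM names, the priority numbers and the recover_vm callback are assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_priority(vm_priorities):
    """Return lists of VM names, one list per priority, lowest number first."""
    tiers = defaultdict(list)
    for vm, priority in vm_priorities.items():
        tiers[priority].append(vm)
    return [sorted(tiers[p]) for p in sorted(tiers)]


def run_plan(vm_priorities, recover_vm):
    """Recover tiers in order; VMs inside a tier are recovered in parallel."""
    results = []
    with ThreadPoolExecutor() as pool:
        for tier in group_by_priority(vm_priorities):
            # list() waits for the whole tier before the next tier starts.
            results.append(list(pool.map(recover_vm, tier)))
    return results


# Hypothetical VMs: infrastructure first, then the database, then SAP.
vms = {"ad-dns01": 1, "oracle-db01": 2, "sap-app01": 3, "sap-app02": 3}
print(run_plan(vms, lambda vm: f"{vm} recovered"))
# [['ad-dns01 recovered'], ['oracle-db01 recovered'],
#  ['sap-app01 recovered', 'sap-app02 recovered']]
```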

Once the recovery is complete, reports are available that show every step and its duration. These history reports are invaluable for tuning the recovery process.


Figure 11: History report for SAP recovery
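When tuning, it helps to rank the steps of a run by duration. The sketch below assumes the report has been transcribed into (step name, seconds) pairs. The step names and timings are made up for illustration and are not taken from Figure 11.

```python
def slowest_steps(steps, top=3):
    """Return the `top` longest steps as (name, seconds), longest first."""
    return sorted(steps, key=lambda step: step[1], reverse=True)[:top]


def share_of_total(steps):
    """Return each step's share of the total run time, as a percentage."""
    total = sum(seconds for _, seconds in steps)
    if total == 0:
        return {}
    return {name: round(100 * seconds / total, 1) for name, seconds in steps}


# Hypothetical timings in seconds, for illustration only.
steps = [
    ("Synchronize storage", 120),
    ("Power on priority 1 VMs", 90),
    ("Customize IP", 45),
    ("Power on priority 3 VMs", 45),
]

print(slowest_steps(steps, top=2))
# [('Synchronize storage', 120), ('Power on priority 1 VMs', 90)]
print(share_of_total(steps)["Synchronize storage"])  # 40.0
```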

After testing, SRM makes it easy to clean up the test environment using the Cleanup function. The temporary storage, virtual machines and networking used for the tests are cleaned up in a matter of minutes.

After an actual disaster and recovery, a reprotect process would be initiated instead of the cleanup, once the primary site infrastructure is available again. The reprotect process reverses the direction of replication and swaps the roles of the primary and recovery sites.
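Reprotect can be pictured as swapping the two roles. The toy model below only illustrates that reversal. The class name and the site names are hypothetical, and it says nothing about how SRM performs the operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPair:
    """Which site is protected and which site receives the replicas."""
    protected_site: str
    recovery_site: str

    def reprotect(self):
        """Return the pair with replication direction and roles reversed."""
        return ReplicationPair(self.recovery_site, self.protected_site)


before = ReplicationPair(protected_site="site-a", recovery_site="site-b")
after = before.reprotect()
print(after)  # ReplicationPair(protected_site='site-b', recovery_site='site-a')
```

Running reprotect a second time, after failing back, returns the pair to its original direction.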

End of Part 2       

Access Part 3 here.