VMware vSphere® Metro Storage Cluster configurations (also referred to as stretched storage clusters or metro storage clusters) span two datacenters separated by a limited distance and provide benefits such as:
- Workload mobility
- Cross-site automated load balancing
- Enhanced downtime avoidance
Deploying a stretched cluster involves maintaining two datacenter locations, deploying a single storage subsystem that spans both sites, and stretching the network across the sites. This non-trivial effort can be offloaded to VMware Cloud on AWS, which provides a stretched cluster feature across two Amazon Availability Zones. Availability Zones are distinct locations within the same AWS Region, connected through low-latency links. Stretched Cluster for VMware Cloud on AWS is designed to protect against the failure of an Amazon Availability Zone while providing the benefits listed above, and it is based on vSAN stretched cluster.
In this blog post we describe a functional proof-of-concept of an S/4HANA deployment on a stretched cluster in VMware Cloud on AWS.
VMware Cloud on AWS is not supported for SAP workloads at this time. Stay tuned to https://twitter.com/vmwarecloudaws for updates.
The logical architecture of the proof-of-concept is shown below.
The diagram above shows an SDDC cluster stretched across two Amazon Availability Zones. If an Availability Zone goes down, the outage is treated as a vSphere HA event, and the impacted virtual machines are restarted in the other Availability Zone. vMotion is also available between hosts in a stretched cluster, so workloads can be live migrated between the two Availability Zones.
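The failover behaviour described above can be illustrated with a small model. This is a simplified, hypothetical sketch and not how vSphere HA actually selects restart hosts: it assumes six hosts named `esx-1` to `esx-6`, three per Availability Zone (labelled `az-a` and `az-b`), and uses plain round-robin placement on the surviving hosts, whereas vSphere HA also considers resource availability and admission control.

```python
from typing import Dict

# Hypothetical inventory: three hosts per Availability Zone / vSAN fault domain.
HOST_TO_AZ: Dict[str, str] = {
    f"esx-{i}": ("az-a" if i <= 3 else "az-b") for i in range(1, 7)
}


def restart_after_az_failure(
    placement: Dict[str, str], host_to_az: Dict[str, str], failed_az: str
) -> Dict[str, str]:
    """Return a new VM -> host placement after an entire AZ fails.

    VMs on hosts in the failed AZ are "restarted" round-robin on the
    surviving hosts; VMs already in the healthy AZ keep their host.
    """
    survivors = sorted(h for h, az in host_to_az.items() if az != failed_az)
    if not survivors:
        raise RuntimeError("no surviving hosts to restart VMs on")

    new_placement: Dict[str, str] = {}
    next_host = 0
    for vm, host in sorted(placement.items()):
        if host_to_az[host] == failed_az:
            # Simulated HA restart on a host in the other AZ.
            new_placement[vm] = survivors[next_host % len(survivors)]
            next_host += 1
        else:
            new_placement[vm] = host
    return new_placement


if __name__ == "__main__":
    before = {"s4h-db": "esx-1", "s4h-app": "esx-2", "s4h-app2": "esx-5"}
    print(restart_after_az_failure(before, HOST_TO_AZ, "az-a"))
    # {'s4h-app': 'esx-4', 's4h-app2': 'esx-5', 's4h-db': 'esx-5'}
```

In this model the surviving zone must have enough capacity to absorb the restarted VMs, which is why the hosts are split evenly across the two zones.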
The following vSphere client screenshot shows the Amazon EC2 hosts assigned to the stretched cluster. Three hosts are assigned to each Availability Zone.
The screenshot below shows that the cluster hosts are grouped into two vSAN fault domains, each corresponding to one Amazon Availability Zone.
A multi-tier S/4HANA deployment consists of application and database server virtual machines, which can be live migrated between any of the six hosts in the stretched ESXi cluster. Small-scale tests were conducted to measure application performance in the following scenarios:
- S/4HANA database and application server virtual machines on the same site / fault domain.
- S/4HANA database and application server virtual machines on separate sites / fault domains.
The following metrics were measured:
- Duration of a batch job captured in SAP transaction SM37.
- Round-trip time (RTT) between the SAP HANA database server and the application server. The procedure is documented in SAP article 2081065 – Troubleshooting SAP HANA Network (SAP user ID and password required). SAP provides an SQL script that runs on the SAP HANA database, measures the RTT between the database and its clients (i.e. the application server), and displays the resulting values.
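As a rough cross-check from the application server side, the round-trip time can also be approximated by timing TCP handshakes to the database's SQL port. This is a different and simpler technique than the SQL script referenced in the SAP article: it measures from the client rather than from SAP HANA, includes some kernel overhead, and the function names, IP address, and port in the usage line are hypothetical.

```python
import socket
import statistics
import time
from typing import Dict, List


def tcp_connect_rtt_ms(
    host: str, port: int, samples: int = 10, timeout: float = 2.0
) -> List[float]:
    """Approximate network RTT by timing TCP handshakes to host:port.

    socket.create_connection() returns once the SYN / SYN-ACK exchange
    completes, so each sample is roughly one round trip plus overhead.
    Pass an IP address to keep DNS lookups out of the measurement.
    """
    results: List[float] = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(rtts_ms: List[float]) -> Dict[str, float]:
    """Return the minimum, median, and maximum sample in milliseconds."""
    return {
        "min_ms": min(rtts_ms),
        "median_ms": statistics.median(rtts_ms),
        "max_ms": max(rtts_ms),
    }
```

Hypothetical usage from the application server: `print(summarize(tcp_connect_rtt_ms("10.0.1.20", 30015, samples=20)))`. The minimum is usually the most stable indicator of raw path latency, since occasional scheduling delays inflate the maximum.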
The results are shown below.
The results above show that network latency between the Availability Zones can increase application response times in SAP. The SAP batch job runs on the application server and continuously sends SQL requests to the SAP HANA database and retrieves the results, so every network round trip adds to the overall duration of the batch job. In the SAP performance analysis transaction ST03, the overall response time is broken down into the application and database tiers, and the extra time caused by network latency appears as part of the database request time.
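A back-of-the-envelope model makes the effect concrete: if a batch job makes N sequential round trips to the database, moving the application server to the other Availability Zone adds roughly N × (RTT_cross − RTT_same) to its duration, and that increase shows up as database request time in ST03. The sketch below assumes strictly sequential requests and ignores every other difference between the two scenarios; the function names and input values are hypothetical and illustrative, not figures from the tests above.

```python
def added_network_time_s(
    round_trips: int, rtt_same_az_ms: float, rtt_cross_az_ms: float
) -> float:
    """Extra seconds spent waiting on the network when app and DB are in different AZs.

    Assumes each database request waits for the previous reply, so every
    round trip pays the full RTT difference between the two placements.
    """
    if round_trips < 0:
        raise ValueError("round_trips must be non-negative")
    delta_ms = rtt_cross_az_ms - rtt_same_az_ms
    return round_trips * delta_ms / 1000.0


def projected_duration_s(
    baseline_s: float, round_trips: int, rtt_same_az_ms: float, rtt_cross_az_ms: float
) -> float:
    """Job duration measured within one AZ plus the estimated network penalty."""
    return baseline_s + added_network_time_s(round_trips, rtt_same_az_ms, rtt_cross_az_ms)


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from this proof-of-concept.
    extra = added_network_time_s(1_000_000, 0.25, 0.75)
    print(f"extra network wait: {extra:.0f} s")  # extra network wait: 500 s
```

Even a sub-millisecond RTT difference becomes significant for chatty batch jobs, because the penalty scales linearly with the number of round trips.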
To maximize performance in a stretched cluster, the application and database server virtual machines of the same SAP system can be kept in the same Availability Zone. This is achieved with policies and profiles in VMware Cloud on AWS.
Compute policies in VMware Cloud on AWS provide a way to specify how the vSphere Distributed Resource Scheduler (DRS) should place virtual machines on hosts in a resource pool. A VM-Host affinity policy describes a relationship between a category of virtual machines and a category of hosts, and can be used to keep the application and database server virtual machines of the same SAP system on hosts in the same Availability Zone. More information is available in the VMware Cloud on AWS documentation on compute policies.
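Compute policies are configured in VMware Cloud on AWS itself, and the sketch below does not use any VMware API. It is a hypothetical audit that checks the outcome such a VM-Host affinity policy is meant to produce: given the host each VM runs on and the Availability Zone of each host, it reports SAP systems whose virtual machines are split across zones. The `SapVm` type, SAP system IDs, host names, and zone labels are all assumptions for illustration; a real inventory would need to be exported from vCenter.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SapVm:
    name: str  # VM name (hypothetical)
    sid: str   # SAP system ID the VM belongs to, e.g. "S4H"
    host: str  # ESXi host currently running the VM


def sids_spanning_zones(vms: Iterable[SapVm], host_to_az: Dict[str, str]) -> List[str]:
    """Return SAP system IDs whose VMs currently run in more than one AZ.

    An empty list means each SAP system's application and database
    servers share an Availability Zone.
    """
    zones_by_sid: Dict[str, Set[str]] = defaultdict(set)
    for vm in vms:
        zones_by_sid[vm.sid].add(host_to_az[vm.host])
    return sorted(sid for sid, zones in zones_by_sid.items() if len(zones) > 1)


if __name__ == "__main__":
    host_to_az = {"esx-1": "az-a", "esx-2": "az-a", "esx-4": "az-b"}
    vms = [
        SapVm("s4h-db", "S4H", "esx-1"),
        SapVm("s4h-app", "S4H", "esx-2"),
        SapVm("bw1-db", "BW1", "esx-1"),
        SapVm("bw1-app", "BW1", "esx-4"),
    ]
    print(sids_spanning_zones(vms, host_to_az))  # ['BW1']
```

Running a check like this after maintenance or a failover shows whether DRS has brought each SAP system back into a single zone.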
- Stretched Cluster for VMware Cloud on AWS is designed to protect against an Amazon Availability Zone failure and provide enhanced downtime avoidance.
- A multi-tier SAP system with separate database and application server virtual machines can be live migrated between ESXi hosts in different Availability Zones.
- In small-scale tests, performance was highest when the database and application server virtual machines resided on hosts in the same Availability Zone.
- VMware Cloud on AWS has compute policies which can be used to keep the application and database server virtual machines of the same SAP system on hosts in the same Availability Zone.
- Customers should test with their own workloads as mileage is expected to vary.