UPDATE 2: VMware vSphere 7.0 now supports shared (clustered) VMDKs! A VMDK can be used as a shared disk resource for a WSFC deployed on VMs hosted on different ESXi hosts (CAB). Check this document and the guide for more details.
UPDATE 1: Starting with VMware vSAN 6.7 U3, vSAN provides native support for a clustered disk resource for WSFC! Check this article for more information on how to configure vSAN for a shared disk.
There is a lot of conflicting material on the Internet describing how to configure a Windows Server Failover Cluster (WSFC) on the VMware vSphere platform. In this blog post you will learn about the VMware supported and recommended configuration options for implementing a WSFC (previously known as Microsoft Cluster Service, or MSCS) with disk resources shared across the nodes of a cluster. One example of an application that uses a WSFC with clustered disks is Microsoft SQL Server configured as an Always On Failover Cluster Instance (FCI).
Note: A Microsoft SQL Server Always On Availability Group does not require clustered disks between VMs to host a database, and therefore no special disk configuration is needed on the vSphere side.
The information provided applies to VMware vSphere versions 6.x and 7.x in configurations where the VMs hosting the nodes of a WSFC are located on different ESXi hosts, known as “Cluster-across-box (CAB)” in official VMware documentation. CAB provides high availability (HA) from both the in-guest operating system (OS) and the vSphere environment perspective. We do not recommend a configuration where all VMs hosting nodes of a cluster are placed on a single ESXi host (so-called “cluster-in-a-box”, or CIB). The CIB solution should not be used for any production implementation: if that single ESXi host fails, all cluster nodes are powered off and, as a result, the application experiences downtime.
We will concentrate on the clustered disk option provided by a Raw Device Mapping (RDM) and will discuss configurations involving VMware Virtual Volumes (VVol) in a separate blog post.
Recommended WSFC Cluster Configurations
The recommended and supported vSphere configuration options for a CAB deployment of a WSFC are shown in the table below:
|vSphere version||Shared disk options||SCSI bus sharing||vSCSI Controller type||vMotion supported|
|vSphere 5.x||RDM physical mode||physical||LSI SAS||NO|
|vSphere 6.0 and 6.5||RDM physical mode||physical||PVSCSI||YES|
|vSphere 6.7||RDM physical mode, vVol||physical||PVSCSI||YES|
|vSphere 7.0||Clustered VMDK, RDM physical mode,
|VMware Cloud on AWS||Clustered VMDK||physical||PVSCSI||YES|
|vSAN (vSphere 6.7 U3)||Clustered VMDK||physical||PVSCSI||YES|
NOTE: We are replacing references to “shared disk” with “clustered disk” to reduce confusion and provide more clarity.
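As a quick way to query the table above, the rows can be held in a small lookup structure. This is only an illustrative sketch: the dictionary layout and the function names are assumptions made for this example, and the vSphere 7.0 row is left out because it is incomplete in the table.

```python
from typing import List

# Rows copied from the table above. The vSphere 7.0 row is omitted
# because it is incomplete in the source table.
SUPPORT_MATRIX = {
    "vSphere 5.x": {
        "disks": ["RDM physical mode"],
        "bus_sharing": "physical", "controller": "LSI SAS", "vmotion": False,
    },
    "vSphere 6.0 and 6.5": {
        "disks": ["RDM physical mode"],
        "bus_sharing": "physical", "controller": "PVSCSI", "vmotion": True,
    },
    "vSphere 6.7": {
        "disks": ["RDM physical mode", "vVol"],
        "bus_sharing": "physical", "controller": "PVSCSI", "vmotion": True,
    },
    "VMware Cloud on AWS": {
        "disks": ["Clustered VMDK"],
        "bus_sharing": "physical", "controller": "PVSCSI", "vmotion": True,
    },
    "vSAN (vSphere 6.7 U3)": {
        "disks": ["Clustered VMDK"],
        "bus_sharing": "physical", "controller": "PVSCSI", "vmotion": True,
    },
}


def vmotion_supported(platform: str) -> bool:
    """Return the 'vMotion supported' column for a platform row."""
    return SUPPORT_MATRIX[platform]["vmotion"]


def platforms_supporting(disk_option: str) -> List[str]:
    """List the platform rows that include the given shared disk option."""
    return [name for name, row in SUPPORT_MATRIX.items()
            if disk_option in row["disks"]]
```

For example, `platforms_supporting("vVol")` returns only the vSphere 6.7 row, matching the table.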
We recommend attaching a disk that will be used as a shared resource to one or more separate vSCSI controllers; consider using the PVSCSI controller type with vSphere 6.x and 7.0. A SCSI controller used for a clustered disk must be configured with SCSI Bus Sharing set to physical. Physical bus sharing is required because the Windows OS uses SCSI-3 Persistent Reservations to arbitrate access to a disk shared between nodes.
Block storage must be used to provision a clustered disk for WSFC. We support only Fibre Channel, iSCSI, or Fibre Channel over Ethernet (FCoE) as storage access protocols; file-based storage systems with NFS are not supported. An unformatted LUN should be made available to the vSphere environment and then assigned as an RDM or vVol device to a VM hosting a WSFC node. SCSI commands must be passed directly to the LUN to satisfy the requirements of SCSI-3 Persistent Reservations, which is why RDMs are used in physical compatibility mode. Each disk should use the same SCSI ID on all VMs hosting nodes of the WSFC. The figure below depicts the recommended VM-level configuration of a disk resource and its vSCSI controller (using the vSphere HTML5 Client).
Note: vSphere 7.0 provides support for clustered VMDK. Check this document and the guide for more details.
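The disk requirements described above (physical bus sharing, a block storage protocol, and matching SCSI IDs on every node) can be expressed as a simple pre-deployment check. The sketch below is hypothetical: the `ClusteredDisk` record, its field names, and the example LUN labels are assumptions for illustration and do not correspond to any vSphere API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Storage access protocols named as supported in the text above.
SUPPORTED_PROTOCOLS = {"FC", "iSCSI", "FCoE"}


@dataclass
class ClusteredDisk:
    lun: str          # hypothetical label for the LUN, e.g. "quorum"
    scsi_id: str      # controller:unit as seen by the VM, e.g. "1:0"
    bus_sharing: str  # SCSI Bus Sharing setting of the controller
    protocol: str     # storage access protocol behind the LUN


def disk_problems(disk: ClusteredDisk) -> List[str]:
    """Return human-readable findings for one clustered disk."""
    problems = []
    if disk.bus_sharing != "physical":
        problems.append(f"{disk.lun}: SCSI bus sharing must be physical")
    if disk.protocol not in SUPPORTED_PROTOCOLS:
        problems.append(f"{disk.lun}: protocol {disk.protocol} is not supported")
    return problems


def scsi_id_mismatches(nodes: Dict[str, List[ClusteredDisk]]) -> List[str]:
    """Return the LUN labels whose SCSI ID differs between WSFC nodes."""
    ids_per_lun: Dict[str, Set[str]] = {}
    for disks in nodes.values():
        for disk in disks:
            ids_per_lun.setdefault(disk.lun, set()).add(disk.scsi_id)
    return sorted(lun for lun, ids in ids_per_lun.items() if len(ids) > 1)
```

A caller would build one list of `ClusteredDisk` records per WSFC node from its own inventory data and treat any non-empty result as a reason to review the VM settings before bringing the cluster online.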
Live vMotion (whether user- or DRS-initiated) of VMs hosting nodes of a WSFC is supported starting with vSphere 6.0 and, among other prerequisites, requires the VM Compatibility (vHardware) to be at least vSphere 6.0 (version 11). Check the information below for more details.
Note: You might see references on the Internet suggesting using the multi-writer feature in conjunction with WSFC. Please be advised that the multi-writer feature must not be used for a clustered disk resource for WSFC.
Both VMware Cloud on AWS and VMware vSAN (6.7 Update 3) support clustered VMDK(s) as a disk resource for WSFC. Support for clustered VMDKs on VMFS for on-premises versions of vSphere is introduced in vSphere 7.0. Check this document and the guide for more details.
While finalizing the configuration of your solution please ensure that the following important additional settings are implemented:
- All pRDMs used with WSFC should be configured as perennially reserved. Follow the KB article below for details: ESXi/ESX hosts with visibility to RDM LUNs being used by WSFC nodes with RDMs may take a long time to start or during LUN rescan (Configuring perennially reserved flag)
- DRS anti-affinity rules and groups should be configured to separate the placement of VMs hosting members of WSFC across different ESXi hosts.
- Check the vMotion prerequisites:
  - The vMotion network must use a physical link with a transmission speed of 10GE (Ten Gigabit Ethernet) or higher. vMotion over 1GE (One Gigabit Ethernet) is not supported.
  - vMotion is supported for Windows Server 2008 SP2 and later releases. It is not supported on Windows Server 2003.
  - The WSFC cluster heartbeat time-out must be modified to allow at least 10 missed heartbeats.
  - The virtual hardware version of the virtual machines hosting the nodes of the WSFC must be version 11 or later.
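The checklist above lends itself to a small script. The following is a minimal sketch under stated assumptions: the function names, parameter names, and the device ID in the usage note are placeholders invented for this example, and the `esxcli` command line follows the form given in the perennially reserved KB article, so verify it against that article for your ESXi release before running it.

```python
from typing import Dict, List


def perennial_reserve_commands(device_ids: List[str]) -> List[str]:
    """Build one esxcli command per RDM LUN, to be run on each ESXi host."""
    return [
        f"esxcli storage core device setconfig -d {dev} --perennially-reserved=true"
        for dev in device_ids
    ]


def prerequisite_problems(vmotion_gbps: float,
                          missed_heartbeats_allowed: int,
                          hw_version: int,
                          placement: Dict[str, str]) -> List[str]:
    """Check the thresholds from the list above.

    `placement` maps a WSFC node VM name to the ESXi host it runs on.
    """
    problems = []
    if vmotion_gbps < 10:
        problems.append("vMotion network is slower than 10GE")
    if missed_heartbeats_allowed < 10:
        problems.append("heartbeat time-out allows fewer than 10 missed heartbeats")
    if hw_version < 11:
        problems.append("virtual hardware version is below 11")
    hosts = list(placement.values())
    if len(set(hosts)) < len(hosts):
        # This is what the DRS anti-affinity rule is meant to prevent.
        problems.append("two or more WSFC nodes share an ESXi host")
    return problems
```

Calling `perennial_reserve_commands(["naa.EXAMPLE"])` yields a single command string for the placeholder device `naa.EXAMPLE`; real device identifiers have to come from your own environment.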
The following resources should be consulted while planning, designing and implementing the solution:
Consider the vSphere configuration options listed in this article to be the official VMware recommendations for building a supported, highly available, and well-performing solution for a Microsoft Windows environment with WSFC clusters that use shared disks.