Introduction
A deployment of a Windows Server Failover Cluster (WSFC) on top of a highly available and resilient virtual infrastructure should always serve a clear purpose, and such purposes are tightly bound to Mission Critical or Business Critical Applications. Such applications require not only highly available hardware, but also protection from software failures (such as a service that fails to start or other Guest OS corruption) and minimal downtime during maintenance (such as patching the operating system or rolling out a new version of the application). While planning, deploying, and operating a WSFC, it’s vital to ensure that the WSFC configuration is error-free and has full vendor support.
There are many resources describing how to build a WSFC on VMware vSphere, and on VMware Cloud on AWS in particular; however, the importance of the WSFC Validate a Configuration Wizard (Validation Wizard) for creating a supported WSFC is often underestimated.
In this blog, we will discuss how to prepare VMware vSphere virtual machines (VMs), execute the tests, interpret the results, and arrive at a supported configuration. While all categories of the wizard are equally important, we will pay most attention to the Storage category, as we have seen a lot of confusion and many questions about shared storage from our customers.
Preparing VMs to host a WSFC
Before you can start the Validation Wizard, you should invest some time in preparing the environment. The steps below are only the basic ones; please refer to the documentation for more details.
- A minimum of two VMs with exactly the same virtual hardware configuration.
- Windows OS deployed and updated.
- Failover Clustering feature installed.
- Shared storage presented to all VMs participating in the cluster. To present shared storage:
  - On the first VM:
    - Create a new virtual SCSI controller of the type VMware Paravirtual (PVSCSI).
      Note: Never attach a shared disk to SCSI controller 0 or to any other virtual controller hosting the boot disk of your VM!
    - Set SCSI bus sharing for this new SCSI controller to physical.
    - Add a new hard disk (VMDK) from the Workload datastore.
    - Attach the VMDK to the newly created controller. DO NOT use the “multi-writer” flag.
    - Set the Disk Mode to Independent-persistent.
  - On each additional VM:
    - Create a new virtual SCSI controller of the type VMware Paravirtual (PVSCSI).
    - Set SCSI bus sharing for this new SCSI controller to physical.
    - Add an Existing hard disk by choosing the VMDK created on the first VM.
    - Set the Disk Mode to Independent-persistent.
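The rules above can be checked mechanically before you power on the nodes. Below is a minimal Python sketch, assuming you have already transcribed each VM's controllers and disks into plain data (by hand or with a tool of your choice). The `Controller`, `Disk` and `VM` classes, labels such as `"pvscsi"` and `"independent_persistent"`, and both helper functions are hypothetical and do not mirror any vSphere API. The sketch also assumes that every non-boot disk is intended to be shared.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Controller:
    bus_number: int   # 0 for "SCSI controller 0", 1 for "SCSI controller 1", ...
    kind: str         # hypothetical labels, e.g. "pvscsi" or "lsilogic-sas"
    bus_sharing: str  # "none", "virtual" or "physical"


@dataclass
class Disk:
    vmdk_path: str
    controller_bus: int
    mode: str                  # hypothetical labels, e.g. "independent_persistent"
    multi_writer: bool = False
    is_boot: bool = False


@dataclass
class VM:
    name: str
    controllers: list[Controller] = field(default_factory=list)
    disks: list[Disk] = field(default_factory=list)


def shared_disk_problems(vm: VM) -> list[str]:
    """Check every non-boot disk of one VM against the shared-disk rules."""
    problems = []
    controllers = {c.bus_number: c for c in vm.controllers}
    boot_buses = {d.controller_bus for d in vm.disks if d.is_boot}
    for disk in vm.disks:
        if disk.is_boot:
            continue  # assumption: every non-boot disk is a shared disk
        where = f"{vm.name}: {disk.vmdk_path}"
        ctrl = controllers.get(disk.controller_bus)
        if ctrl is None:
            problems.append(f"{where} references a missing controller")
            continue
        if ctrl.bus_number == 0 or ctrl.bus_number in boot_buses:
            problems.append(f"{where} shares a controller with the boot disk")
        if ctrl.kind != "pvscsi":
            problems.append(f"{where} is not on a PVSCSI controller")
        if ctrl.bus_sharing != "physical":
            problems.append(f"{where} controller bus sharing is not physical")
        if disk.mode != "independent_persistent":
            problems.append(f"{where} is not Independent-persistent")
        if disk.multi_writer:
            problems.append(f"{where} has the multi-writer flag set")
    return problems


def same_shared_vmdks(vms: list[VM]) -> bool:
    """True if there are at least two VMs and all attach the same shared VMDKs."""
    shared = [{d.vmdk_path for d in vm.disks if not d.is_boot} for vm in vms]
    return len(vms) >= 2 and bool(shared[0]) and all(s == shared[0] for s in shared)
```

For example, a node whose shared VMDK sits on SCSI controller 1 (PVSCSI, physical bus sharing, Independent-persistent) yields an empty problem list, while moving that VMDK to controller 0 produces a "shares a controller with the boot disk" entry.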
To validate your cluster, it is sufficient to provision a single small VMDK (for example, 1 GB). After the cluster is created, you can use this disk as the quorum disk. You do not need to present all the disks at validation time.
Note: If you encounter the error “File System specific implementation of OpenFile[file] failed …” while powering on the second VM, check whether the shared disk is attached to the newly created SCSI controller and whether SCSI bus sharing for this controller is set to physical.
After the shared disk(s) are attached to all VMs, ensure that the Windows OS can see them in the Disk Management MMC. Initially, a shared disk is shown as offline and not initialized.
If you run the Validation Wizard at this stage, with no disks available other than the boot disk, the wizard skips the Storage category, marking it as “Non applicable”.
Drilling down into the Storage category reveals the reason: WSFC cannot work with RAW (uninitialized) disks.
To fix this, initialize the disk and bring it online.
Do this from one VM only: at this point, WSFC does NOT yet control access to the disk, and accessing it from other nodes could corrupt it. On all other nodes, the shared disk should be visible as initialized and offline.
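The state you want to reach after this step can be expressed as a simple check. The Python sketch below assumes you record, for each node, whether Disk Management shows the shared disk as initialized and online; the `DiskState` class and the `disk_state_issues` function are hypothetical helpers, not part of Windows or WSFC.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DiskState:
    initialized: bool  # False means the disk is still RAW (no partition style)
    online: bool


def disk_state_issues(states: dict[str, DiskState]) -> list[str]:
    """Flag shared-disk states that block validation or risk corruption."""
    issues = []
    raw = sorted(node for node, s in states.items() if not s.initialized)
    if raw:
        # A RAW disk makes the wizard skip the Storage category.
        issues.append("not initialized on: " + ", ".join(raw))
    online = sorted(node for node, s in states.items() if s.online)
    if len(online) > 1:
        # Before the cluster exists, nothing arbitrates access to the disk.
        issues.append("online on more than one node: " + ", ".join(online))
    return issues
```

For two nodes where the disk is initialized on both but online only on node1, the function returns an empty list.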
You do not need to create a file system or assign a drive letter to the disk selected for validation. However, doing so does not break the wizard, and you can still validate the disk.
Running the Validation Wizard
Now you are ready to run the Validation Wizard. To do so, open Failover Cluster Manager, select Validate Configuration under Management, and follow the instructions.
Upon completion of the wizard, you are presented with the results as a basic HTML report displayed in the browser of your choice. By default, the report is stored in C:\Users\<%UserName%>\AppData\Local\Temp. It’s recommended to save the *.htm file separately: you may need this report if you contact vendor support later.
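Saving the report can be scripted. The Python sketch below assumes that the most recently modified `*.htm` file in the temp folder is the validation report you just generated (it does not rely on any particular file name); `save_latest_report`, `default_temp_dir` and the archive folder are hypothetical names chosen for illustration.

```python
import os
import shutil
from pathlib import Path


def save_latest_report(temp_dir: Path, archive_dir: Path) -> Path:
    """Copy the most recently modified *.htm file in temp_dir to archive_dir."""
    reports = sorted(temp_dir.glob("*.htm"), key=lambda p: p.stat().st_mtime)
    if not reports:
        raise FileNotFoundError(f"no *.htm report found in {temp_dir}")
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / reports[-1].name
    shutil.copy2(reports[-1], target)  # copy2 also preserves the timestamps
    return target


def default_temp_dir() -> Path:
    # On Windows, %TEMP% usually resolves to C:\Users\<user>\AppData\Local\Temp
    return Path(os.environ.get("TEMP", "."))
```

On the node where you ran the wizard, a call such as `save_latest_report(default_temp_dir(), Path.home() / "wsfc-reports")` copies the newest report into a folder of your choosing.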
Analyzing the results
Let us dig into the results. If any category shows Error in the Description column, WSFC will not allow you to create a cluster, so you must fix all the errors in red before moving forward. Warnings are different: some can be tolerated, while others must be fixed before you proceed with cluster creation.
While there are many warnings you could encounter, let us discuss two that are expected on VMware Cloud on AWS and do not void vendor support.
Storage – Validate Storage Spaces Persistent Reservation
With the introduction of Storage Spaces, Microsoft added new checks to validate Storage Spaces requirements. These tests apply only to Storage Spaces and have no impact on “normal” shared disks. VMware Cloud on AWS does not support the PERSISTENT RESERVE OUT Register (00h) persistent reservation commands, and Storage Spaces is not a supported configuration for WSFC on VMware Cloud on AWS. However, as mentioned on Microsoft Tech Community: “if you are not using Storage Spaces with this cluster (such as on a SAN) then this test is not applicable and you can ignore any results of the “Validate Storage Spaces Persistent Reservation” test including this warning”.
This warning is not applicable to a WSFC deployed on VMware Cloud on AWS and can be safely ignored.
Network – Validate Network Communication
Navigating to this category brings up a warning highlighted in yellow.
There is a lot of confusion around this subcategory when deploying a WSFC in a virtual environment. While the check is absolutely valid for physical WSFC nodes, it is not applicable to VMs. In virtualized environments, the vNICs of a VM use the physical NICs of your ESXi host to communicate with clients and across nodes. Adding more vNICs or moving WSFC heartbeat traffic to a separate vNIC would not improve network availability and might only complicate the configuration. You can safely ignore this warning.
Any other warnings not described above should be fixed before deploying a production instance of a WSFC.
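The triage rules from this section can be summarized in a few lines of code. This Python sketch assumes you transcribe each result from the report as a (category, test, status) tuple; it does not parse the HTML report, and the `Status` enum, the `EXPECTED_WARNINGS` set and the `blocking_results` function are hypothetical. The two tolerated warnings apply only under the conditions described above: Storage Spaces is not used with the cluster, and the cluster nodes are VMs.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    ERROR = "error"


# Warnings described above as expected on VMware Cloud on AWS
# (valid only if Storage Spaces is not used and the nodes are VMs).
EXPECTED_WARNINGS = {
    ("Storage", "Validate Storage Spaces Persistent Reservation"),
    ("Network", "Validate Network Communication"),
}


def blocking_results(results):
    """Return (category, test) pairs to fix before a production deployment."""
    blocking = []
    for category, test, status in results:
        if status is Status.ERROR:
            blocking.append((category, test))  # errors block cluster creation
        elif status is Status.WARNING and (category, test) not in EXPECTED_WARNINGS:
            blocking.append((category, test))  # other warnings must be fixed
    return blocking
```

An empty return value means that only successes and the two expected warnings remain.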
Additional Considerations
While troubleshooting possible issues with the storage, pay close attention to the List Disks and List Disks To Be Validated subcategories in the Storage category. The Validation Wizard numbers disks differently from the Windows OS and the VM configuration.
The List Disks subcategory reflects the order and numbering of disks as seen in the Disk Management MMC.
However, the List Disks To Be Validated subcategory assigns “Test Disk 0” to the first shared disk. In our example, Test Disk 0 corresponds to Disk Number 1 in the List Disks table. All further references in the Storage category use the disk numbers shown under List Disks To Be Validated. Do not confuse Test Disk 0 with the boot disk, Disk 0!
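The mapping in this example can be reproduced with a small helper. This Python sketch assumes, consistent with the example above but not confirmed for every configuration, that test disks are numbered 0, 1, 2, … in ascending order of the Disk Management numbers of the disks selected for validation; `map_test_disks` is a hypothetical name. Always confirm the mapping against the List Disks To Be Validated table in your own report.

```python
def map_test_disks(disk_numbers, excluded):
    """Map 'Test Disk N' labels to Disk Management numbers.

    disk_numbers: numbers from the List Disks table (Disk Management numbering).
    excluded: disks not selected for validation, e.g. the boot disk 0.
    Assumption: test disks follow ascending Disk Management order.
    """
    candidates = sorted(n for n in disk_numbers if n not in excluded)
    return {f"Test Disk {i}": n for i, n in enumerate(candidates)}
```

For the example above, `map_test_disks([0, 1], excluded={0})` returns `{"Test Disk 0": 1}`.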
Summary
This blog highlights the importance of WSFC validation and provides recommendations on how to properly prepare shared storage and test your VMs hosting a WSFC. Make sure to follow the recommendations outlined in the WSFC on VMware vSphere Deployment Guide and the VMware Cloud on AWS documentation.
Happy Clustering in the New Year!