Kafka on VMware vSphere with Kubernetes: Deploying the Confluent Operator

This article describes a set of work that was done at VMware’s labs with Confluent staff to demonstrate deployment of the full Confluent Platform, using the Confluent Operator, on VMware vSphere 7 with Kubernetes.  The Confluent Platform is a collection of processes, including the Kafka brokers and others that provide cluster robustness, management and scalability. The Confluent Operator allows the various components of the Confluent Platform to be easily installed and then managed throughout their lifetime. In this case a Tanzu Kubernetes Grid (TKG) cluster operating on vSphere 7 was used as the infrastructure for this work. A Kubernetes Operator is a widely accepted way to deploy an application or application platform, particularly those that are made up of several communicating parts such as this one. Operators are popular in the Kubernetes world today and a catalog of different application operators, based on Helm Charts, is held in the Tanzu Application Catalog.

Source: The Confluent Platform web page

Apache Kafka is an open source software platform that is used very widely in the industry for inter-application messaging and event streaming at high speed and scale for many modern applications. Many use cases of Kafka are applications that perform machine learning training or inference and analytics tasks on streaming data. One example of this kind of application is seen in the IoT analytics benchmark environment that was developed at VMware. The image above from the Confluent site shows a general view of the world that Kafka operates in, forming the glue between different applications and between those applications and the outside world. It is used by most of the large internet companies as well as by many thousands of smaller companies to handle streams of live data as they flow into and out of their applications to others. Confluent is the go-to company for Kafka-based  event streaming. The Confluent staff worked closely with VMware personnel on proving the Confluent Platform and the Operator on vSphere 7.

As is the case with most Kubernetes operators, a YAML specification file comes with the downloadable Confluent operator package that makes it easier to deploy and manage all the Kafka components (event brokers, service registry, zookeepers, control center, etc.). Once deployed, the operator and the Kafka runtime (i.e. the Confluent Platform, which has several components) are hosted in pods on a set of virtual machines running on VMware vSphere 7 with Kubernetes.

The version of the Confluent Operator used for this work was V0.65.1. The downloadable tar bundle for the Operator is located on the Deploying Confluent Operator and Confluent Platform web page.

Configure a TKG Cluster on vSphere 7

Generally, the VMware system administrator creates TKG clusters for the developer/testing/production community to use. They can do this once the Kubernetes functionality is enabled on vSphere 7 and vCenter 7. This blog article does not go into detail on the steps taken by the system administrator to enable the vSphere Kubernetes environment (also called enabling “Workload Management” within vCenter). Those details are given in a separate document, the vSphere with Kubernetes QuickStart Guide.

The commands below are executed on a VM that was created by the system administrator with the appropriate access rights to your newly created TKG cluster. The vSphere with Kubernetes supervisor cluster is the management and control center for several TKG or “guest” clusters, each created for a developer or test engineer’s work by the VMware administrator.

The administrator logs in to the Supervisor cluster on vSphere via the Kubernetes API-server for that cluster. An example of the command to login to the API Server for the Supervisor cluster is shown here.

The VMware administrator creates, among other items, a Storage Class and a NameSpace within the vSphere Supervisor Cluster to contain the new TKG cluster (as shown in the YAML below for the creation of that TKG cluster). The Kubernetes namespace here maps to a vSphere resource pool that allows administrators to control the CPU/memory resources that are used by the new TKG cluster, if we need to. Each development/testing project can operate in a namespace of its own, with its own contained TKG cluster.

Administrators create a TKG Cluster with a YAML specification file that follows the pattern of the example below. The specific edits the VMware system administrator would make are to the “class” and “storageClass” items here.

NOTE: If you have difficulty with viewing the text below, click inside the box – you should then see a menu at the top bar area of the box and choose the <> icon to see the text in a correctly formatted way.

Note that the “class” attribute is set to “guaranteed-large” above. This parameter configures the size of the Worker VMs that are used to contain the pods. Certain components/services of the Confluent Platform, such as the SchemaRegistry service, require larger memory/core configuration and this is specified in that class parameter when creating the TKG cluster.

Deploy the Confluent Operator on the vSphere TKG Cluster

In our lab testing work, we used a Linux-based VM with the Kubectl vSphere Plugin installed in it – this we referred to as our TKG “cluster access” VM. Having received instructions on using the cluster access VM from the VMware system administrator, we were ready to start work on our new TKG cluster.

Once logged in to a cluster access VM, the first step is to use the “tar” command to extract the contents of Confluent Operator tar file you downloaded from the Confluent site to a known directory. You may need to unzip the file first.

We made a small set of changes to the supplied “private.yaml” file within the “providers” directory under the Confluent Operator main directory. These were customizations to make it specific to our setup. The version of the private.yaml file that we used for our testing is given in full in Appendix 1.

Note: Exercise care with the spacing if you choose to copy some contents from this file – the Helm tool and other YAML tools are very picky about the layout and spacing!

Each named section of the private.yaml file (such as those named “kafka” and “zookeeper”) describes the deployment details for a Kubernetes service. There are Kubernetes pods deployed to support those services, which we will see later on.

Set Up the Helm Tool

We installed the Helm tool into the TKG cluster access VM. There are instructions for this on the Deploying Confluent Operator and Confluent Platform web page. The instructions below are taken from that Confluent web page, with some added guidelines for vSphere with Kubernetes.

Login to the TKG Cluster

Once logged into the cluster access virtual machine,  first login to the Kubernetes cluster and set your context.

You choose your Kubernetes namespace to operate in, as directed by your system administrator. The namespace is part of the context you operate in. In our case, we made use of the “default” namespace for some of earlier testing work. Later on, as the testing got going, we formalized this to use more meaningful namespace names.

Installing the Confluent Operator

The “private.yaml” file mentioned in the various helm commands below contains the specifications for the various Confluent components. It is copied from the template private.yaml that is given in the Confluent Operator’s “helm/providers” directory. To install the various components of the Confluent Platform/Kafka on vSphere with Kubernetes, we used an edited “private.yaml” cluster specification that is fully listed  in Appendix 1. The private.yaml file required certain site-specific parameters for a particular implementation, such as DNS names for kafka brokers, for example. Those edited parts are shown in bold font in the listing of the contents in Appendix 1.

Install the Operator Service Using Helm

Apply a Set of Kubernetes Parameters

The following instructions allow certain actions on behalf of the vSphere user that are necessary for a clean installation. They complement the Confluent installation instructions given at the web page above.

The Zookeeper Service

Before installing this service, there are some pre-requisites required at the vSphere and Kubernetes levels. Firstly in the vSphere Client, create a tag that maps to a storage policy named “zookeeper-standard-ssd-myzones”. and attach it to the target namespace. This named storage class is needed so that the storage provisioner, csi.vsphere.vmware.com, mentioned in the global section of the private.yaml file can assign a storage volume to enable Persistent Volumes Claims (PVCs) to be used at the Kubernetes level. The Zookeeper service requires PVC support to allow it to work. Then in your TKG cluster access VM, issue the command below.

The Persistent Volume Claims may be seen in Kubernetes by issuing

and after all the services in this list were deployed, we saw them as shown here

The Kafka Service

See the Note below on certain edits that are required to the Kafka service LoadBalancer, once it is deployed.

The LoadBalancer’s “externalTrafficPolicy”  defaults to “local” instead of “cluster”.  The former is not supported on the current TKG release. We need to update each loadbalancer entry within the Kafka service from “local” to “cluster” for the traffic policy attribute. The details on this are described below.

The SchemaRegistry Service

The Connectors Service

The Replicator Service

The ControlCenter Service

The KSQL Service

NOTE: Once all of the Kubernetes services listed above that make up the Confluent Platform are successfully deployed, then we edit the Kafka service to change certain characteristics of the LoadBalancer section for that service. This is needed for vSphere compatibility. Use the following command to do that

In the “spec:” section add a new line or change the existing line to read as follows:

The same procedure will need to be done to the kafka-1-lb and kafka-2-lb Kubernetes services. These are the three Kafka LoadBalancer services.

An example of the contents of that edit session is shown here:

NOTE: If you have difficulty with viewing the text below, click inside the box – you should then see a menu at the top bar area of the box and choose the <> icon to see the text in a correctly formatted way.

vSphere Client View

Once the Confluent Operator is deployed on vSphere with Kubernetes, you can view the VMs in the vSphere Client tool that each map to the Node functionality in Kubernetes. Here is an image of that setup taken from a later test experiment we did with a different namespace name. Each Kubernetes node is implemented as a VM on vSphere. For the reader who is not familiar with the vSphere Client, each VM has a name prefaced with “confluent-gc-md” in the view below and is a node in Kubernetes. Each VM handles multiple Kubernetes pods just as a node does, as needed.

Users familiar with vSphere Client screens will notice the new “Namespaces” managed object within the vSphere Client left navigation. This was designed specifically for managing Kubernetes from vSphere. The VMs properties can be viewed in the vSphere Client UI that is familiar to every VMware administrator, shown below. These VMs may be subject to vMotion across host servers and DRS rules as they apply to the situation.

The vSphere Client view of host servers and VMs with vSphere 7 with Kubernetes

To see these nodes in Kubernetes, issue the command as shown below.

As you can see above, all of the VMs that host the Kubernetes pods are themselves running on a set of ESXi host servers, in our case six machines named “esxi-aa-01.vmware.demo” etc., above. Each host server runs multiple virtual machines concurrently, as is normal in VMware.

Testing the Confluent Operator Installation

Follow the instructions at the Confluent Operator and Platform website to ensure that the Operator and Platform installations are functioning correctly.


There are separate sections described there for intra-Kafka cluster tests and for testing event handling from consumers and producers that are outside of the Kafka cluster. We focus here on the latter, as that is the more comprehensive test of the two. We performed the intra-cluster test before we proceeded to the external testing – and all functionality was correct.

We created a separate Confluent Platform installation that executed independently of our Kubernetes-based Confluent Operator setup, so as to test the TKG cluster deployment. We use the “external” qualifier here to refer to that second Confluent Platform instance.

In our tests, we deployed the external Confluent Platform cluster within the guest operating system in our cluster access VM, which we used earlier to access  the Kafka brokers in the Kubernetes-managed cluster. The external Confluent Platform cluster could also be deployed in several different VMs if more scalability were required. We were executing a Kafka functionality health-check only here.

The key parts of the testing process mentioned above were to:

  1. Install the Confluent Platform in one or more VMs, as shown above.
  2. Bring up an external Confluent Platform cluster separately from the Kubernetes Operator-based one. This external Confluent Platform/Kafka cluster is configured to run using a “kafka.properties” file. The contents of that file are given in Appendix 3 below. Note that in the kafka.properties file the name of the bootstrap server is that of the Kubernetes-controlled Confluent platform instance. The bootstrap server identifies the access point to the Kafka brokers and load balancers.
  3. The name of the bootstrap server will need to be set up in the local DNS entries, so that it can be reached by the consuming components.
  4. In a Kafka consumer-side VM, first set the PATH variable to include the “bin” directory for the external Confluent platform. Then run the command:

This command starts a set of processes for the external Confluent Platform cluster. Note that we did not have to start our Kubernetes Kafka cluster in this way – that was taken care of for us by the Confluent Operator.

The starting status of each component of that new external Kafka cluster is shown.

Once the external Confluent Platform is up and running, invoke the following command to bring up a Kafka consumer-side process. This process listens for any messages appearing in the topic “example” that is present at the server “kafka.cpbu.lab” on port 9092.

The following producer process can be run in the same VM as the message consumer, or in a separate VM. In a message producer-side VM , first set the PATH variable to include the “bin” directory for the Confluent platform. Once that is done, invoke the command

The “seq” command from the Confluent Platform produces a series of messages and sends them to the Kubernetes-controlled Kafka brokers at “kafka.cpbu.lab:9092”. The combination of the above two processes, (i.e. consumer and producer) running simultaneously, shows a flow of 1 million messages being transported by the Kubernetes-controlled Kafka brokers from the Kafka producer side to the Kafka consumer side. We can change the number of messages sent and received at will.

The individual messages that are handled and passed on by the Kafka brokers may be viewed in the Confluent Control Center Console. This was viewed in our tests at:   (user:admin, p: Developer1)

Here is one view of that Confluent Control Center console, showing messages on an example topic  flowing from producer to consumer via the Kafka brokers.

The IP address and port number for the Control Center Console is given in the list of Kubernetes services shown in Appendix 2 below on the line describing the service named as follows:

Appendix 1: The Edited Helm/Providers/private.yaml file from the Confluent Operator Fileset

Entries in the private.yaml file used for Operator installation that required edits to be done to customize the contents for this validation test are shown in red font below.

NOTE: If you have difficulty with viewing the text below, click inside the box – you should then see a menu at the top bar area of the box and choose the <> icon to see the text in a correctly formatted way.


Appendix 2 –  Checking the Status of all Confluent Components in a TKG Cluster

Using the cluster access VM with the TKG cluster, issue the kubectl command below. We then see output that is similar to that below. This output shows a healthy set of Kubernetes pods, services, deployments, replicasets, statefulsets and other objects from the Confluent Platform, as deployed and running under the Operator’s control.

Multiple pods are running within each Kubernetes node. Those pods are seen in the output from the command shown here along with the nodes (VMs) that host them:



Appendix 3 : The Properties File for the External Confluent Platform Kafka Cluster Used for Testing

External testing of a Confluent Platform deployment is described in more detail in the Installation page for the Confluent Operator: https://docs.confluent.io/current/installation/operator/co-deployment.html

A file named “kafka.properties” is used in the testing of the Kubernetes-based Kafka Brokers. This is done from a second Confluent Platform deployment that is external to the original Kubernetes-based Kafka cluster we deployed using the Operator. This file is in the local directory from which the “confluent local start” command is issued to invoke that external Kafka cluster. The kafka.properties file has the following contents:


sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username=”test” password=”test123″;