Last year at VMworld we announced Project Pacific, a rearchitecture of VMware Cloud Foundation (VCF) to deeply integrate Kubernetes into the fabric of vSphere. Project Pacific quickly led to product releases closely integrated with our Tanzu portfolio – VMware Cloud Foundation with Tanzu and now vSphere with Tanzu – to provide a powerful and consistent experience for developers of both Kubernetes-orchestrated container-based and traditional virtualized apps. The result was a single platform for running VMs and containers, with a common operating model. This year at VMworld 2020, we are announcing a continuation of the rearchitecture started by Project Pacific, this time focused on hardware architecture. We call this effort Project Monterey.
To understand why we are rethinking hardware architecture, let’s start by looking at how applications are changing. Project Pacific unleashed a torrent of new, modern apps that now run on VCF. These new applications place many different demands on the underlying hardware infrastructure.
What we see is that these new apps are using more and more server CPU cycles. Traditionally, the industry has relied on the CPU for everything: application business logic, processing network packets, specialized work such as 3D modeling, and more. But as app requirements for compute have continued to grow, hardware accelerators including GPUs, FPGAs, and specialized NICs have been developed to process workloads that can be offloaded from the CPU. By leveraging these accelerators, organizations can improve performance for the offloaded activities and free up CPU cycles for core app processing work.
On top of all of this, security risks are continuing to proliferate, especially with applications that are distributed across many locations, shattering the traditional perimeter security model. Security now needs to be distributed broadly yet enforced locally. Doing that requires proper hardware enforcement of those security policies and boundaries. The goal for many organizations is to get to a zero-trust security model. Rather than loading up CPUs with this new security workload, organizations can use offload accelerators to help meet this demand for distributed security.
These changes in workloads and security demands make the performance and efficiency benefits of hardware accelerators very attractive. However, these new hardware accelerators can create their own challenges: adding accelerators to every host increases CapEx, while limiting accelerators to just some hosts increases OpEx. In the second model, operational complexity creeps up because IT operations teams need to make sure the right apps are running on the right hosts with the right accelerators, and that there is enough capacity to run the apps with adequate performance. This massive operational overhead makes it harder for applications to take advantage of these accelerators.
Without a material rearchitecture of infrastructure software, these hardware accelerator innovations lead to an unsustainable increase in both TCO and security risk. A better approach is required.
Introducing Project Monterey
We are introducing Project Monterey, a new technology preview, to solve these exact challenges. Project Monterey is a rearchitecture of VCF from the hardware up to support all the new requirements of modern applications enabled by Project Pacific. It leverages a new hardware technology called SmartNIC to deliver maximum performance, zero-trust security, and simplified operations to VCF deployments. More amazingly, by leveraging SmartNIC, Project Monterey extends VCF to support bare metal operating systems and applications! And of course, it delivers this across all the locations VCF runs today – data center, edge, and cloud – reducing TCO across the board. In order to realize Project Monterey, we are partnering with a broad set of SmartNIC vendors and server OEMs to deliver an integrated solution to customers.
Project Monterey is focused on delivering the following key advantages:
- Peak performance: by offloading network processing to SmartNIC, we can improve network bandwidth, reduce latency, and free up core CPU cycles for top application performance.
- Unified, consistent operations: consistent operations across all apps, including those on bare metal OSes! This includes dramatically simplified lifecycle management across VCF deployments, all of which is designed to reduce OpEx.
- Zero-trust security model: by offloading network security functions to SmartNIC, we can provide comprehensive application security capabilities without compromising application performance.
What is a SmartNIC?
Let’s talk about what a SmartNIC is and how it enables all this amazing functionality.
Simply put, a SmartNIC is a NIC with a general-purpose CPU, out-of-band management, and virtualized device functionality. Let’s talk about each:
- General-purpose CPU: Having a general-purpose CPU allows one to run arbitrary code and applications directly on the NIC, such as networking and storage services, which both improves performance (because of fast access to the network I/O path) and saves core CPU cycles.
- Out-of-band management: The CPU complex on the SmartNIC can be managed independently from the server’s CPU, meaning that lifecycle management (LCM) can be independent and can give VCF a new control point for operations and management. As we’ll see, this is powerful!
- Virtualized device functionality: SmartNICs can expose “virtual” devices on the PCI bus that appear to the core CPU OS and apps as if they are actual hardware devices. This provides a level of software-driven hardware flexibility not available before.
We believe SmartNIC is truly a transformational technology that will drive an inflection point in hardware architecture and design.
Evolving VCF Architecture
Project Monterey is a redesign and rethinking of VCF to take advantage of these disruptive hardware capabilities. Fundamentally, we are moving functionality that used to run on the core CPU complex to the SmartNIC CPU complex.
There’s a lot happening in this transformation, so let’s walk through the salient changes:
- ESXi on SmartNIC: Obviously the biggest change here is that we now have ESXi running on SmartNIC. Because most SmartNICs have an Arm-based processor, we have had to port ESXi to Arm!
- Two ESXi instances per physical server: There are now two ESXi instances running simultaneously, one on the main x86 CPU and one on the SmartNIC. These two ESXi instances can be managed separately or as a single logical instance. Cloud service providers (CSPs) offering VCF-as-a-service will want the former, while enterprises using VCF as normal will prefer the latter.
- Storage and network services: These now run on the SmartNIC. As explained above, this both improves storage and network I/O performance and reduces pressure on the core CPU, leaving more cycles for the apps.
- Host management: The SmartNIC ESXi will now manage the x86 ESXi. This allows us to improve LCM and other functionality completely transparently to users and customers.
- Security airgap: While our virtualization layer already provides strong security isolation between the applications and underlying hypervisor, having an ESXi instance on the SmartNIC provides greater defense-in-depth. Even if the x86 ESXi is somehow compromised, the SmartNIC ESXi can still enforce proper network security and other security policies.
- Bare metal OS support: Because the SmartNIC ESXi can manage the x86 OS, it can deploy Linux or Windows just as easily as it can deploy ESXi. This is the mechanism through which VCF can now manage bare metal OSes. In addition, VCF can deliver storage and networking services to that bare metal OS because we are bringing the full complement of VCF services to SmartNIC!
Rethinking Cluster Architecture
Project Monterey enables more than just single host benefits. It also enables us to rethink cluster architecture and to make clusters more dynamic, more API-driven, and more optimized to application needs. We enable this through hardware composability. Not only can SmartNIC expose virtualized devices to its local host, it can also expose those virtual devices to remote hosts.
Imagine a four-host cluster, where two hosts without accelerators have applications running on them while two other hosts have accelerators (FPGAs in this case). Normally, those two applications would not have access to the FPGAs because there are none on the local hosts. Admins could try to predict future app needs and build servers with the right balance of FPGAs, but this is obviously cumbersome and error prone.
Project Monterey allows a much simpler solution: just expose those hardware accelerators to all the hosts in the cluster, allowing all apps in the cluster to take advantage of them. This works for both ESXi and bare metal OSes. The assignment of accelerator to app is completely API-driven and can be specified as part of the Kubernetes manifest if using VCF with Tanzu. This will enable a complete rethinking of cluster design, helping most organizations move seamlessly toward rack-scale architecture!
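To make the four-host example concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Project Monterey's API: the `Host` and `App` classes, the host and app names, and both assignment functions are hypothetical and exist purely for illustration. It compares what happens when an app can only use an accelerator on its own host with what happens when every accelerator in the cluster is available to every app.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    accelerators: int = 0  # number of free FPGAs on this host (hypothetical model)


@dataclass
class App:
    name: str
    host: str                # name of the host the app runs on
    needs_fpga: bool = False


def assign_local_only(apps, hosts):
    """Traditional model: an app can only use an accelerator on its own host."""
    free = {h.name: h.accelerators for h in hosts}
    result = {}
    for app in apps:
        if not app.needs_fpga:
            continue
        if free[app.host] > 0:
            free[app.host] -= 1
            result[app.name] = app.host
        else:
            result[app.name] = None  # no local accelerator available
    return result


def assign_pooled(apps, hosts):
    """Composable model: any free accelerator in the cluster can serve any app.

    The app's own host is tried first, then the other hosts in list order.
    """
    free = {h.name: h.accelerators for h in hosts}
    result = {}
    for app in apps:
        if not app.needs_fpga:
            continue
        candidates = [app.host] + [h.name for h in hosts if h.name != app.host]
        chosen = next((n for n in candidates if free[n] > 0), None)
        if chosen is not None:
            free[chosen] -= 1
        result[app.name] = chosen
    return result


# Four-host cluster: two hosts run apps, two other hosts hold the FPGAs.
hosts = [Host("host-1"), Host("host-2"), Host("host-3", 1), Host("host-4", 1)]
apps = [App("app-a", "host-1", True), App("app-b", "host-2", True)]

local = assign_local_only(apps, hosts)
pooled = assign_pooled(apps, hosts)
print(local)   # neither app gets an FPGA
print(pooled)  # each app is served by an FPGA on a remote host
```

In the local-only model neither app gets an accelerator, because the FPGAs sit on hosts where the apps are not running. In the pooled model both apps are served without moving the apps or rebuilding the servers.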
When talking to customers about the benefits of Project Monterey, three key use cases came out very clearly:
- Network performance and security: many customers are running into scaling, performance, and other challenges. By offloading network and security functions to the SmartNIC, we can achieve line rate performance with no core CPU overhead and deliver a distributed firewall with L4-7 security with no network performance impact!
- Cloud-scale storage and disaggregation: we’ve heard from customers that they want more flexibility and simplicity in delivering storage functionality. Project Monterey can enable storage function acceleration such as compression, encryption, and erasure coding without impacting performance. It is also designed to deliver dynamic storage profiles (for IOPS and capacity) and on-demand remote storage access.
- Bare metal and composability: Bare metal is an exciting feature for enterprise and CSP customers who want to simplify operations by enabling bare metal-as-a-service with vSphere and delivering VCF networking and storage services to bare metal workloads. They are also looking for the additional operational simplicity that composability brings by enabling rack-scale architecture.
Customers have told us clearly that they’re looking to implement these use cases across their hybrid cloud – data center, telco clouds, and edge. Because VCF already runs in all these locations, expanding VCF with Project Monterey enables all these locations to take advantage of all these benefits.
We are working closely with our hardware partners to successfully design and realize Project Monterey. First, we’re excited to announce our partnership with three of the top SmartNIC vendors in the industry: NVIDIA, Pensando, and Intel. Each brings a unique set of capabilities to the SmartNIC market, and we plan to integrate each partner’s SmartNIC technology deeply into VCF.
Second, we’re also partnering with server OEMs to enable us to deliver a complete end-to-end, integrated solution for customers. We’re happy to announce that those server OEMs are Dell Technologies, HPE, and Lenovo.
We are committed to delivering a broad and deep ecosystem enabling the greatest breadth of choice for customers.
The Journey Starts Now
We’re excited about Project Monterey and the foundational improvements it can provide to your hybrid cloud architecture. There are a variety of ways you can get started preparing for this powerful technology.
First, if you don’t have VMware Cloud Foundation, start taking advantage of it today! It is the foundation on which Project Monterey is built and we will enable turnkey support for it with VCF.
Second, leverage Tanzu to kickstart your app modernization journey. As mentioned at the beginning, VCF with Tanzu is a rethinking of VCF with Kubernetes deeply integrated, enabling the easiest path for both developers and operators to modernize their applications and operations.
Third, check out all the sessions and content on Project Monterey at VMworld. There’s a ton of great, in-depth material from both VMware and our partners.
Good luck on your journey redefining your hybrid cloud architecture!