Next-gen VMware Architecture with SmartNICs – Are You On Board?

The revolution is here



Server hardware, virtualization technologies, and modern data center infrastructure are transforming. As a result, modern-day applications are no longer the conventional three-tier applications. Instead, they are highly distributed and contain many containerized microservices. The shift to modern, containerized applications has created new and unique infrastructure challenges in the data center. Next-generation architecture with SmartNICs is helping to solve the challenges of next-generation applications. In addition, VMware’s Project Monterey, which was announced during VMworld 2020, is poised to help companies take advantage of the new SmartNIC architecture and disaggregated infrastructure.

The Challenges of Hybrid Infrastructure and Modern Applications

In the traditional approach to server infrastructure, the central processing unit (CPU) provides the core processing "brain," controlling all of the server's processing capabilities and allowing the server to fill a variety of roles. These use cases include general-purpose servers, network or security appliances, storage appliances, and more.

The traditional server was also built for monolithic workloads, not distributed applications or workloads. So, no matter the workload on a general-purpose server, the capabilities, architecture, scaling, and other characteristics remain the same.

Modern applications present new and unique challenges to organizations designing and using traditional infrastructure. For example, modern distributed applications involve unstructured data, including images, log files, and text. Standard CPUs found in conventional servers are not well suited to these shifts in applications or to the highly distributed nature of these new workload types. As a result, much of the CPU's compute power is relegated to infrastructure services instead of applications.

Traditional server infrastructure with a standard CPU runs and processes multiple types of payloads in the same processing unit, which is less than ideal. These include:

  • Management payloads – Core-critical processes that allow the infrastructure to be managed and controlled. These generally sit between the embedded and userspace layers.
  • Userspace payloads – The applications themselves and the data they rely on and require.
  • Embedded payloads – Payloads generally included in the core operating system that can run privileged operations in the operating system kernel.

Organizations attempting to satisfy the demands of new distributed microservices have created new infrastructure silos, leading to inconsistencies in how businesses manage and operationalize their infrastructure. What are some of the challenges of running modern microservices application architectures on traditional infrastructure services?

  • Artificial intelligence and machine learning (AI/ML) – Organizations today use AI/ML to process the mass of data collected from IoT and other devices. The AI/ML compute infrastructure is often specialized hardware managed as a separate infrastructure resource in the data center, which increases complexity and cost for the organizations using it.
  • Complex server scale-out costs – Scalability is an increasing challenge with traditional server infrastructure. As businesses scale out data center clusters, including CPU and network capacity, adding traditional nodes becomes more complex and inefficient. Data processing units like SmartNICs allow infrastructure to be scaled granularly by adding DPUs to handle specific use cases. In addition, as modern apps become more highly distributed, a larger percentage of server capacity is consumed by infrastructure services, making it increasingly difficult to project capacity and the cost of scaling out.
  • Increasing security concerns – As the Spectre and Meltdown vulnerabilities showed, security flaws can exist at the CPU hardware layer, making it increasingly risky to run infrastructure and application services on the same CPU. The more isolation between the layers of the application stack, the better protected application data is from current and future threats. Today's security requirements are increasingly stringent as cybersecurity risks grow, calling for zero-trust separation of workloads, management, and applications, and cloud and virtualized environments make intrinsic platform security crucial. Because a traditional server is one tightly coupled hardware unit, maintaining platform security while also meeting requirements such as scalability and lifecycle management is challenging: delivering a hardware security upgrade can mean replacing an entire server, or a whole set of servers, at once.
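
The scale-out arithmetic in the points above can be sketched with a simple back-of-the-envelope model. This is purely illustrative: the 30% infrastructure-overhead figure and the host sizes are assumptions chosen for the example, not measured values.

```python
import math

# Illustrative capacity model: how much host capacity is reclaimed when
# infrastructure services (SDN, storage, security) move from the CPU to a DPU.
# The 30% overhead figure is an assumption for illustration only.

def hosts_needed(app_cores_required: int, cores_per_host: int,
                 infra_overhead: float) -> int:
    """Hosts needed when each host loses `infra_overhead` of its cores
    to infrastructure services."""
    usable = cores_per_host * (1.0 - infra_overhead)
    return math.ceil(app_cores_required / usable)

# 1,000 cores of application demand on 64-core hosts
before = hosts_needed(1000, 64, 0.30)  # infra services run on the CPU
after = hosts_needed(1000, 64, 0.00)   # infra services offloaded to a DPU

print(before, after)  # 23 hosts vs. 16 hosts for the same workload
```

Under these assumed numbers, offloading infrastructure services shrinks the cluster by roughly a third, which is the kind of effect that makes scale-out cost projections easier to reason about.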

New technologies and redesigned hardware and software isolations are needed to satisfy the needs of highly distributed modern applications. However, with the contemporary developments using data processing units such as SmartNICs, this goal is now achievable.

With the variety of processing tasks required in today’s highly distributed processing environments, data processing units like SmartNICs can offload many tasks from the central processing unit (CPU). These offloaded processing tasks help to improve the processing capabilities and efficiencies required by today’s modern applications.

IDC also refers to these data processing units as function-offload accelerators (FAs). It is worth noting that many prominent service providers are already adopting DPUs or FAs alongside traditional platform architecture. With the adoption of DPUs and FAs in the enterprise data center, we will see a paradigm shift in how infrastructure is delivered and software is composed. This shift helps facilitate the decentralization of services that accompanies the move to microservices, and it is also driving the disaggregation of hardware in the data center.

Benefits of Shifting Infrastructure Services to SmartNICs

The challenges we have noted so far help highlight the physical infrastructure’s role in the transition to scalable, secure, and efficient modern applications. The disaggregated and decentralized approach using data processing units like SmartNICs is becoming very attractive to organizations looking to transition to modern apps running across disaggregated environments.

SmartNIC data processing units solve many of the challenges associated with traditional CPU-centric servers. These processors offload specific workloads and provide separation of processing tasks. The benefits of a modern data center architecture utilizing DPUs such as SmartNICs include:

  • Freeing CPU and memory used for infrastructure tasks – Once infrastructure tasks and processing are offloaded from the CPU, those cycles are freed for business-critical applications. Organizations no longer have to balance resources between critical applications and the equally critical infrastructure services needed to run them.
  • Standalone control plane – Running infrastructure services on SmartNICs or data processing units provides a standalone control plane for access control and infrastructure services. This separated control plane offers many benefits from both a security and an operational perspective.
  • Secure, zero-trust computing – Separating infrastructure services from applications has tremendous security benefits. The operating system and any virtualization platform gain an additional layer of protection against rogue and malicious exploits and code.

What is a SmartNIC?

First of all, what exactly is a SmartNIC? In short, a SmartNIC is an enhanced network interface card (NIC) with its own data processing unit, allowing it to act as a standalone, intelligent processing unit for data center networking, security, and storage.

New generations of discrete data processing units (DPUs), including SmartNICs, GPUs (graphics processing units), and FPGAs (field-programmable gate arrays), are increasingly used for specific application processing. For years now, we have seen growing use of GPUs across a vast number of use cases, from accelerated graphics offloading to specialized processing such as artificial intelligence (AI) and machine learning (ML). This trend in GPUs and other discrete "smart" processing units helps show the industry's direction regarding how tomorrow's mass of data will be processed.

Foundational NICs (traditional network interface cards) have interconnected computers in Ethernet networks for years, and networking remains a critical component of the modern data center. However, software-defined networking (SDN) is one of the major consumers of compute cycles in the data center, a growing trend as more modern applications and technologies adopt software-defined networking overlays. Beyond SDN, many other capabilities brought about by virtualization and modern microservice architectures tax even current CPUs with additional processing demands, robbing cycles from business-critical applications.
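
One concrete way to see the cost of SDN overlays is the fixed per-packet encapsulation they add. The sketch below uses the standard VXLAN-over-IPv4 header sizes (14-byte outer Ethernet, 20-byte IPv4, 8-byte UDP, 8-byte VXLAN); it only shows the bandwidth overhead, and the CPU cost of building those headers in software for every packet comes on top of it.

```python
# Per-packet overhead added by VXLAN encapsulation (standard header sizes).
OUTER_ETH, OUTER_IPV4, OUTER_UDP, VXLAN = 14, 20, 8, 8
OVERHEAD = OUTER_ETH + OUTER_IPV4 + OUTER_UDP + VXLAN  # 50 bytes per packet

def goodput_fraction(inner_frame_bytes: int) -> float:
    """Fraction of wire bandwidth left for the inner (tenant) frame."""
    return inner_frame_bytes / (inner_frame_bytes + OVERHEAD)

print(OVERHEAD)                          # 50
print(round(goodput_fraction(1500), 3))  # ~0.968 for full-size frames
print(round(goodput_fraction(64), 3))    # ~0.561 for minimum-size packets
```

For small packets, nearly half the wire is consumed by encapsulation, which is why offloading overlay processing to NIC hardware is so attractive.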

Devices based on the SmartNIC architecture are being developed by a wide range of companies with different implementation approaches. Common implementation technologies include:

  • FPGAs – Highly flexible and reprogrammable, but expensive, difficult to program, and not as performant as dedicated ASICs.
  • Dedicated ASICs – Offer the best price-performance, but their functionality is largely fixed at design time, limiting flexibility.
  • System-on-chip (SoC) – SoC designs pair dedicated hardware acceleration with programmable processor cores, blending ASIC-class performance with software flexibility.

An example of modern SmartNIC technology is NVIDIA ConnectX-7 400G SmartNIC. It is designed to deliver accelerated networking for cloud-native workloads, artificial intelligence, and traditional workloads. In addition, it offers software-defined, hardware-accelerated storage, networking, and security capabilities to help modernize current and future IT enterprise data center infrastructure.

NVIDIA ConnectX-7 SmartNIC (image courtesy of NVIDIA)

It provides 400Gb/s bandwidth, accelerated switching and packet processing, advanced RoCE, NVIDIA GPUDirect Storage, and in-line hardware acceleration for TLS/IPsec/MACsec encryption and decryption.

Note these additional features:

    • Accelerated software-defined networking with line-rate performance with no CPU penalty
    • Enhanced storage performance and data access with RoCE, GPUDirect Storage, and NVMe-oF over RoCE and TCP
    • Enhanced security with hardware-based security engines to offload encryption/decryption processing of TLS, IPsec, and MACsec
    • Accurate, hardware-based time synchronization for applications in the data center
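
The value of line-rate processing "with no CPU penalty" is easy to see from the per-packet cycle budget at 400Gb/s. The calculation below assumes a single 3 GHz core (an illustrative figure) and ignores Ethernet framing overhead for simplicity:

```python
# CPU cycle budget per packet at 400 Gb/s on a single core.
# The 3 GHz clock is an assumed figure; framing overhead is ignored.
LINK_BPS = 400e9  # 400 Gb/s link
CPU_HZ = 3e9      # one 3 GHz core

def cycles_per_packet(packet_bytes: int) -> float:
    pps = LINK_BPS / (packet_bytes * 8)  # packets per second at line rate
    return CPU_HZ / pps                  # cycles available per packet

print(round(cycles_per_packet(1500)))  # ~90 cycles for full-size frames
print(round(cycles_per_packet(64)))    # ~4 cycles for minimum-size packets
```

A few cycles per packet is nowhere near enough for software switching, firewalling, or encryption, which is why these functions move into the SmartNIC's dedicated hardware engines.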

In addition to NVIDIA, Intel also produces SmartNICs and SmartNIC platforms. Intel refers to its solutions as Infrastructure Processing Units (IPUs). These infrastructure processing units accelerate network infrastructure and help free up CPU cores for improved application performance.

An example of the Intel IPU SmartNIC is the Intel IPU C5000X-PL Platform card. It provides a high-performance cloud infrastructure acceleration platform with 2×25 GbE network interfaces and can support cloud infrastructure workloads such as Open vSwitch, NVMe over Fabrics, and RDMA over Converged Ethernet v2 (RoCEv2).
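
As a hedged illustration of how this kind of offload is typically wired up on a Linux host, Open vSwitch exposes a hardware-offload switch that pushes datapath flows down to capable NICs. The interface name, PCI address, and service name below are placeholders; exact steps vary by NIC, driver, and distribution.

```shell
# Sketch: enabling OVS hardware offload on a capable NIC (names illustrative).
# Requires a NIC/driver that supports TC flower offload.

# Put the NIC's embedded switch into switchdev mode (PCI address is a placeholder)
sudo devlink dev eswitch set pci/0000:3b:00.0 mode switchdev

# Enable TC flower offload on the interface (ens1f0 is a placeholder name)
sudo ethtool -K ens1f0 hw-tc-offload on

# Tell Open vSwitch to offload datapath flows to hardware, then restart it
sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
sudo systemctl restart openvswitch-switch
```

With this in place, flows that the NIC can handle are processed in hardware, and only exception traffic reaches the CPU.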

Intel IPU C5000X-PL (Image courtesy of Intel)

The next-generation Intel IPU platform, codenamed Oak Springs Canyon, is a high-performance cloud infrastructure acceleration platform that provides 2x100GbE network interfaces and supports the same workloads, including Open vSwitch.

The Intel FPGA SmartNIC N6000PL is an example of Intel's high-performance Intel Agilex FPGA-based SmartNICs. It provides 2x100GbE connectivity and supports many programmable functions, including acceleration of Network Functions Virtualization infrastructure (NFVi) and virtualized radio access networks (vRAN) for 4G/5G deployments.

The Silicom FPGA SmartNIC N5010 is the first hardware-programmable 4x100GbE FPGA-accelerated SmartNIC, enabling servers to meet the performance needs of next-generation firewall solutions.

Silicom FPGA SmartNIC N5010 (Image courtesy of Intel)

VMware on SmartNICs Accelerates Virtualization

As we have detailed, the shift to modern applications is leading to a change in how organizations will be able to provide infrastructure to meet the requirements of the enterprise data center. More processing and compute cycles are spent on infrastructure services needed to connect the hybrid data center across many verticals.

In addition, new security challenges continue to mount as cybersecurity risks grow, and the boundaries of the enterprise data center have blurred with the integration of many cloud technologies and solutions. Businesses need a consistent operating model that unifies traditional and modern apps, better utilization of computing resources without increased infrastructure cost, and security that provides robust isolation between infrastructure services and workloads.

What if you could run VMware not in the traditional way, but on a SmartNIC, with the ESXi hypervisor isolated from the applications? Announced at VMworld 2020, Project Monterey is a new solution to the modern challenges facing businesses today as they pivot to modern applications running in distributed environments in the hybrid cloud. What is it?

VMware Project Monterey Unveiled with ESXi on SmartNIC

Project Monterey from VMware reimagines infrastructure as a distributed architecture where data processing units (DPUs) form the backbone of infrastructure management and services, including networking, security, storage, and host management. Instead of running the ESXi hypervisor, storage services, and networking on top of traditional server infrastructure, organizations run these services on data processing units (DPUs).
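
While Project Monterey changes where these services run, host inventory still starts from the standard ESXi command line. The commands below are ordinary esxcli namespaces, not Monterey-specific tooling, and vmnic0 is a placeholder adapter name; on a DPU-enabled host, the DPU-backed devices surface in these same inventories.

```shell
# Sketch: inventorying NICs and PCI devices on an ESXi host
# (run in an ESXi shell or over SSH; standard esxcli namespaces).

# List the host's physical network adapters (vmnics)
esxcli network nic list

# Show driver and firmware details for one adapter (vmnic0 as an example)
esxcli network nic get -n vmnic0

# List PCI devices, where DPU/SmartNIC hardware also appears
esxcli hardware pci list
```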

It brings many benefits to managing and operationalizing infrastructure and infrastructure services:

    • Unifies workload management across traditional, cloud-native, and bare-metal operations, reducing operational cost
    • Provides composable, software-defined infrastructure to future-proof investments
    • Improves performance by accelerating network, storage, and security services on the DPU, freeing up CPU cycles to achieve better workload consolidation at a lower total cost of ownership (TCO)
    • Enhances zero-trust security with air-gapped isolation between tenants and workloads, including enterprise-wide security policy applied uniformly across existing and modern apps
    • Allows IT admins to leverage the skills and tools the vSphere ecosystem has provided the enterprise for years

VMware Builds Security Into SmartNICs

As mentioned, software-defined networking and other infrastructure services are significant consumers of CPU and memory resources on traditional servers. In conjunction with Project Monterey, VMware has also announced plans to run distributed firewalling through the NSX Service-defined Firewall in NSX-T Data Center directly on SmartNICs.

This effectively offloads the compute and memory requirements of software-defined networking from the traditional CPU onto the SmartNIC data processing units. Considering that the resource requirements of many infrastructure virtual machines are far from insignificant, this underscores the benefits of the transition to the SmartNIC architecture in the data center.

The Future of VMware and SmartNICs

The direction VMware is taking with Project Monterey and SmartNIC support is clear. With the massive shift to disaggregated applications and workloads, the traditional infrastructure model becomes less relevant and less efficient. In addition, the popularity of GPUs in recent years for offloading CPU-intensive AI/ML tasks demonstrates the benefits of these special-purpose data processing units and co-processors.

VMware is helping to provide the solution for organizations meeting the challenges of traditional infrastructure using microservice application architectures. With Project Monterey, businesses can embrace the use of SmartNICs. By offloading the infrastructure services processing and resources to SmartNICs, companies can free up the CPU for the all-important task of providing processing for business-critical applications.

Undoubtedly, the future of VMware and SmartNICs will continue to develop and help solve the new kind of "server sprawl" that has come with virtualization: the resource consumption of infrastructure and other management virtual machines that run simply to process infrastructure-related traffic such as software-defined networking.

If you think about it, the traditional management cluster in the VMware vSphere world may eventually be replaced by SmartNICs running your critical infrastructure services, scaled simply by adding additional DPUs. It will be interesting to see whether future management clusters will reside in the new Project Monterey vSphere clusters. Decentralized, disaggregated infrastructure services are the future for VMware, exposed through the familiar VMware management and operational tools. How so?

VMware’s Familiar Tools and Operations

Despite major changes underneath, VMware has done a great job keeping the management and operational tools the same for VMware vSphere and related services. Look at VMware vSphere 7.0 with VMware Tanzu baked in, a.k.a. vSphere with Tanzu: VMware added all the new features and functionality to the existing vSphere Client. With vSphere with Tanzu, VI admins can now run modern applications inside Kubernetes-controlled containers right beside the traditional virtual machines VMware has run for years.

One of the strengths of the VMware vSphere solution is the management platform with vCenter that hasn’t significantly changed for IT admins despite the introduction of new features and capabilities. It is arguably one of the reasons for the platform’s success, providing stability and consistency that admins need for Day 0, 1, and 2 operations.

With VMware’s Project Monterey, VMware will undoubtedly keep implementing the new features and changes in the underlying capabilities of vSphere running on top of SmartNIC data processing units. This change will allow operationalizing modern operations with disaggregated hardware using vSphere.

To properly protect your VMware environment, use Altaro VM Backup to securely backup and replicate your virtual machines. We work hard perpetually to give our customers confidence in their VMware backup strategy. To keep up to date with the latest VMware best practices, become a member of the VMware DOJO now (it's free).

What Does it all Mean?

Modern applications are shifting away from the monolithic model, transitioning from traditional servers in the data center to microservices architectures. With this shift in application architecture, organizations are seeing challenges with conventional physical server configurations. As microservice architectures are adopted, businesses face challenges with scalability, security, and running modern workloads like AI/ML.

SmartNICs will undoubtedly change the infrastructure services landscape in the modern data center, allowing organizations to scale infrastructure services in ways that are not possible with traditional server technologies built around a standard central processing unit (CPU).

VMware’s Project Monterey will help organizations using VMware vSphere to take advantage of these new SmartNIC data processing units. In addition, it will help modernize the approach to infrastructure services and free up the CPU in traditional server architecture for processing business-critical applications they were intended to run. It will be interesting to see the future data center infrastructure and how DPUs, including SmartNICs, will help transform the enterprise and cloud data center landscape.


Frequently Asked Questions

What is a SmartNIC?
SmartNICs are special data processing units with their own processor, used to offload specific, special-purpose tasks. In the case of SmartNICs, these can be used to offload infrastructure services. The SmartNIC's processor allows it to have its own "brain" outside the central processing unit (CPU).

What do Intel SmartNICs do?
Intel SmartNICs allow processing for specific applications to be offloaded to the data processing unit capabilities of the card. Like NVIDIA and others, Intel is on the list of preferred partners in the partner ecosystem for deploying SmartNICs with VMware vSphere.

What is a DPU?
DPU stands for data processing unit, a category of co-processors that includes SmartNICs, FPGA-based solutions, and System-on-Chip solutions. These specialized devices allow infrastructure services and other critical infrastructure processing to be offloaded onto dedicated co-processors.
