Performance of Apache Spark on Kubernetes versus Spark Standalone with a Machine Learning Workload

Apache Spark is a very popular application platform for scalable, parallel computation that can be configured to run either in standalone form, using its own Cluster Manager, or within a Hadoop/YARN context. A growing interest now is in the combination of Spark with Kubernetes, the latter acting as a job scheduler and resource manager, and replacing the traditional YARN resource manager mechanism that has been used up to now to control Spark’s execution within Hadoop. This new blog article focuses on the Spark with Kubernetes combination to characterize its performance for machine learning workloads.

The Spark core Java processes (Driver, Worker, Executor) can run either in containers or as non-containerized operating system processes. All of the above have been shown to execute well on VMware vSphere, whether under the control of Kubernetes or not. Kubernetes here plays the role of the pluggable Cluster Manager.

This recent performance testing work, done by Dave Jaffe, Staff Engineer on the Performance Engineering team at VMware, shows a comparison of Spark cluster performance under load when executing under Kubernetes control versus Spark executing outside of Kubernetes control. Without Kubernetes present, standalone Spark uses the built-in cluster manager in Apache Spark. The Kubernetes platform used here was provided by Essential PKS from VMware. The full technical details are given in this paper.

A well-known machine learning workload, ResNet50, was used to drive load through the Spark platform in both deployment cases. The BigDL framework from Intel was used to drive this workload.The results of the performance tests show that the difference between the two forms of deploying Spark is minimal.

Spark deployed with Kubernetes, Spark standalone and Spark within Hadoop are all viable application platforms to deploy on VMware vSphere, as has been shown in this and previous performance studies. They each have their own characteristics and the industry is innovating mainly in the Spark with Kubernetes area at this time.

Search Engine Optimization (SEO) is a crucial strategy for enhancing a website‘s visibility in organic search results, driving traffic, and improving online presence. By optimizing content and keywords, businesses can attract targeted audiences, improve brand credibility, and enhance user experience. SEO offers long-term benefits by continuously drawing organic traffic without the need for ongoing ad… […]

Optimizing the speed of a WordPress website is essential for ensuring a superior user experience and achieving higher rankings in search engine results. Slow websites not only deter visitors but also impact your site’s SEO negatively. Fortunately, there are various strategies you can employ to enhance your website‘s performance. This comprehensive guide outlines ten simple… […]

Performance of Apache Spark on Kubernetes versus Spark Standalone with a Machine Learning Workload

It’s Easier Than Ever to Tell VMware What You Think

Tenanted incident management in vCloud Director 9.x powered by ServiceNow

Introducing the Cloud Director Plugin Development Environment

Feature Friday episode 3 – VMware Cloud Director Container Service Extension

Feature Friday Episode 38 – Plan your Cloud Director service opportunity

BEST Web Hosting

TECHNOBABBLE

How to build a website with WordPress and what are the best plugins to use

Drupal Hosting

© Copyright

Related Posts

© Copyright