Field Programmable Gate Arrays (FPGAs) are beginning to cross the chasm to broader adoption as accelerators for data center workloads. FPGAs have been around for more than twenty-five years and have successfully accelerated IO-centric applications such as network routers and storage controllers. They are reconfigurable hardware that offers software-like flexibility while delivering hardware-like performance through spatial computing techniques, which use parallel computational units with custom interconnections. Their adoption in the data center is primarily driven by the continuing need for energy-efficient infrastructure, by IO-centric workloads such as databases, and by compute-intensive workloads such as AI inference.
Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous master-less replication allowing low latency operations for all clients. (Source: Wikipedia)
Cassandra Read Performance
Cassandra’s read performance can benefit from built-in caching. However, read performance degrades because of factors such as compaction runs, the chosen consistency level, and the read repair chance. Increasing the replication factor can improve read performance, but it can also make the cache less efficient. Lowering the consistency level reduces read latency, but at the cost of consistency.
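To make the consistency trade-off concrete, here is a minimal sketch of the standard quorum arithmetic. It assumes the usual rules: a QUORUM operation needs floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest acknowledged write only when the replicas read plus the replicas written exceed the replication factor (RF). The function names are illustrative and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # replicas needed for QUORUM at RF = 3

# QUORUM reads with QUORUM writes: the replica sets always overlap.
strong = read_sees_latest_write(q, q, rf)

# ONE reads with QUORUM writes: faster reads, but they may return stale data.
weak = read_sees_latest_write(1, q, rf)
```

Reading at consistency level ONE touches fewer replicas, which is why it lowers latency, but the overlap guarantee is lost.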
Because of these read performance challenges, there is a need to improve caching and potentially reduce the number of nodes. rENIAC provides acceleration solutions for Cassandra databases on cloud platforms and works with vSphere. rENIAC’s Data Engine (rDE) is an FPGA-based accelerator for Cassandra that acts as a proxy and cache for Cassandra DB nodes. rDE does all the heavy lifting required to add flash-based cache acceleration to a Cassandra DB, without requiring application developers to modify the application (for example, they continue to use the standard Cassandra Query Language API), manage cache invalidation, populate data, or manage clusters. rDE thus frees developers to focus on building more efficient business logic while enabling significantly higher performance.
rENIAC Data Engine
rENIAC Data Engine (rDE) for the Cassandra database is deployed as an FPGA-based database proxy. It sits between a Cassandra client and the database, caching data in storage that is directly accessible by the FPGA. It responds to queries by serving data from its local storage, or by fetching it from the backend database when the data is not present locally. This ensures that read requests are satisfied with predictably low latency and allows rDE to achieve much higher throughput than a standard Cassandra database cluster. rENIAC Data Engine has been designed to work without requiring any changes to the client code or the database, and with minimal configuration (as explained in the next section).
Figure 1: rDE Deployed as Data Proxy for Cassandra Database – Conceptual Architecture
The rDE nodes listen for incoming queries on the configured port. For read queries, the rDE parses the query and looks for the data in the local storage. If found, it returns the result to the client. If not found, it obtains the data from the database cluster, stores a copy in the local storage and returns the result to the client.
In the current version of the product, for insert, update and delete operations, the proxy forwards the query to the database cluster, invalidating the data stored in its own cache. When the database has successfully processed the query, the proxy forwards the response to the client.
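The read and write paths described above can be sketched as a toy read-through cache with write invalidation. This is only an in-memory model under stated assumptions, not rDE's implementation: `DictBackend` and `CachingProxy` are hypothetical names, a Python dict stands in for both the database cluster and the FPGA-attached storage, and queries are reduced to simple key lookups.

```python
from typing import Any, Dict, Hashable


class DictBackend:
    """Hypothetical stand-in for the database cluster (in-memory)."""

    def __init__(self) -> None:
        self.rows: Dict[Hashable, Any] = {}
        self.reads = 0  # how many reads actually reached the backend

    def read(self, key: Hashable) -> Any:
        self.reads += 1
        return self.rows.get(key)

    def write(self, key: Hashable, value: Any) -> bool:
        self.rows[key] = value
        return True  # acknowledgement returned by the "database"


class CachingProxy:
    """Read-through on reads, invalidate-and-forward on writes."""

    def __init__(self, backend: DictBackend) -> None:
        self.backend = backend
        self.cache: Dict[Hashable, Any] = {}

    def read(self, key: Hashable) -> Any:
        if key in self.cache:
            return self.cache[key]        # hit: serve from local storage
        value = self.backend.read(key)    # miss: fetch from the backend
        if value is not None:
            self.cache[key] = value       # keep a copy for later reads
        return value

    def write(self, key: Hashable, value: Any) -> bool:
        self.cache.pop(key, None)         # invalidate any cached copy
        return self.backend.write(key, value)  # forward, relay the response


db = DictBackend()
db.rows["user:1"] = "alice"
proxy = CachingProxy(db)

first = proxy.read("user:1")    # miss: fetched from the backend and cached
second = proxy.read("user:1")   # hit: served from the cache
proxy.write("user:1", "bob")    # forwarded; cached copy invalidated
third = proxy.read("user:1")    # miss again: returns the updated value
```

Invalidating on write rather than updating the cached copy keeps the proxy simple: the next read repopulates the cache from the database, which remains the source of truth.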
Virtualizing rENIAC and Cassandra
In this solution all components of the infrastructure are virtualized on vSphere. The rENIAC appliance was deployed on a vSphere host with a physical Intel FPGA card installed. The rENIAC virtual machine is configured with direct passthrough access to the FPGA.
The configuration of the solution is shown below. All components are virtualized and the rENIAC appliance is deployed as a proxy between the Cassandra database and its client.
Figure 2: Schematic of Virtualized rDE & Cassandra
OS and Software Requirements for FPGA Virtual Machine
rENIAC Data Engine has been tested with CentOS 7.4 or later, with a minimum Linux kernel version of 3.10. The FPGA accelerator used in this solution is the Intel Programmable Acceleration Card with the Intel Arria 10 FPGA.
Table 1: FPGA Virtual Machine Specifications
The following steps were used to deploy the rENIAC Data Engine.
- The components required to run rDE are installed.
- The virtual machine has the Intel Arria 10 FPGA card set up in passthrough mode.
- The Cassandra database IP address is provided while configuring rDE.
- rDE is started by running the setup script. This script will flash the FPGA card and start the required software services.
- The network address used by the clients for the database server is changed to point to the FPGA card running rDE instead of the Cassandra database address.
The performance benefits offered by using rDE are measured by comparing the performance of running queries directly on the Cassandra database and running queries through rDE. Using the cassandra-stress tool, we see that the query throughput with rDE is approximately three times higher than the baseline. We also see that the latency values are in a much narrower range.
Tests were executed with the cassandra-stress utility. The baseline tests were run directly against the Cassandra database and the performance was measured. The tests were then repeated with rENIAC Data Engine as a proxy layer. Transactions per second, mean latency, 95th percentile latency and 99th percentile latency values were captured.
Mean latency is the overall average latency experienced by database queries, but mean values tend to hide outliers. A percentile is the value below which a given percentage of the measurements fall. A p99 latency means that 99% of requests completed faster than the given latency; similarly, a p95 latency means that 95% of requests completed faster than the given latency.
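As an illustration of why percentiles are reported alongside the mean, the sketch below computes all three metrics from a list of latency samples. The sample values are made up for the example and are not measurements from these tests, and the nearest-rank method is only one of several common percentile definitions.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests, 2 slow outliers.
latencies_ms = [1.0] * 98 + [50.0, 100.0]

avg = mean(latencies_ms)             # pulled up only slightly by the outliers
p95 = percentile(latencies_ms, 95)   # still among the fast requests
p99 = percentile(latencies_ms, 99)   # exposes the slow tail
```

With these samples the mean stays close to the typical request, while p99 lands on one of the outliers, which is the behavior the tail-latency figures are meant to capture.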
Figure 3: Latency and throughput for rDE compared to direct access to Cassandra
The results show that rDE on vSphere delivers 3x to 5x lower latency and 3.2x higher throughput than direct access to Cassandra.
Improving read performance natively in Cassandra requires major infrastructure changes. A caching tier like the one provided by rENIAC Data Engine (rDE) can improve read performance drastically without any changes to the underlying Cassandra database infrastructure. In this solution, we leveraged vSphere infrastructure and its support for FPGAs to host rDE and all Cassandra components. Our tests have shown that rDE reduces average latency by more than 3x and tail latency by more than 5x, while also improving throughput by 3.2x. Running rDE on vSphere adds the flexibility, agility and enterprise capabilities of the platform while improving the read performance of Cassandra databases.