The Cassandra NoSQL database was designed to handle write operations efficiently. Consequently, there is always a trade-off between achievable read latency and the amount of data one can store per node. As the data per node increases, the write pipeline exerts more pressure on the system due to increased computational and I/O demands, degrading read latency. In addition, we have observed that a combination of other factors affects overall system performance. For example, network latency (and throughput) has a direct impact on how efficiently a Cassandra cluster scales. Similarly, the throughput of a Cassandra cluster depends on the efficiency of access to flash storage such as SATA SSD or NVMe drives, and compression engine performance also affects read latency.
As with most big data problems, an approach adopted by many users is to add more and more DRAM to every Cassandra node. Since only a fixed amount of memory can be added per node, one ends up adding new nodes just to add more memory at the cluster level. This approach works to a certain extent, but only if one can afford a large amount of DRAM on each node. Thus, optimizing system-level performance requires a holistic approach.
rENIAC’s Data Engine is a technology platform for building acceleration solutions for data-centric workloads (for example, NoSQL databases and search) using FPGAs with commodity servers or cloud instances (e.g. AWS F1 instances). At the core of the Data Engine are three main blocks. First, a data proxy engine that implements a fully transparent proxy for the data workloads mentioned above. Second, a storage engine that supports dominant data models such as columnar, document, and graph, along with interfaces to contemporary flash memory technologies such as 3D XPoint, NVMe, and SATA SSD. Third, a network engine that supports networking protocols such as TCP/IP, DHCP, ARP, and ICMP, allowing it to interoperate with existing data workloads on Linux in a datacenter environment prone to failures.
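To make the transparent-proxy idea concrete, the behavior can be modeled in a few lines of Python: the proxy accepts a client connection, opens its own connection to the backend, and splices the two byte streams, so the client believes it is talking to the server directly. This is purely an illustrative software sketch, not rENIAC's FPGA implementation; the hosts and ports below are assumptions (9042 is Cassandra's conventional native-protocol port).

```python
import asyncio

# Illustrative only: the real data proxy performs this splicing in FPGA
# hardware. Host/port values are assumptions for the sketch.
BACKEND_HOST, BACKEND_PORT = "127.0.0.1", 9042

async def pipe(reader, writer):
    # Copy bytes in one direction until EOF, then close the far side.
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_reader, client_writer):
    # Splice client and backend sockets together so the client is unaware
    # of the intermediary: bytes flow both ways unchanged.
    backend_reader, backend_writer = await asyncio.open_connection(
        BACKEND_HOST, BACKEND_PORT)
    await asyncio.gather(
        pipe(client_reader, backend_writer),
        pipe(backend_reader, client_writer))

async def serve(host="0.0.0.0", port=19042):
    # Listen on the proxy port; each client connection gets spliced.
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Because the splice forwards the wire protocol verbatim, any CQL client pointed at the proxy's address works unmodified, which is the property the product relies on.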
rENIAC’s Data Proxy for Cassandra is built on top of the Data Engine. The main features of the data proxy from an application perspective are:
- CQL compatible, requiring no changes to the application(s)
- Write-through cache; auto-managed data consistency
- Decouples the read and write pipelines, providing read SLAs in microseconds
- Scales to 100s of data proxy nodes (using gossip protocol)
Figure 1 shows the three-server setup that we created to benchmark the performance of rENIAC’s Data Proxy. One server is used as a client machine that runs Cassandra Stress. A second server runs Cassandra 3.x as the database server, supporting native protocol version 4. A third server houses the FPGA card that, combined with the CPU, acts as the rENIAC Data Proxy sitting between the client and the database server. Two runs of Cassandra Stress were carried out: one without the rENIAC Data Proxy in the setup to measure the baseline, and one with it to measure the performance gains. All three servers are connected via a 10Gb Ethernet switch.
The servers used in this benchmark were all Supermicro 2U systems with 2 × Intel Xeon E5-2650 v1 (8 cores, 2.6GHz), 64GB of DDR3, a 220GB SSD, and an Intel 10GbE NIC (X520-DA2). The workload used Cassandra tables with 4KB of data per row spread across 10 columns. Cassandra Stress was run with 1M read-only operations over a population of 100K rows.
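A quick back-of-envelope calculation from those workload numbers shows the scale of the data set involved (the arithmetic below is our own sanity check, not a figure from the benchmark report):

```python
# Workload sizing from the numbers stated above:
# 100K distinct rows, ~4KB per row spread across 10 columns.
rows = 100_000
row_bytes = 4 * 1024
cols = 10

bytes_per_column = row_bytes // cols   # roughly 410 bytes per column
dataset_bytes = rows * row_bytes       # total hot data set in bytes

print(bytes_per_column)                # 409
print(round(dataset_bytes / 2**30, 2)) # 0.38 (GiB)
```

A working set of well under a gigabyte fits comfortably within the 64GB of DRAM on these nodes, so the comparison stresses the request path itself rather than disk capacity.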
The graphs above show the measured results of the benchmarks that we ran. There is a latency reduction of 10x or more at the 99th and 99.9th percentiles, as measured at the client machine. We believe there is scope for further optimization.
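For readers reproducing such measurements, tail percentiles like p99 and p99.9 are computed by rank over the full set of per-operation latency samples, which is why they are far more sensitive to outliers than the mean. The sketch below shows a standard nearest-rank computation; the sample data is synthetic, and this is not rENIAC's or Cassandra Stress's internal tooling.

```python
import math
import random

def percentile(samples, pct):
    # Nearest-rank percentile: sort the samples, then take the value at
    # rank ceil(pct/100 * n), 1-indexed.
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic per-operation read latencies in ms (illustration only):
# an exponential distribution gives a plausible heavy tail.
random.seed(0)
latencies = [random.expovariate(1 / 0.5) for _ in range(100_000)]

p99 = percentile(latencies, 99)    # 99% of operations complete faster
p999 = percentile(latencies, 99.9) # the 1-in-1000 worst case
```

Because p99.9 reflects the slowest 0.1% of a million operations, even rare stalls in the write pipeline or page cache show up there first, which is why tail percentiles are the headline metric in this comparison.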