With the adoption of NVMe solid-state drives (SSDs), we now find ourselves operating without adequate data protection, and we are making too many compromises to address this fact. In the days of spinning hard disk drives (HDDs), RAID technologies provided the needed protection. But RAID simply wasn't designed for today's ultra-fast solid-state drives.
It's not surprising, then, that SSD faults in the servers hosting data-hungry applications are often to blame for significant downtime or quality of service issues in the data center. This makes maintaining high-performance, high-reliability SSD-based storage systems challenging. Even with RAID and replication schemes in place, SSD failures impose recovery and repair overhead. In fact, all traditional RAID options come with big tradeoffs in protection, performance, or cost. And things only get worse with the adoption of high-capacity SSDs.
Systems can be protected from SSD-related downtime without these tradeoffs using a new approach – one that borrows from the lessons learned from the use of GPUs to overcome CPU inefficiencies and accelerate performance.
Protection Comes Down to Performance
Data-intensive workloads supporting database and analytics applications increasingly require more compute and NVMe SSD storage resources. While CPU performance is improving, it's not keeping pace with these demands, especially where accelerated performance is critical. Adding more infrastructure often proves cost-prohibitive and hard to manage. As a result, organizations are turning to solutions that free CPUs from computationally intensive storage tasks.
A new class of data processors has emerged in data center architectures to address performance and storage management efficiency challenges that were once addressed by adding more CPUs.
These processors are now overcoming the limitations of using RAID technologies with SSD deployments – ushering in a renaissance of RAID. Data processors can optimize and shape data for RAID 5 style protection, providing infrastructure and database architects with a solution that delivers significant throughput performance, rapid rebuild duration, enhanced SSD endurance, and capacity expansion benefits.
Today’s modern data centers simultaneously support numerous diverse workloads on-premises and across public clouds, including databases, analytics, and other applications that need rapid and continuous access to their data. That’s why system architects must design and scale solutions that meet service level objectives (SLOs) and avoid service interruptions that would likely impact many users. The increasing demands of the data center make data processor-based solutions a lifeline to meet these expectations.
Are There Challenges with Using RAID with SSDs? Yes
Data centers are deploying hundreds of exabytes of SSD storage annually. Keeping databases, applications, and services running is difficult when every drive creates downtime risks. And while mitigating drive faults and failures is essential, so is storage that simultaneously delivers high levels of performance, reliability, and capacity. The inherent challenges of software RAID 0 (SWR0), software RAID 10 (SWR10), and hardware RAID 5 (HWR5), as shown in Figure 1, seem impossible to overcome when using SSDs.
| Architect Needs | Software RAID 0 | Software RAID 10 | Hardware RAID 5 |
|---|---|---|---|
| Throughput Performance | Higher due to striping | Lower due to data mirroring | Lower due to parity overhead |
| Rapid Rebuild Duration | Not applicable (no data protection) | Longer time; rebuilds the entire drive | Longer time; rebuilds the entire drive |
| Enhanced SSD Endurance | Impacts SSD useful life | Impacts SSD useful life | Impacts SSD useful life |
| Capacity Expansion | Uses the CPU for compression | Uses the CPU for compression | Uses the CPU for compression |
Figure 1: Comparison of Software and Hardware RAID
Is There An Ideal Solution for SSD RAID?
The ideal solution for SSD RAID borrows from the experience of adding GPUs to servers to overcome CPU limitations. That simple upgrade enabled modern innovations like artificial intelligence (AI) and new applications like self-driving vehicles, robotic manufacturing, facial recognition, cybersecurity, and fraud detection. None of these advances would have occurred without GPUs, because the alternative of using CPUs alone is not economically feasible.
New data processors optimize storage-related functions for modern non-volatile memory (e.g., high-performance storage class memory and high-capacity 3D NAND Flash). Continuing the shift of computing architecture from CPUs for all workloads to GPUs, CPUs, and DPs (data processors) for different workloads delivers remarkable outcomes. This includes eliminating storage-related performance bottlenecks to get the most from SSD investments.
By substituting a set of SSDs using software or hardware RAID with fewer SSDs using data processor-based RAID 5 (DPR5), organizations can experience better performance, faster rebuilds, enhanced SSD endurance, and higher capacity utilization, as shown in Figure 2. These advantages translate to tangible economic benefits for environments with data-intensive application workloads.
| Application Needs | Data Processor-Based RAID 5 |
|---|---|
| Throughput Performance | Higher for small block size transfers |
| Rapid Rebuild Duration | Shorter; only user data is rebuilt |
| Enhanced SSD Endurance | Turns all random writes into sequential writes |
| Capacity Expansion | Onboard compression, data shaping, and higher drive fill |
Figure 2: Benefits of Data Processor-based RAID 5
Data Processor-Based RAID: How It Works
Throughput Performance — RAID 5 algorithms take a toll on write performance, especially for small random writes. Changing a small amount of data requires a read-modify-write cycle to keep the stripe's parity consistent, which can severely degrade write performance. By transforming all random writes into sequential writes, DPR5 eliminates this problem, accelerating throughput performance up to 12x compared to HWR5.
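To make the read-modify-write penalty concrete, here is a toy sketch of RAID 5 parity math (simple XOR over equal-length blocks; block contents and sizes are made up for illustration, and this is not the XDP implementation). Updating one small block requires reading the old data and old parity before the two writes can be issued:

```python
from functools import reduce

def xor(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, vals) for vals in zip(*blocks))

# A toy 3-data + 1-parity stripe of 4-byte blocks.
d = [b"\x01\x01\x01\x01", b"\x02\x02\x02\x02", b"\x04\x04\x04\x04"]
parity = xor(*d)

# Read-modify-write: a small update costs 2 reads (old data, old parity)
# plus 2 writes (new data, new parity).
new_d0 = b"\x09\x09\x09\x09"
parity = xor(parity, d[0], new_d0)  # new_parity = old_parity ^ old_data ^ new_data
d[0] = new_d0

# The incrementally updated parity matches a full recomputation.
assert parity == xor(*d)
```

A full-stripe sequential write avoids the two reads entirely, since parity can be computed from data already in hand; that is why turning random writes into sequential ones sidesteps the penalty.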
Rapid Rebuild Duration — If a drive fails in a traditional HWR5 array, the data is rebuilt from the parity data on the remaining drives. During rebuilds, there is a tradeoff between host I/O activities and the rebuild rate. Host I/O performance degrades significantly when the array is rebuilding, impacting QoS. With a hardware-accelerated DPR5 solution, rebuild performance can be up to 23x higher, with up to 5x faster rebuild times. This makes it possible to use high-capacity SSDs to keep up with data growth and avoid blast radius anxiety.
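The rebuild mechanics described above rest on the same parity relationship: any one lost block in a stripe is the XOR of everything that survives. A minimal sketch (toy block sizes, not vendor code):

```python
from functools import reduce

def xor(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, vals) for vals in zip(*blocks))

# Stripe: three data blocks plus parity = XOR of the data blocks.
d0, d1, d2 = b"\x0a\x0b", b"\x10\x20", b"\x03\x30"
parity = xor(d0, d1, d2)

# The drive holding d1 fails: reconstruct it from the survivors.
rebuilt_d1 = xor(d0, d2, parity)
assert rebuilt_d1 == d1
```

Because every surviving drive must be read to rebuild each stripe, a traditional rebuild touches the entire capacity of the array; rebuilding only the user data that was actually written is what shortens DPR5 rebuild times.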
Enhanced SSD Endurance — SSDs have finite endurance, which is measured in the amount of data that can be written and erased before the device wears out. As the industry transitions from Triple Level Cell (TLC) SSDs to Quad Level Cell (QLC) SSDs and beyond, the level of endurance decreases. DPR5 solutions can shape the data for optimal placement on SSDs, eliminating wasteful write and read amplification and extending the useful life of SSDs up to 7x.
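The random-to-sequential transformation can be pictured as a log-structured write layer: random logical writes are appended to a sequential log, with a map from logical address to physical slot. This toy sketch illustrates the idea only (class and field names are invented for the example; the actual data-shaping logic is proprietary):

```python
class LogWriter:
    """Toy log-structured layer: random logical writes land sequentially."""

    def __init__(self) -> None:
        self.log: list[bytes] = []   # append-only physical medium
        self.l2p: dict[int, int] = {}  # logical-to-physical map

    def write(self, lba: int, data: bytes) -> None:
        self.l2p[lba] = len(self.log)  # always the next sequential slot
        self.log.append(data)

    def read(self, lba: int) -> bytes:
        return self.log[self.l2p[lba]]

w = LogWriter()
for lba, data in [(7, b"a"), (2, b"b"), (7, b"c")]:  # random LBAs
    w.write(lba, data)

assert w.read(7) == b"c"   # the latest version of LBA 7 wins
assert len(w.log) == 3     # physical placement was strictly sequential
```

Sequential placement aligns writes with the SSD's erase-block granularity, which is what reduces write amplification and stretches device endurance.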
Capacity Expansion — DPR5 solutions can deliver a net increase in storage capacity while software and hardware RAID decrease storage capacity. Built-in data compression, higher RAID efficiencies, and near-full drive utilization can increase useable storage capacity up to 6x.
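How those factors compound can be seen with some back-of-the-envelope arithmetic. All of the numbers below are illustrative assumptions, not vendor figures:

```python
# Illustrative usable-capacity math; every value here is an assumption.
raw_tb = 8 * 4.0                  # eight hypothetical 4 TB SSDs
raid5_efficiency = (8 - 1) / 8    # one drive's worth of capacity goes to parity
compression_ratio = 2.0           # assumed 2:1 compressible data
drive_fill = 0.95                 # near-full utilization

usable_tb = raw_tb * raid5_efficiency * compression_ratio * drive_fill
print(round(usable_tb, 1))        # usable TB under these assumptions
```

Under these assumed inputs, the usable capacity exceeds the raw capacity, which is the sense in which a compressing RAID 5 can deliver a net capacity increase rather than the decrease of mirroring or parity alone.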
Where Do We Go From Here?
There is no doubt that modern workloads will be optimized across a mix of CPUs, GPUs, and data processors. Using data processors for storage helps organizations overcome inherent RAID limitations and facilitate greater efficiencies and scaling for the future growth of workloads.
Quality of service (QoS) is vital for data center workloads, whether they run on-premises or in the cloud. That need makes data processor-based solutions uniquely valuable: they deliver consistent performance across normal operation, operation with a failed drive, and drive rebuilds. This capability dramatically simplifies data center design for architects who must tune for optimum business revenue and profitability by balancing capital expenses (CAPEX), operating expenses (OPEX), and service level agreements (SLAs).
The challenges of using RAID with SSDs will only worsen as SSD capacity and performance increase, so now is the time to try data processors for storage optimization. Adding a data processor card to a server is simple (just plug in a PCI Express card), and no changes are needed to hypervisors, operating systems, databases, or applications. Once it's installed, you can fully experience the performance, capacity, and economic benefits promised by SSDs.
About the author: Balaji Ramanuja is the Director of Product Management at Pliops, where he is focused on the company’s Extreme Data Processor (XDP) for cloud and enterprise data centers. Prior to Pliops, he was at VMware focused on enabling new workloads such as large databases (like SAP HANA and InterSystems Cache), Telco and Edge real-time applications on the vSphere Hypervisor Operating System (ESXi).