At the OpenStack Summit in Boston last month, Datera presented the Elastic Data Fabric. It was a great event and is a great community.
One of the most-asked questions at our booth was how Datera compares to Ceph storage. It’s a good question that deserves a more complete answer than we could give in a show setting, so we’re answering it in a two-part blog series. Part One compares the designs of Ceph and Datera, and Part Two will look at performance.
First, both solutions are considered software-defined storage, and both leverage commodity servers that consist of processors, network and media. A commodity server can be competitive with purpose-built storage platforms. And the ability to leverage the benefits of commodity servers to build a large-scale storage system is undeniably the future.
Next, both systems are based on the “scale-out” principle, with the central idea being that capacity and performance are added to the system as additional nodes are joined. Using this principle, systems can grow very large in both of these dimensions. (Note: This is in contrast to traditional “scale-up” arrays where all the performance is purchased up front in the form of a controller or a pair of controllers and only capacity is added over time.)
However, when we look under the covers, the differences between Datera and Ceph storage become clear:
Datera has developed a block storage file system from the ground up, as delivering both high performance and low latency requires great care across the entire I/O stack. Our Block Store is log-structured, so all data writes are sequential, which delivers high performance on HDDs and mitigates wear on SSDs. This approach also offers the benefit of low latency.
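To make the log-structured idea concrete, here is a minimal sketch in Python. It is a toy illustration of the general technique, not Datera's Block Store: the class name, the 4 KB block size, and the in-memory buffer standing in for media are all assumptions made for the example.

```python
import io

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class LogStructuredStore:
    """Toy append-only block store: every write goes to the tail of the log."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the physical media
        self.index = {}          # logical block address -> byte offset in the log

    def write(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        offset = self.log.seek(0, io.SEEK_END)  # always append, never overwrite
        self.log.write(data)
        self.index[lba] = offset                # the newest copy wins
        return offset

    def read(self, lba):
        self.log.seek(self.index[lba])
        return self.log.read(BLOCK_SIZE)


store = LogStructuredStore()
offsets = [
    store.write(7, b"a" * BLOCK_SIZE),
    store.write(3, b"b" * BLOCK_SIZE),
    store.write(7, b"c" * BLOCK_SIZE),  # overwriting block 7 appends a new copy
]
```

Even though the logical addresses arrive out of order and block 7 is written twice, the offsets on the media only ever increase. A real log-structured store also has to reclaim the space held by stale copies, which this sketch leaves out.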
Ceph storage is designed around a Reliable Autonomic Distributed Object Store (RADOS). RADOS uses existing POSIX file systems like XFS and ext4. These file systems are designed for a very different use case than serving as a block storage back end. While this could change in time, this type of development takes years of hard work to complete.
One of the main features of Ceph storage is the CRUSH map. The idea behind CRUSH is to use the power of the CPU to compute the placement of data rather than storing block placement metadata in a table or metadata store. The CRUSH algorithm is extremely powerful but is complex to manage.
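The general idea of computing placement instead of looking it up can be sketched in a few lines. Note that the code below is not the CRUSH algorithm: it uses rendezvous (highest-random-weight) hashing as a simpler stand-in, and the node and object names are hypothetical. The real CRUSH also takes device weights and a hierarchy of failure domains into account, which is where much of its power and its management complexity come from.

```python
import hashlib


def score(object_name, node):
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, nodes, replicas=3):
    """Return the nodes that hold object_name, computed rather than looked up."""
    ranked = sorted(nodes, key=lambda node: score(object_name, node), reverse=True)
    return ranked[:replicas]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
first = place("volume-1/block-42", nodes)
second = place("volume-1/block-42", list(reversed(nodes)))  # any client, same answer

# Removing a node that held no replica leaves the placement unchanged.
spare = next(n for n in nodes if n not in first)
after_removal = place("volume-1/block-42", [n for n in nodes if n != spare])
```

Every client that knows the node list arrives at the same answer without consulting a placement table, and that is the property the paragraph above describes.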
Datera employs a concept called “targeted placement” (see Figure 1) in which data is routed to the correct node and that node manages block placement metadata locally. With targeted placement, the amount of metadata generated at the cluster level is very low, while the heavy lifting of managing block-to-media placement is done on each node independently. As a result, Datera can handle large-scale deployments with ease, and workloads can be targeted to all-flash or hybrid nodes as needed.
Figure 1 – Targeted Placement
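The split between cluster-level and node-level metadata can be illustrated with a small sketch. This is a toy model built only from the description above, not Datera's implementation: the `Node` and `Cluster` classes, the volume-to-node map, and the choice to write every block to each targeted node are all assumptions made for the example.

```python
class Node:
    """A storage node that owns its own block-to-media metadata."""

    def __init__(self, name):
        self.name = name
        self.block_map = {}  # (volume, lba) -> local media slot; never leaves the node
        self.media = []      # stands in for the node's drives

    def write(self, volume, lba, data):
        self.media.append(data)
        self.block_map[(volume, lba)] = len(self.media) - 1

    def read(self, volume, lba):
        return self.media[self.block_map[(volume, lba)]]


class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.volume_map = {}  # volume -> node names; the only cluster-level metadata

    def create_volume(self, volume, node_names):
        self.volume_map[volume] = list(node_names)

    def write(self, volume, lba, data):
        for name in self.volume_map[volume]:  # route to the targeted nodes
            self.nodes[name].write(volume, lba, data)

    def read(self, volume, lba):
        primary = self.volume_map[volume][0]
        return self.nodes[primary].read(volume, lba)


cluster = Cluster([Node("flash-1"), Node("flash-2"), Node("hybrid-1")])
cluster.create_volume("db-vol", ["flash-1", "flash-2"])  # target the all-flash nodes
for lba in range(1000):
    cluster.write("db-vol", lba, b"x")
```

After 1,000 block writes, the cluster-level map still holds a single entry for the volume, while each targeted node tracks its own 1,000 block locations and the hybrid node holds none.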
Both Datera and Ceph storage support enterprise features such as snapshots and clones. Beyond that, each has its own strengths: Datera has QoS capabilities, while Ceph storage supports block, object, and file. For block, Datera uses industry-standard iSCSI, while Ceph storage uses its own native protocol, RBD (RADOS Block Device), which requires a Ceph-specific client.
In Part Two, we’ll examine the performance differences between Datera and Ceph storage, which can be more than 10X using the same hardware.