The AI-Defined Data Center

As data centers are re-imagined for cloud, there’s a universal need for a data management platform that can orchestrate data everywhere, across private and public clouds. Accordingly, data centers are evolving from an “infrastructure-centric” model to an “as-a-service” operations model, dynamically composing resources for each individual application. In the process, they are making increasingly intelligent tradeoffs between application/business needs and infrastructure capabilities.

DATERA

Datera converges standard servers with mixed storage media into a single data platform, from which its AI tailors storage and data management individually to each application. Datera’s data platform is architected from the ground up to be operated as a service, and can continuously adapt to evolving business needs.

But what good would a smart data platform be if it couldn’t handle the data itself efficiently? So Datera is also architected from the ground up for very high performance.

As a result, Datera fundamentally innovates along two key dimensions:

  • AI-defined: Driven by policies, continuously adapting to application needs.
  • Low-latency: Built for NVMe, persistent memory, Intel Optane, and Intel Skylake.

This unique combination makes Datera the foundation for the AI-defined data center, and gives customers game-changing operational efficiency, infrastructure agility and economics, combined with enterprise-class performance.

We now show Datera’s impressive performance, and how Datera’s AI smartly delivers it. All benchmarks use 4k random read traffic (we’ll get to writes in another post), driven over rack-local iSCSI. Latencies are measured at the boundary of the storage system (not including iSCSI itself).

SINGLE VOLUME

Let’s start with a single volume (and application queue depth of 1) to determine the minimal latency for each type of volume/media.

Hybrid Flash Nodes (3,100 IOPS/volume at 160us – minimum latency)

Starting with a simple 3-node hybrid flash cluster and one single volume (combining NVMe flash with HDDs), we get 3,100 IOPS/volume at 160us (served from the NVMe tier).

Add an All-Flash Node

Now, let’s add an all-flash node to the hybrid cluster – with one single click, we now have a data platform that combines hybrid flash nodes and an all-flash node.

Note that Datera’s AI isn’t yet moving the data itself, as the corresponding policy specifies an “economy” service level objective (SLO).

Flash-as-a-Service (2,500 IOPS/volume at 230us – minimum latency)

Now, let’s change our policy from “economy” to “performance,” and see how Datera’s AI adjusts the data platform to deliver 2,500 IOPS/volume at 230us (from the all-flash node).

Datera’s AI automatically orchestrates one copy onto the new all-flash node, and live-migrates the exports along with it, so that applications can get a different SLO without service disruption. (With more all-flash nodes, it could also place all copies on flash, again depending on the application SLO.)

Why did performance on the all-flash node actually decrease compared with the hybrid node? Because the all-flash node uses SATA SSDs, while the hybrid node uses faster NVMe flash for its performance tier.

Why use all-flash nodes if their performance is lower than hybrid? Because all-flash performance is consistent and predictable, while hybrid flash performance can fluctuate, depending on whether the data is on disk or in flash.

Now, why Datera? Because Datera’s AI uniquely places data on the storage nodes that best match the SLOs and economics of each individual application. It can spread data copies across different nodes, based on the desired failure behavior and economics, and it will seamlessly extend these concepts across private and public clouds.

Optane-as-a-Service (4,000 IOPS/volume at 70us – minimum latency)

Let’s further expand the price/performance band of our cluster by adding an Intel Optane node (one click). We already specified a “performance” SLO for our workload, so Datera’s AI automatically orchestrates one data copy and export from the all-flash node to the now better-fitting Optane node.

This marks the minimum media access latency of approximately 70us (4k reads). About 8us of that is Optane, about 40us is Datera’s software stack, and the remaining 20us latency is the overhead of the external iSCSI protocol flow (the rack-local iSCSI pipe itself would add approximately another 20-25us).
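The single-volume figures above can be cross-checked with some simple arithmetic. The following is a rough sketch, not a measurement: it assumes the application keeps exactly one I/O outstanding at all times (queue depth 1) and that the quoted latencies are averages, so the mean round trip per I/O is simply 1/IOPS. The function and variable names are illustrative only.

```python
# Figures quoted in this post for single-volume, queue-depth-1, 4k random reads.
# name -> (IOPS per volume, latency in microseconds at the storage boundary)
SINGLE_VOLUME = {
    "hybrid (NVMe tier)": (3_100, 160),
    "all-flash (SATA SSD)": (2_500, 230),
    "Optane": (4_000, 70),
}

# Approximate breakdown of the Optane read latency quoted above, in microseconds.
OPTANE_BUDGET_US = {
    "Optane media": 8,
    "Datera software stack": 40,
    "iSCSI protocol flow": 20,
}


def round_trip_us(iops: float, queue_depth: int = 1) -> float:
    """Mean time per I/O seen by the application, assuming the queue stays full."""
    return queue_depth / iops * 1_000_000


def outside_boundary_us(iops: float, storage_latency_us: float) -> float:
    """Part of the round trip not covered by the storage-boundary latency."""
    return round_trip_us(iops) - storage_latency_us


# The three components add up to 68us, consistent with "approximately 70us".
print("Optane budget:", sum(OPTANE_BUDGET_US.values()), "us")

for name, (iops, latency_us) in SINGLE_VOLUME.items():
    print(
        f"{name}: round trip {round_trip_us(iops):.0f} us, "
        f"outside storage boundary {outside_boundary_us(iops, latency_us):.0f} us"
    )
```

Under these assumptions, all three configurations spend roughly 160 to 180us per I/O outside the measured storage boundary (iSCSI transport, initiator, and application), which is why IOPS at queue depth 1 are lower than the storage latency alone would suggest.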

MANY VOLUMES

Now that we’ve determined minimal distributed volume/media latencies, let’s use a more realistic number of volumes (and an application queue depth of 32).

Optane-as-a-Service (230,000 IOPS/node at 90us latency)

Let’s start with a mixed cluster with three hybrid flash nodes and one all-flash node. With a “performance” SLO, Datera’s AI puts the primary copies on the all-flash node, and serves all reads from that node (barring data center topology considerations). We’re getting 132,000 IOPS/node at 260us latency from the all-flash node.

Now, let’s expand the price/performance band of our mixed cluster by adding an Intel Optane node (one click). With the “performance” SLO, Datera’s AI automatically copies the corresponding data from the all-flash node to the now better-fitting Optane node, and correspondingly moves the exports. As a result, the related application IOPS accelerate to 230,000 IOPS/node at 90us latency.
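To relate the per-node IOPS and latency figures to each other, Little's law (mean items in flight = throughput x mean latency) is a handy back-of-the-envelope tool. The sketch below assumes steady state and treats the quoted latencies as averages; the function name is illustrative only.

```python
# Figures quoted in this post for many volumes, 4k random reads.
# name -> (IOPS per node, latency in microseconds at the storage boundary)
MANY_VOLUMES = {
    "all-flash": (132_000, 260),
    "Optane": (230_000, 90),
}


def in_flight_ios(iops_per_node: float, latency_us: float) -> float:
    """Little's law: mean I/Os in flight = throughput x mean latency."""
    return iops_per_node * latency_us / 1_000_000


for name, (iops, latency_us) in MANY_VOLUMES.items():
    print(f"{name}: ~{in_flight_ios(iops, latency_us):.1f} I/Os in flight per node")
```

By this estimate, the all-flash node holds about 34 I/Os inside the storage boundary at any moment, while the Optane node holds about 21: it delivers more IOPS with fewer I/Os in flight because each one completes faster.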

Flash-as-a-Service (132,000 IOPS/node at 260us latency)

Nodes may drop out anytime, and for any number of reasons. So let’s decommission the Intel Optane node, and keep the “performance” SLO for the related workloads.

As a result, Datera’s AI moves the corresponding exports from the Intel Optane node to the “next best” all-flash node. Correspondingly, workload performance decreases to 132,000 IOPS/node at 260us latency.

Basic Hybrid Cluster (160,000 IOPS/node at 190us latency)

Finally, let’s remove the all-flash node, reverting to a homogeneous hybrid flash cluster, but still keep our “performance” SLO for the corresponding workloads.

As a result, Datera’s AI disperses the related exports across the hybrid cluster, while bringing the hot data from disk into the NVMe flash tier. During that time, performance can significantly fluctuate, with latencies up to multiple milliseconds. When the NVMe tier is fully heated up, the system settles at 160,000 IOPS/node at 190us latency.
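The placement behavior in this walkthrough can be pictured with a toy model. This is purely an illustrative sketch, not Datera's actual algorithm or API: the tier preference table, the Node class, and best_fit_nodes are hypothetical names, and the model simply ranks node types per SLO to reproduce the outcomes described above.

```python
from dataclasses import dataclass

# Hypothetical preference order per SLO, inferred from the outcomes in this post.
TIER_PREFERENCE = {
    "performance": ["optane", "all-flash", "hybrid"],
    "economy": ["hybrid", "all-flash", "optane"],
}


@dataclass(frozen=True)
class Node:
    name: str
    tier: str  # "hybrid", "all-flash", or "optane"


def best_fit_nodes(nodes: list[Node], slo: str) -> list[Node]:
    """Return every node in the most preferred tier that is present in the cluster."""
    for tier in TIER_PREFERENCE[slo]:
        matches = [node for node in nodes if node.tier == tier]
        if matches:
            return matches
    raise ValueError("cluster has no nodes")


hybrids = [Node(f"hybrid-{i}", "hybrid") for i in (1, 2, 3)]
flash = Node("flash-1", "all-flash")
optane = Node("optane-1", "optane")

# Replay the sequence from the walkthrough with a "performance" SLO.
steps = {
    "mixed cluster": hybrids + [flash],
    "add Optane node": hybrids + [flash, optane],
    "decommission Optane node": hybrids + [flash],
    "remove all-flash node": hybrids,
}
for step, cluster in steps.items():
    names = [node.name for node in best_fit_nodes(cluster, "performance")]
    print(f"{step}: exports served from {names}")
```

In this model, the exports follow the best available tier as nodes come and go, and spread across all three hybrid nodes once only the hybrid cluster remains. A real placement engine would also have to weigh capacity, failure domains, and data center topology, which this sketch ignores.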

SUMMARY

Perhaps simply think of Datera as the Tesla for the data center:

  • Tesla replaced the traditional combustion engine with superior technology. Datera replaced monolithic proprietary hardware with intelligent software on commodity hardware. But in contrast to their modest electric or software-defined predecessors, both inspire with impressive performance.
  • Tesla’s and Datera’s raw speed is just the beginning: their AI creates a whole new experience. Tesla’s AI creates a self-driving car, and its simplicity will help transform transportation. Datera’s AI creates a self-managing data platform that can orchestrate data everywhere, across private and public clouds, and its simplicity will help transform how data centers are planned, procured, operated, serviced and scaled.

Welcome to the AI-defined data center. Welcome to the data era.

Please visit us at http://www.datera.io, or tweet me at @marcfleischmann.