Approaching the Billion IOP Datacenter

Traditional storage systems are increasingly being pushed beyond their intended design boundaries. Users have always asked for more than raw storage capacity, but now they are demanding multiple terabytes, microsecond-scale latencies and hundreds of thousands of IOPS per deployment, across multiple coexisting workloads.

White papers and blogs skillfully articulate these needs and busily predict the emergence of the next-generation datacenter, all based on software-defined everything to deliver seamless scalability. Buzz-wordy product descriptions try to capture our imagination with sweet promises to solve our "cloud problem" with software-defined, container-based, microservice-delivered systems on commodity hardware. However, there is a major problem that no one seems to recognize:

The need to approach the billion IOP datacenter.

We have settled into a comfortable industry cadence from kilobytes to megabytes, gigabytes, terabytes and petabytes of storage, and from kilobits and megabits to multi-gigabits of bandwidth.

But no one has really architected a system that can easily go from – let’s call it – kilo-IOPS to mega-IOPS and now giga-IOPS.

Why do we need a system that delivers such capabilities? Because the industry has a cloud problem. It is under ever-increasing pressure to save cost and time: fewer people with fewer resources need to design, write, test, deploy, and scale up or down more applications at ever-increasing speed. All these applications need to peacefully coexist without impeding each other, while being deployed in a myriad of frameworks (VMware, OpenStack, Docker, etc.) across a multitude of platforms (bare metal, containers and VMs). These diverse applications, frameworks and platforms fragment the datacenter, and the resulting silos prevent coherent operations and efficient economics. At the same time, the success of the public cloud (Amazon Web Services, Azure and Google) is challenging traditional datacenter architecture at its very core. The only way out is a universal data infrastructure that can help consolidate the mess.

We’re addressing this challenge with the following key elements:

  1. An elastic data & control plane
  2. API-based operational model
  3. Standards-based protocols
  4. The power of NVDIMMs/NVRAM/NVMe (and soon 3D XPoint)

  • Elastic Data & Control Plane

How do we get an elastic control and data plane that can attach storage resources to many tens, hundreds, or even thousands of applications?

First, we have created floating iSCSI initiator/target relationships that allow applications and their storage to move seamlessly across storage endpoints. As a result, we have dissolved topological rigidity: as an application moves across racks, we drag its storage along and manifest its endpoints in just the right rack. We spread out all of the IOPS, so migrating apps can be served from many locations – simultaneously.
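To make the idea concrete, here is a minimal, purely conceptual sketch – not Datera’s actual code or API; the `Volume` class and `portal_map` are invented for illustration – of a target whose iSCSI portals follow the application while its IQN stays stable:

```python
# Conceptual sketch only: a volume whose iSCSI endpoints "float" with the application.
# Volume, follow() and portal_map are hypothetical names, not Datera's implementation.
from dataclasses import dataclass, field

@dataclass
class Volume:
    iqn: str                                      # one stable target name
    portals: set = field(default_factory=set)     # addresses the target is reachable on

    def follow(self, app_rack: str, portal_map: dict) -> None:
        """Re-manifest the target on portals in the application's current rack."""
        self.portals = set(portal_map.get(app_rack, []))

# Example: the app migrates from rack A to rack B; the portals move with it while the
# IQN stays constant, so the initiator simply sees new paths to the same target.
portal_map = {"rack-A": ["10.0.1.11:3260", "10.0.1.12:3260"],
              "rack-B": ["10.0.2.21:3260", "10.0.2.22:3260"]}

vol = Volume(iqn="iqn.2016-01.example:vol0")
vol.follow("rack-A", portal_map)
vol.follow("rack-B", portal_map)
print(vol.portals)   # endpoints now live in rack B; I/O can be spread across both paths
```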

Second, we created an operational model that lets us describe applications in terms of their service needs, including resiliency, performance, affinity and so on. During application deployment, storage no longer needs to be handcrafted as LUNs on pre-determined arrays with pre-set RAID levels and all those other legacy attributes (which is cumbersome and inflexible). Instead, with us, every volume has characteristics that stay fluid from build-up to tear-down.
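As a rough illustration of what such a service-level description might look like – the field names below are invented for this sketch and are not Datera’s actual template schema – consider:

```python
# Hedged illustration of an intent-based application description; field names are
# invented for this sketch, not Datera's real template format.
app_intent = {
    "name": "orders-db",
    "role": "production",
    "volumes": [
        {
            "name": "data",
            "size_gb": 500,
            "replicas": 3,                                   # resiliency, not a RAID level
            "performance": {"iops_max": 100_000, "latency_ms": 1},
            "placement": {"affinity": "rack-local", "media": "auto"},
        }
    ],
}
# The control plane consumes a description like this at deployment time; because the
# volume's characteristics live in the description, they can be changed later instead
# of being baked into a hand-crafted LUN.
```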

Third, we don’t ask deployment teams to spend their valuable time mapping out elements of their storage system. Instead, we consolidate and deliver everything in one true shared-nothing architecture with nothing to figure out beyond installing it.

  • API-Based Operations Model

Developers want their own resources, and they want them to be easier than falling off a log. With Datera, they can deploy storage without worrying about the details. They simply describe their application needs (service levels) and roles (development, testing, QA, production, etc.), then lean back and let Datera do the hard work. No need to deal with LUN masking, provisioning, authentication hassles, ACLs, or figuring out which ports have access to which storage. No fuss, no muss.
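In practice that reduces to a single API call. The sketch below is illustrative only – the endpoint URL, token handling and payload are placeholders rather than Datera’s documented REST API – and it submits a description like the one sketched earlier:

```python
# Illustrative only: provisioning by intent over a REST API. The URL, headers and
# payload are placeholders, not Datera's documented interface.
import requests

API = "https://datera-mgmt.example.local/v2"        # hypothetical management endpoint
HEADERS = {"Auth-Token": "replace-me", "Content-Type": "application/json"}

payload = {
    "name": "orders-db",
    "role": "production",
    "service_level": {"replicas": 3, "iops_max": 100_000},
}

# One call describes the application; no LUN masking, ACLs or port mapping by hand.
resp = requests.post(f"{API}/app_instances", headers=HEADERS, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())   # expected to return the provisioned volumes and their iSCSI portals
```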

  • Standards-Based Protocols

So Datera is easy to use and scale, and provides multi-tenant storage for bare metal, containers, VMs, etc. But does it support every OS? And where and when will drivers be available for all those flavors of Linux, Windows or even BSD?

No worries. As long as the OS supports iSCSI, the lingua franca of block storage, Datera supports it (in fact, Datera’s engineers contributed the Linux kernel’s storage target subsystem, including iSCSI, Fibre Channel and a dozen more storage protocols). No more hassle with proprietary drivers, and in particular no more client-side proxies. Who wants to track down a hundred instances of a driver?
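Attaching a volume therefore needs nothing beyond the stock open-iscsi initiator that ships with every major Linux distribution. A minimal sketch (the portal address and IQN are example values, and the commands must run as root):

```python
# Standard iSCSI attach using the stock open-iscsi tools; no vendor driver involved.
# The portal and IQN below are example values.
import subprocess

PORTAL = "10.0.2.21:3260"

def run(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Discover the targets exposed on the portal (standard SendTargets discovery).
print(run("iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL))

# 2. Log in to a discovered target; the volume then appears as a regular block device.
IQN = "iqn.2016-01.example:vol0"
run("iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login")
```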

  • The Power of NVDIMM/NVRAM/NVMe

Now, with all of this in place, how do we reach our vaunted giga-IOPS? This level of performance is useless without resolving the three delivery challenges above (elastic, API-driven and standards-based).

Once we can capture a wide spectrum of applications via their intent, accommodate anything from a few hundred IOPS to millions of IOPS in a single storage cluster, and have a control plane that makes configuration and re-configuration a breeze, we can add one last key ingredient to the mix: immensely powerful NVDIMM and NVMe storage media that get us to our giga-IOPS. Datera can pool and auto-tier this storage media and scale it across the datacenter, delivering high performance and low latency, automatically, and through standard protocols, to any application on any platform. And if you think what we can do with NVMe is impressive, just wait until you see what we will soon do with 3D XPoint.
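As a purely conceptual sketch of what auto-tiering means here – the heat metric, tier names and capacities below are invented for illustration and are not Datera’s placement policy – the idea is simply that the hottest data lands on the fastest media in the pool:

```python
# Conceptual heat-based auto-tiering sketch; tiers, capacities and the heat metric
# are invented for illustration, not Datera's actual placement policy.
from collections import Counter

TIERS = ("nvdimm", "nvme", "flash")    # fastest to slowest media in the pool

def retier(extent_heat: Counter, capacity: dict) -> dict:
    """Assign the hottest extents to the fastest tier that still has room."""
    placement, tier_idx, used = {}, 0, 0
    for extent, _ in extent_heat.most_common():          # hottest first
        while used >= capacity[TIERS[tier_idx]]:         # current tier full -> next tier
            tier_idx, used = tier_idx + 1, 0
        placement[extent] = TIERS[tier_idx]
        used += 1
    return placement

# Example: room for 3 extents on NVDIMM and 5 on NVMe; the rest spills to flash.
heat = Counter({f"ext{i}": 100 - i for i in range(10)})
print(retier(heat, {"nvdimm": 3, "nvme": 5, "flash": 10**6}))
```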

Datera is the easy button for your datacenter.

We finally made it to the billion IOP datacenter.
We started with hundred-IOPS disks.
Then came larger-IOPS SSDs.
Then even larger NVMe.
But no one learned to scale the way we did.
Only we figured out the shared-nothing, scale-out architecture that others merely claim. They made the mistake of proprietary drivers; we use standard iSCSI – even supported by Cinder. And we didn’t make the mistake of agonizing over where to put the control plane; we figured out how to scale and distribute it.

How

  • Central control plane
  • No proprietary driver
  • True auto-tiering
  • Support for iSCSI and iSER (RDMA)
  • At 4K random read, we deliver 150,000 IOPS per machine at roughly 600 MB/s – you would need more than 6,000 such machines to reach a billion IOPS
  • Our all-flash array delivers 500,000 IOPS – 2,000 machines reach a billion. Assuming you fill your racks to about 35U, that fits in roughly 57 racks (see the back-of-envelope sketch below)
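A quick back-of-envelope check of those numbers, using the per-node figures quoted above and assuming one 1U node per rack unit:

```python
# Back-of-envelope arithmetic for a billion 4K random-read IOPS, using the figures above.
import math

TARGET_IOPS = 1_000_000_000

# 150,000 IOPS per machine at 4 KiB per I/O
per_node = 150_000
print(per_node * 4096 / 1e6)               # ~614 MB/s, in line with the quoted ~600 MB/s
print(math.ceil(TARGET_IOPS / per_node))   # 6,667 machines -> "more than 6,000"

# 500,000 IOPS per all-flash node, ~35 nodes per rack (35U of the rack, 1U each)
per_flash_node = 500_000
nodes = TARGET_IOPS // per_flash_node      # 2,000 nodes
print(nodes, nodes / 35)                   # 2,000 nodes, ~57 racks
```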

What

  • Persistent container storage
  • Template-based application deployment for VMs and bare-metal applications
  • Provisioning storage isn’t hard; standing up a large cluster is – and no one makes it simpler than Datera

Who

  • Datera, the creator of Application-Driven Cloud Data Infrastructure

Why

  • Cloud carving for everything from many few-hundred-IOPS applications all the way up to multi-million-IOPS big data jobs
  • Hosting providers get the economics of standard hardware and can scale with their customers’ needs without the operational pain of figuring out storage placement: one control plane, with one API to deploy

Conclusion

  • When you are looking to save money and “come home” from AWS, or even mirror your current AWS deployment, we have figured out how to provide large, elastic, inexpensive and scalable storage. Others will tell you they have too, but go check their limits.
  • Many folks may have the right parts. Think of the parts of a car lying in the garage: everyone says they have the parts, but only we know how to build the hot, fast sports car out of them.