Before starting Datera in 2013, a small group of us contributed the block storage target subsystem (“Linux-IO”) to the Linux kernel open source project, where it was adopted by the likes of Google, Red Hat (now part of IBM), and a multitude of array manufacturers. Linux-IO eventually became the industry standard and an essential ingredient of software-defined IT infrastructure.

While the industry made quick use of our contribution to create a new class of storage system without the need for proprietary storage hardware, in naming it “software-defined” we thought they missed the forest for the trees. The code was not merely an ingredient in a new storage system; it could more broadly enable a new way of managing data at scale – what years later would be called the “cloud.” The cloud is not a location but a new architectural model, an operating model, and an economic model, one that replaces the rigid approaches of the past to unleash a new wave of agility and velocity of development.

To provide a data foundation for the cloud, we envisioned a hyperscale, service-centric architecture that could orchestrate data anywhere – across racks, aisles, and data centers – all the while providing continuous availability, predictable performance, and management by policy, from private to public clouds. If we could harness cloud computing for data and let users change their intent as they go, we could free them from trying to anticipate future storage needs. That is the genesis of the Datera Data Services Platform, a new approach to data for the cloud era.

Why Hyperscale?

Swarms are composable and adaptive; monoliths are not. A swarm of starlings is endlessly adaptable and resilient, while dinosaurs are the antithesis of adaptive – they went extinct. A swarm approach to data could similarly adapt to changing capacities, components, and requirements, while the older monolithic approach, embodied by storage arrays, lived and died with each generation of chip and media.

Google adopted swarm design for its compute infrastructure almost twenty years ago. The IT giant builds its data centers from thousands of commodity servers and uses autonomous distributed software to orchestrate them into coherent swarms. Now known as “hyperscale,” this architecture is transforming how IT is designed and delivered, putting IT as we know it under existential pressure – the Jurassic IT Era is coming to an end.

The promise of hyperscale is to converge diverse hardware resources into one coherent swarm with entirely new levels of adaptability and scalability – the very promise of cloud computing. Its implementation, however, is riddled with severe challenges: node heterogeneity, data gravity, data consistency, combinatorial reliability and availability, operational complexity, performance assurance, scalability cliffs, and so on, not to mention the fundamental physical realities of time, distance, and network latency.

To pursue our vision of data operating at hyperscale, we assembled an interdisciplinary team around our founding architects: Nicholas Bellinger (a Linux storage leader), Claudio Fleiner (a hyperscale wizard), Raghu Krishnamurthy (an automation thought leader), and Bill Rozas (a brilliant computer and semiconductor architect). We set out to tackle the challenges above and to build a system ready to serve a wide spectrum of workloads, from traditional to cloud-native, in the most demanding, mission-critical environments and enterprises. Together, we released the first enterprise software-defined storage platform to deliver the benefits associated with the cloud.

We used code to build a data swarm out of industry-standard servers and to eliminate the friction so often caused by hardware. Rather than reduce choice and variability at the hardware layer, we embraced environments composed of servers with different profiles, from all-flash to hybrid, and of different generations from a variety of manufacturers. Hyperscale has to be frictionless.

But what good would that be if the platform couldn’t adapt rapidly and continuously to deliver consistent, predictable, and scalable performance for demanding workloads? This is where the team’s background in performance, down to the kernel, came into play. We also leveraged our experience in distributed systems to enable transparent data mobility and lockless distributed data coherence, which in turn makes synchronous stretched clusters possible.

The Cloud Operating Model Requires Automation Everywhere

If IT is going to operate at an entirely new scale, it can’t carry over the manual approaches of the past. We embraced the swarm management paradigm of the hyperscalers, but sought to put the applications themselves in control of what they require from the infrastructure – performance, availability, and so on. In our platform, the data infrastructure is continuously composed and delivered as a service, driven by applications rather than humans and based on application profiles, or intent. The intent outlives any particular technology and, together with transparent data mobility, makes data portable and data management scalable across heterogeneous endpoints, technologies, and innovations. As Nicholas is apt to say, “Datera allows application developers and IT organizations to change their minds in terms of application need because our platform can adapt to new or emerging requirements.” Simply change a parameter in a class of service – place data replica 2 on flash rather than disk – or move from one class of service to another.
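To make the idea concrete, here is a minimal sketch of intent as declarative data – the names and fields below are invented for illustration, not the Datera API – showing how a class of service might be declared and then amended when an application changes its mind:

```python
from dataclasses import dataclass, replace

# Hypothetical sketch of an intent-driven storage policy. All names and
# fields here are invented for illustration; they are not Datera's API.

@dataclass(frozen=True)
class ReplicaPlacement:
    replica: int   # which copy of the data (1 = primary)
    media: str     # "flash" or "disk"

@dataclass(frozen=True)
class ClassOfService:
    name: str
    iops_target: int                           # performance intent
    placements: tuple[ReplicaPlacement, ...]   # availability intent

gold = ClassOfService(
    name="gold",
    iops_target=50_000,
    placements=(
        ReplicaPlacement(replica=1, media="flash"),
        ReplicaPlacement(replica=2, media="disk"),
    ),
)

# The application changes its mind: replica 2 moves from disk to flash.
# Converging the fleet on the new intent is the platform's job, not an
# operator's.
gold_v2 = replace(
    gold,
    placements=(
        ReplicaPlacement(replica=1, media="flash"),
        ReplicaPlacement(replica=2, media="flash"),
    ),
)
```

The point of the sketch is that the policy is declarative: nothing above says how or when data moves, only what the end state should be.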

Our automation model decouples storage consumption from deployment, continuously brokering between the two while managing and scaling each independently. The platform monitors every customer environment – we call this Fleet Analytics because it assesses every server, the entire fleet – analyzes delivery against the intent, and doesn’t merely flag issues as they arise but takes action to remedy them, such as re-replicating data across the swarm or moving it from one storage media type to another within a node, offloading manual tasks.
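At its core this is a reconciliation loop, a pattern familiar from other hyperscale control planes. A minimal sketch of its shape, again with invented names rather than Datera internals:

```python
import time
from dataclasses import dataclass

@dataclass
class Drift:
    volume: str
    detail: str  # e.g. "replica 2 on disk, intent says flash"

def fleet_reconcile(volumes, observe, intent_for, remediate, interval_s=30):
    """Observe every volume in the fleet, compare delivery against intent,
    and remediate drift rather than merely flag it.

    observe, intent_for, and remediate are callables supplied by the
    platform; this hypothetical sketch only shows the shape of the loop.
    """
    while True:
        for vol in volumes:
            state = observe(vol)                # telemetry from the fleet
            intent = intent_for(vol)            # the declared class of service
            if not intent.satisfied_by(state):  # delivery vs. intent
                drift = Drift(vol, intent.explain_gap(state))
                remediate(drift)                # act, don't just alert
        time.sleep(interval_s)
```

The loop closes on the intent itself: once an application amends its class of service, the next pass detects the gap and begins remediation without a human in the path.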

The Cloud Economic Model Means Pay-As-You-Grow

Enterprise IT groups continue to be under tremendous pressure to support more applications, faster development cycles, and data environments that nearly double every year, and to do so with constant or declining capital and staff resources. While this pressure drove adoption of the large public cloud vendors, the dream of lower overall expenses has proven elusive. Now we see a majority of enterprises looking to re-platform onto the cloud model to achieve their economic goals as well as to increase business agility and reduce technology risk. The public cloud economic model offers enormous OpEx elasticity, which makes failure cheap and success expensive, and it locks customers in with captive data services. Hence the universal need for data services that converge public cloud simplicity and elasticity with private cloud control and efficiency, creating multi-cloud optionality.

The Cloud is All About Scale & Scaling the Right Way

To achieve these ends, we reimagined storage from system-defined to service-defined, because the cloud is an operating model, not a location. We rethought storage to scale data across a constant flux of technology innovation and obsolescence cycles, and across private and public clouds, driven by current and future application intent. We envisioned an “eternal” data services continuum that combines service-defined simplicity with enterprise ‘ilities,’ built with the right architectural, operating, and economic approach.

At Datera, our team continues to bring the cloud model to enterprises and help them scale the right way to achieve their digital transformation goals.

Get a Consultation

Discover how you can take advantage of Datera enterprise software-defined storage with advanced automation: Contact us to schedule a free consultation.