Enterprises Are Moving to Software-Defined Storage

72% of Enterprises Are Moving to Software-Defined Storage. Why and Why Now?

When I first saw Scott Sinclair’s recent ESG research indicating that 72% of large enterprises are committed to software-defined storage for their data infrastructure of the future, I thought he had to have transposed the digits: 72% is a big number; the array vendors are, in a word, entrenched, and corporate IT is already fighting multiple wars simultaneously on the security, application transition to containers, and cloud rightsizing fronts.

So with the CIO agenda for 2020 full and then some, why, and why now, are they moving to a software-defined storage future? Scott found that 55% of enterprises have software-defined storage deployed in some capacity now, with a higher percentage indicating they plan to expand and deepen their usage of SDS in 2020. Part of this expansion hinges on adopting a multi-vendor strategy, as they have done in all other critical parts of operations, from hardware vendors to public cloud vendors.

Our conversations with Fortune 1000 enterprises over the last two years are very consistent with Scott’s findings and reveal the main factors behind the shift to software-defined:

1. INCREASE PERFORMANCE. Software-defined systems are taking advantage of the move to 100Gb+ Ethernet and NVMe drives; raw throughput and latency are largely solved problems for the data center, but performance can still be crippled when essential services like deduplication, compression, and encryption are turned on. One Datera customer boasts that he cut latency by 90% over his prior storage arrays by standardizing on SATA Flash and NVMe drives.

2. SIMPLIFY THROUGH AUTOMATION. Aggregating applications and tenants into an enterprise cloud and increasing velocity via self-service for application owners requires automation. Policy-based administration and management by application intent, where policies set thresholds such as the IOPS an application needs and the software marshals the right resources to meet that threshold, eliminate the manual tuning required by legacy approaches and dramatically simplify the overall environment. Automation is no longer a nice-to-have; it is essential for operating at scale.
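
To make "management by application intent" concrete, here is a minimal provisioning sketch: the administrator declares what the application needs and lets the platform place it. The endpoint, field names, and thresholds below are illustrative placeholders, not Datera's actual API.

    import requests

    # Illustrative "application intent" policy: state what the app needs;
    # the platform decides which nodes and media can satisfy it.
    policy = {
        "name": "oltp-gold",
        "replica_count": 3,  # copies spread across fault domains
        "qos": {"iops_min": 50_000, "iops_max": 150_000, "bandwidth_max_mbps": 2_000},
        "media_preference": "nvme",  # a hint only; placement remains automatic
    }

    # Hypothetical management endpoint; substitute your platform's real API and credentials.
    resp = requests.post("https://sds-mgmt.example.com/api/policies",
                         json=policy, auth=("admin", "changeme"), timeout=10)
    resp.raise_for_status()
    print("Policy created:", resp.json())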

3. FLEXIBILITY AND CHOICE IN HARDWARE SELECTION. CIOs overwhelmingly want the ability to choose the hardware they require, at the right time, from a menu of choices. All too often, storage systems limit choice to certain types of media and to a single generation of hardware, making system design inflexible. SDS systems like Datera are designed to rapidly incorporate the latest technologies as the system expands, enabling a broad selection of media types (NVMe, SATA Flash, HDD) and of server hardware spanning multiple generations from a pool of vendors including Dell EMC PowerEdge and HPE ProLiant. That choice helps overcome the supply chain challenges the industry is currently experiencing and gives procurement leverage to negotiate better pricing and delivery times.

4. SCALE FOR TODAY AND TOMORROW. Enterprises are universal in their desire to scale their systems just-in-time, when they need new capacity and capabilities, and to do so granularly (node by node) from a few hundred terabytes to a hyperscale threshold of multiple petabytes. Well-architected systems not only meet scale requirements but also use new capacity to expand performance, durability, and resilience as the environment grows. Datera has worked with several CIOs who embraced hyperconverged infrastructure (HCI) like VMware vSAN and Nutanix to serve a need quickly but found they could not scale beyond 10 to 20 nodes with acceptable performance, which led them to Datera.

5. REDUCE OPERATING EXPENSES WITH SELF-HEALING SYSTEMS. Combining a distributed data management approach with advanced telemetry to monitor the health of each hardware node, and rebalancing when appropriate, offers the potential to help all users with practical capacity/performance planning and best practices in real time. One of our customers has, in his words, "eliminated hardware maintenance" by simply swapping in a new hardware node when an individual server fails rather than taking systems offline and deploying specialists to revive a faulty array. With Datera, decommissioning a node and bringing on a new one can be achieved without downtime and in a matter of minutes. And the system, rather than the administrator, rebalances the environment, re-incorporating the node and placing the right data on it according to its media type.
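
As a rough illustration of the telemetry-driven decision described above, the sketch below flags when a rebalance or drain is warranted. The metric names and thresholds are assumptions for illustration, not Datera's actual telemetry schema.

    # Assumed per-node telemetry: used capacity fraction, p99 latency (ms), health flag.
    def needs_rebalance(nodes, max_capacity_skew=0.15, p99_latency_limit_ms=1.0):
        reasons = []
        usage = [n["used_frac"] for n in nodes.values()]
        if max(usage) - min(usage) > max_capacity_skew:
            reasons.append("capacity imbalance across nodes")
        for name, n in nodes.items():
            if not n["healthy"]:
                reasons.append(f"{name} unhealthy: drain and re-replicate")
            elif n["p99_ms"] > p99_latency_limit_ms:
                reasons.append(f"{name} exceeds latency target")
        return reasons

    telemetry = {
        "node-01": {"used_frac": 0.62, "p99_ms": 0.4, "healthy": True},
        "node-02": {"used_frac": 0.81, "p99_ms": 0.7, "healthy": True},
        "node-03": {"used_frac": 0.58, "p99_ms": 2.3, "healthy": False},
    }
    print(needs_rebalance(telemetry) or "no action needed")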

6. INCREASE DATA AVAILABILITY. For Fortune 1000 enterprises, data availability remains paramount in system design. Properly designed software-defined systems shun typical RAID schemes and deliver availability by distributing copies of data across nodes, racks, aisles, and data centers with close attention to fault zones, enabling a higher level of availability than traditional systems. At Datera, each node gets "over-the-air" updates, so there is no need to take a system offline to upgrade it, eliminating the biggest downtime culprit: planned downtime.
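
The placement idea is simple to sketch. Below is a minimal example over an assumed inventory format, in which replicas land in distinct racks first and spread across aisles where possible; real systems weigh many more factors (capacity, load, media type).

    def place_replicas(inventory, count=3):
        """inventory: {node: (aisle, rack)}; return nodes such that no two share a rack."""
        chosen, used_racks, used_aisles = [], set(), set()
        # First pass prefers a new aisle as well as a new rack; second pass relaxes the aisle rule.
        for prefer_new_aisle in (True, False):
            for node, (aisle, rack) in sorted(inventory.items()):
                if len(chosen) == count:
                    return chosen
                if rack in used_racks or (prefer_new_aisle and aisle in used_aisles):
                    continue
                chosen.append(node)
                used_racks.add(rack)
                used_aisles.add(aisle)
        return chosen  # shorter than `count` only if there are not enough distinct racks

    inventory = {
        "node-01": ("aisle-a", "rack-1"), "node-02": ("aisle-a", "rack-2"),
        "node-03": ("aisle-b", "rack-3"), "node-04": ("aisle-b", "rack-4"),
    }
    print(place_replicas(inventory))  # ['node-01', 'node-03', 'node-02']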

7. OPTIMIZE FOR CONTAINERS AND KUBERNETES. 90% of enterprises are using containers in production now, revealing cracks in the infrastructure's ability to keep pace with the transitory nature of these applications. CIOs are actively looking for storage systems optimized for container platforms and tools like Docker, Kubernetes, and Red Hat OpenShift. SDS systems can be a good choice when they can deliver autonomous operations and optimize data placement to match the agility of container deployments.
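
For teams evaluating this on Kubernetes, the sketch below uses the official kubernetes Python client to request a block volume through a StorageClass. The class name "sds-high-iops" is a placeholder for whatever class your SDS vendor's CSI driver registers; QoS and placement come from the policy behind that class rather than from the application manifest.

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() when running inside a pod

    # Request a 100Gi volume from a StorageClass served by the SDS CSI driver.
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="pg-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="sds-high-iops",  # placeholder class name
            resources=client.V1ResourceRequirements(requests={"storage": "100Gi"}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(namespace="default", body=pvc)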

8. MINIMIZE LOCK-IN. The infrastructure industry is notorious for vendors locking customers into an inflexible architecture and an artificially limited set of hardware choices designed around their economics rather than yours. To avoid lock-in, software-defined approaches should support a wide variety of hardware profiles, including different server vendors, models, generations, and media, to keep the customer's choice and negotiating power as high as possible. While many SDS systems are simply packaged appliances with narrow options, Datera offers a number of qualified servers from the top server manufacturers, including Dell EMC PowerEdge, HPE ProLiant DL360s and DL380s, Fujitsu Primergy MX2540s, and Cisco UCS C240s.

9. REDUCE TECHNICAL DEBT. Enterprise CIOs can significantly reduce technical debt in networking and storage through SDS. Reducing technical debt is often less about total spend than about replacing monolithic systems with more fluid and customizable systems that change the array paradigm on two fronts: excessive markup and forklift upgrades. First, the raw expense of Fibre Channel networking and the markup vendors exact for all-flash and all-NVMe systems have reached a new peak; Datera instead harnesses commodity servers and standard Ethernet. Second, Datera eliminates the three-year forklift refresh cycle, which reduces capital expense both upfront and over time. Datera environments are "evergreen," meaning they are continuously refreshed without downtime.

2020 AND BEYOND

Software-defined storage for enterprise clouds moved from the early majority into the late majority of adopters in 2019. As we move into 2020, the Fortune 1000 are engaging multiple software-defined vendors and moving up to SDS from HCI implementations built on VMware vSAN and Nutanix that were never meant for enterprise-wide rollouts. Scott summed it up: ESG's "market research shows that a majority of enterprises are adopting software-defined storage technologies as part of their data strategy and Datera is emerging as one of the central players for mission-critical applications."

Discover the advantages of enterprise software-defined storage with advanced automation: Contact us to schedule a free consultation.

Next-Generation Cloud Storage

11 Essential Requirements To Evaluate Next-Generation Cloud Storage

Fortune 1000 companies can use this document to generate a comprehensive Request For Information and a focused and efficient Proof of Concept to test for current and future storage needs. 

Transition was everywhere in 2019. Enterprises that had rushed to get to the public cloud started bringing applications back. They also began to deploy Tier 1 workloads with software-defined technologies instead of storage arrays. Change was the only constant as it related to the storage deployments of the Fortune 1000.

While change was in the data center air, the key requirements for the next generation of storage infrastructure became very clear. Datera is made up of the world’s leading SDS architects and former end users, and we have worked with scores of Fortune 1000 companies to understand their data storage needs.

While we recognize that every organization’s applications and needs are different, from this unique vantage point we developed a set of common requirements and best practices to help you and your organization get a fast start on the path to a new and better data infrastructure. 

Storage Categories and Goals the Fortune 1000 Are Evaluating

When getting started, we recommend first investigating the four main categories of storage technology that drive different sets of requirements.

  1. Enterprise-Class Flash Arrays. Arrays from the leading vendors are strewn all over the data center floor, whether deployed on a standalone basis or, more recently, as converged appliances. Enterprises want to retain the pros of arrays (the performance levels, the 9s of availability) while distancing themselves from the cons (the high cost, the inflexibility, the lock-in, the homogeneity of media choices, and even the need for Fibre Channel to achieve performance and stability).
  2. Public Cloud Services. The impact of the public cloud cannot be overstated. AWS, Google Cloud and Microsoft Azure (none of which use arrays to build their hyperscale data centers) showed the market that infrastructure could be done in a new, more agile, and more cost-effective way. Enterprises looked to see if they too could build their infrastructure in this manner to achieve the same level of operational agility and velocity, and to do it on Ethernet rather than Fibre Channel as the cloud players do, but they also wanted to avoid the massive cost inflation they have experienced in their monthly cloud bills, often 5X higher than on-premises infrastructure.
  3. Hyperconverged Infrastructure (HCI). HCI growth remains strong, particularly in emerging regions. It provides an easy on-ramp to a shared infrastructure and software-defined approach, but shows inherent system limitations in scale, performance, and hardware utilization. Enterprises would like to retain the simple deployment and procurement models that HCI software vendors provide, but to do so without the common problems that have plagued the leading HCI platforms such as “noisy neighbor” syndrome where certain applications or tenants overtax the infrastructure and compromise other applications, and the inability to scale beyond monolithic orchestration within a single cluster.
  4. Software-Defined Storage (SDS). SDS is seen as combining the best attributes of the other storage choices (the dedicated performance of arrays, the agility of the public cloud, and HCI’s consolidation of applications and tenants) with the additional benefits of automation, while lessening the vendor lock-in that has pervaded the industry since its inception. While its benefits have long been evident, it is important to test multiple vendors against one another to understand differences in performance and availability with data management services turned on (e.g., encryption, compression, deduplication). Equally important is testing the reliability of the automation that drives quality of service, and quantifying its value in terms of administrator resourcing.

The Fortune 1000 test new storage approaches to maintain and expand the benefits they’ve seen in the past while finding new ways to eliminate old headaches and reduce the cost profile. 

Fortune 1000 Requirements for High-Performance Block Workloads at Hyperscale

In this section, we include a list of core requirements the Fortune 1000 should test against to understand which storage category can deliver. You may further refine based on your particular use cases. 

  1. LATENCY: The system must provide 1M or more IOPS at under 1 millisecond of latency. Storage needs can change at a moment’s notice, so it is essential that a system can expand rapidly to meet performance and capacity requirements; SQL and NoSQL databases in particular require high-IOPS, low-latency storage that can scale both with ease. One million IOPS at under 1 millisecond is a common testing threshold, so we suggest starting there and adding more if your specific workloads require it (a test-harness sketch follows this list). Also test the ability to expand this with the fully supported addition of asymmetric media nodes, including NVMe and Storage Class Memory (SCM) such as Intel Optane.
  2. THROUGHPUT: The system must support a minimum of 64 GB/s of overall throughput. Throughput has become more important for most organizations than raw storage performance since throughput is the ultimate measure of application (rather than storage) performance and highly valuable in multi-tenant environments. The combination of database and other workloads may push the network’s overall performance as well, which can require the network and storage teams to agree on the testing. This has proven to be valuable in enabling a move to 100GbE and 200GbE networks (similar to the public cloud providers) and can yield massive savings in administration time and costs when compared with complex Fibre Channel networks.
  3. ASYMMETRIC SCALING: The system must be able to scale granularly (node by node) to a hyperscale threshold of multiple petabytes, yielding additional capacity, performance, durability, and resilience with each node. It must also scale asymmetrically and rapidly, typically from a few hundred terabytes to multiple petabytes, and do so non-disruptively, without downtime. The test should include adding different kinds of nodes along the way to demonstrate that the environment not only incorporates the new capacity and horsepower but rebalances the system without manual tuning. Scaling the environment should not drive significant new admin time, because the savings achieved on the capital side could otherwise be offset by extra personnel costs. Pay close attention here, since many enterprises see massive differences in scaling between systems. At a minimum, test the ability to scale up within a rack and scale out across racks and aisles within a single data center, since this is what scale-out architectures must achieve to provide the flexibility enterprises seek.
  4. PERFORMANCE WITH DATA MANAGEMENT SERVICES ENABLED. The system should show minimal performance degradation even when more than 60% utilized. Vendors have a habit of painting a very rosy picture of theoretical performance, which is often measured without using features that consume CPU cycles in the storage hardware. Enterprises often see a massive drop-off in system-wide performance when even basic data management services are enabled (including compression, encryption, snapshots, and deduplication), rendering those systems a non-starter. Be sure to test under load when application traffic is high to understand how the system will respond. The tests should incorporate both dimensions (data management off and on, traffic high and low) to give the best picture of real-world performance. Architects testing the system should also record a time sequence from the monitoring tools to show the ebbs and flows of the system over time and how it responds; not doing so invites trouble in an actual deployment.
  5. CONTINUOUS DATA AVAILABILITY. The system must be architected to remain available through multi-node and multi-rack failures within the data center. More than just data durability or uptime, the system must offer non-disruptive software updates and survive multiple component failures, power outages, rack failures, and unexpected data center events. A real test of availability is possible using a combination of snapshots (replicated locally and remotely to a public cloud), stretch clusters, failure domains, and replica count. All vendors speak about “9s” of availability, but planned downtime is frequently excluded from those calculations. The test should incorporate the ability to maintain complete availability while simultaneously changing QoS policies and adding new nodes.
  6. CLOUD OPERATIONS. The system must support application and tenant aggregation and consolidation with simple provisioning and self-service utilization for application owners. The term “cloud” entails a variety of different needs for the Fortune 1000, with much less consistency than among service providers or software-as-a-service companies. The common thread is the need to support multiple orchestrators, including VMware, Kubernetes, OpenStack, and bare metal, in order to support a variety of applications and the velocity of stateful and stateless events. It is essential to test these not merely in isolation with a separate cluster for each, but in a common cluster for all; otherwise, you run the risk of bringing on a new system that becomes an island of its own, with stranded data and hardware and added administrative overhead. Further, we highly recommend that the test include policy-based administration that allows administrators to set up and administer groups of applications as a class rather than on an individual basis. Testing the ability to support multiple application orchestrators is simply the baseline requirement.
  7. AUTONOMOUS DATA PLACEMENT. The system must autonomously assign and re-assign workloads to the proper node according to preset requirements. Whether driven by application traffic (moving data as close to the application as possible) or by the storage media resident on the node (for instance, putting the right data on an NVMe drive), the system should automatically self-optimize system-wide performance and availability. Initial testing should evaluate the system’s ability to place data based on policy, and advanced tests should examine the quality of service delivered per workload to determine whether placement is correct and whether the policy is properly aligned to the required SLA.
  8. NEW TECHNOLOGY INCORPORATION. New technologies, at both the server (CPU) and media level, must be able to be rapidly deployed and utilized by the system without adding administration time to put them to use. To test this capability, enterprises start with a variety of server types and media types and then add new and different nodes during the life of the test. Similar to the testing of autonomous data placement, as new nodes are incorporated administrators should determine if data is indeed moved automatically to the new node and specifically what data is moved to utilize the new CPU and media available. Growing an environment can be easy, but if the system does not automatically take advantage of the new capacity and horsepower, that growth generates needless expense.
  9. ETHERNET-ENABLED BGP PEERING: The system should be able to use standard iSCSI deployed over L3 networking at the core for data operations. The test should include a demonstration of BGP integration into the routing fabric, which can drive a new layer of agility in data placement across the data center, well beyond what Fibre Channel or standard L2 networking allows (see the iSCSI sketch after this list).
  10. SELF-HEALING. The system should have predictive analytic capabilities that incorporate system-wide information, often called telemetry, into a feedback loop to continuously improve against desired attributes. Testing the system-wide monitoring capability should include an understanding of latency, performance, and availability information from each node, as well as the system’s ability to provide notification to the test administrator of any issue at both the network and storage layers. Advanced systems use telemetry to help all users with practical capacity/performance planning and best practices in real-time. Testing for this capability ensures that you select a system that has the potential to learn from itself and improve your environment over its lifecycle.
  11. LOCK-IN. The system should support a wide variety of hardware profiles (different server vendors, different server models, different server generations, and a variety of different media) to eliminate the potential for vendor lock-in. The infrastructure industry is notorious for vendors locking customers into an artificially limited set of choices designed to enrich their top lines. Enterprises that experienced this phenomenon with their array purchases, and even their public cloud contracts, are looking for open systems that generate hardware options, not lock-in. Test environments should therefore incorporate a variety of hardware options from the start. Advanced testing should combine multiple variables in a single cluster, including different vendors, node profiles, media types, and server generations. Ensuring this variety is key to the long-term value of the system as well as to getting the best terms on hardware purchases at every expansion opportunity.
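
As referenced in requirement 1, here is a minimal test-harness sketch for the latency/IOPS threshold, built on fio. The device path, job sizing, and the 1M IOPS / 1 ms targets are assumptions to adjust for your own volumes and workloads; the JSON field names follow fio 3.x output.

    import json
    import subprocess

    TARGET_IOPS = 1_000_000
    TARGET_P99_US = 1_000  # 1 millisecond, expressed in microseconds

    cmd = [
        "fio", "--name=randread",
        "--filename=/dev/sdX",        # replace with the SDS volume under test
        "--rw=randread", "--bs=4k", "--direct=1", "--ioengine=libaio",
        "--iodepth=32", "--numjobs=16", "--time_based", "--runtime=120",
        "--group_reporting", "--output-format=json",
    ]
    report = json.loads(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)

    read = report["jobs"][0]["read"]
    iops = read["iops"]
    p99_us = read["clat_ns"]["percentile"]["99.000000"] / 1000.0  # fio reports completion latency in ns
    verdict = "PASS" if iops >= TARGET_IOPS and p99_us <= TARGET_P99_US else "FAIL"
    print(f"IOPS={iops:,.0f}  p99={p99_us:.0f}us  {verdict}")

Running the same job twice, once with data management services off and once with them on, also covers requirement 4 with the same harness.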
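
And for requirement 9, the sketch below drives standard open-iscsi discovery and login against a routed portal address from Python. The portal IP and target IQN are placeholders, and the BGP peering itself is configured on the network fabric rather than shown here.

    import subprocess

    PORTAL = "10.0.100.1"                              # routed (L3) portal address; placeholder
    TARGET = "iqn.2019-01.com.example:vol-oltp-01"     # placeholder target IQN

    # Discover the targets advertised by the portal, then log in over standard iSCSI/TCP.
    subprocess.run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL], check=True)
    subprocess.run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"], check=True)

    # List active sessions to confirm the path came up over the routed network.
    subprocess.run(["iscsiadm", "-m", "session"], check=True)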

Fortune 1000 customers have every vendor in the IT industry at their beck and call. Selecting the right technologies to test and using the right test parameters—outlined above—will enable them to make the transition to a more automated, scalable and performant future for their data operations.

To learn more about the Datera platform and why the Fortune 1000 are using our software-defined storage solution to architect a new data future, please examine the following core whitepaper library: