All You Need to Know About IOPS Can Be Learned at Rush Hour!

On my most recent trip to Silicon Valley, I decided to find a place to stay for less than $500 per night. My best choice ended up being my sister’s house in Walnut Creek, CA, and having to deal with a 50-mile commute to Santa Clara on I-680!

As I sat in stop-and-go traffic (mostly stop) for more than 10 hours over my four-day trip, I had time to ponder many things:

When I started in the industry, operating systems or applications had to deal with disk drive defect management, error correction and sector layout, before this was eventually abstracted and embedded into the drives themselves.
When I started, HDDs weighed 10 pounds! Now they’re the size of a few credit cards stacked together, while their density has increased 100,000-fold.
Back then, when we talked about performance we first talked about RPM, then MB/s, then IOPS.

After 30 years in the data storage industry, we can all agree that I am old. But I would like to point out that everything has its time, and at some point, its time has passed. No, I’m not talking about me – I’m talking about using IOPS as the measure of performance.

A Bit of IOPS Performance Trivia:

Did you know that an HDD can do 250k IOPS? Don’t believe me? Issue 512-byte sequential reads. HDDs can do IOPS!
NVMe SSDs are exceeding 1 million IOPS.
An NVMe storage system is pushing 10 million IOPS.
A Bugatti Veyron can go 268 MPH. I’m a car guy, so I must throw these in occasionally.
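
The HDD figure above is simple arithmetic: IOPS is sustained throughput divided by I/O size. Here is a minimal sketch, assuming a sequential read rate of 128 MB/s (an illustrative number chosen to match the 250k figure, not taken from any drive's data sheet). The function name is hypothetical.

```python
def iops_from_throughput(bytes_per_second: float, io_size_bytes: int) -> float:
    """Return the I/Os per second implied by a throughput and a fixed I/O size."""
    return bytes_per_second / io_size_bytes


# Assumed for illustration: 128 MB/s of sequential reads (decimal megabytes).
ASSUMED_SEQUENTIAL_BPS = 128_000_000

small_io = iops_from_throughput(ASSUMED_SEQUENTIAL_BPS, 512)        # 512-byte reads
large_io = iops_from_throughput(ASSUMED_SEQUENTIAL_BPS, 1_048_576)  # 1 MiB reads

print(f"512-byte reads: {small_io:,.0f} IOPS")
print(f"1 MiB reads:    {large_io:,.0f} IOPS")
```

The same drive moving the same bytes per second reports wildly different IOPS depending on the I/O size, which is why a bare IOPS number says little on its own.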

All of this raises the question: how much performance is enough? When is more better? Maybe we are looking at the wrong metrics: Is it fun driving a 268 MPH supercar in rush hour traffic?

It used to be that IOPS were important because each I/O took a long time. Doing many I/Os in parallel made sense. I bring this up because performance metrics are often (mis)used to compare products: numbers (X > Y) are easy to compare, but the comparison is often not all that helpful.

For example, you can see a variety of charts comparing the IOPS of different types of SSDs under different scenarios. If I were to distill all of them down to a single chart, it would look something like this:

I’m going to make a controversial statement about storage system performance – more is not better. More bandwidth or more IOPS – once you have enough, it generally does not help to be able to do more.

There is an exception, one where less is better – latency.

One of the smartest people I have ever worked with was a Ph.D. mathematician working as a performance engineer at HP. This guy did not just measure performance, he predicted it. By analyzing the hardware and software of a system, he created a predictive benchmark of what the system should be able to do that engineers then sought to achieve. And he was right, across several product architectures.

In complex systems like enterprise storage this is incredibly difficult, almost magical, and I had tremendous respect for him. One of the things he taught me, long before it was popular, is that service time is what matters. He had a much more comprehensive definition, but I would distill it down to the latency (time), as measured from the perspective of an application, to complete a task.

Why Not IOPS?

To complete a task, an application generates only so many I/Os, and an application has only so many tasks to complete in a day. The amount of work to be completed is finite, so being able to do work at a higher rate (more IOPS) does not create more work.

As a task progresses through its various steps, I/O must often be serialized, i.e., one I/O must complete to provide the data needed to start the next I/O. The number of I/Os doesn’t change, but the time to complete the task depends heavily on the latency of each I/O. Lower latency means tasks complete more quickly, and that is something you can turn into a competitive advantage.
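
To put numbers on the serialization point, here is a rough sketch. It assumes a hypothetical task made of 10,000 strictly dependent I/Os and two illustrative per-I/O latencies. None of these figures come from a measurement.

```python
def serialized_task_seconds(num_ios: int, latency_seconds: float) -> float:
    """Time for a chain of dependent I/Os, each waiting on the previous one."""
    return num_ios * latency_seconds


# Hypothetical task: every I/O needs the result of the one before it.
DEPENDENT_IOS = 10_000

# Illustrative per-I/O latencies (assumptions, not measurements).
slow = serialized_task_seconds(DEPENDENT_IOS, 5e-3)    # 5 ms per I/O
fast = serialized_task_seconds(DEPENDENT_IOS, 100e-6)  # 100 microseconds per I/O

print(f"5 ms per I/O:   {slow:.1f} s")
print(f"100 us per I/O: {fast:.1f} s")
```

Nothing in this calculation depends on how many IOPS the device could deliver in parallel. Because the chain is serial, only the latency term moves the result.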

The Problem – Confusing IOPS with Latency

The problem is that we confuse rate (IOPS) with completion of work (latency).

If we look at a similar chart plotting latency against queue depth, we see that as queue depth increases, so does latency. This is like the stop-and-go traffic where I’m waiting behind others to make progress through my task of getting to my sister’s house in Walnut Creek.

In fact, for those applications where service time is hypercritical, the queue depth is usually restricted to exactly one. With a queue depth of one no other tasks are going to get in the way of getting the lowest latency. I am on the freeway all by myself!
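
The link between queue depth, latency and IOPS follows Little's Law: average queue depth equals IOPS multiplied by average latency. A minimal sketch is below. The three operating points are hypothetical, chosen only to show the shape of the trade-off, and the function name is an assumption.

```python
def iops_from_littles_law(queue_depth: float, latency_seconds: float) -> float:
    """Little's Law rearranged: throughput = items in flight / time per item."""
    return queue_depth / latency_seconds


# Hypothetical (queue depth, average latency) operating points for one device.
operating_points = [
    (1, 100e-6),   # alone on the freeway
    (8, 200e-6),   # moderate traffic
    (50, 1e-3),    # rush hour
]

results = [iops_from_littles_law(qd, lat) for qd, lat in operating_points]

for (qd, lat), iops in zip(operating_points, results):
    print(f"queue depth {qd:>2}, latency {lat * 1e6:>6.0f} us -> {iops:>8,.0f} IOPS")
```

In this made-up example the deepest queue delivers five times the IOPS of queue depth one, but each individual I/O takes ten times as long to complete.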

Referring to the previous charts, you will see 5X more IOPS (good) by increasing the queue depth, but at 10X the latency (bad). A dirty little secret – as queue depth increases, the latency becomes unpredictable, which can create even more problems. Commonly, there are latency outliers, called long-tail latencies, that can be 100X – 1000X the average latency, and if even one in a thousand I/Os is an outlier, it can cause noticeable performance degradation in many applications.
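
Why would a one-in-a-thousand outlier matter? A task that issues many I/Os gets many chances to hit one. The sketch below makes the simplifying assumption that outliers are independent of one another, which real systems may not honor. The I/O counts per task are hypothetical.

```python
def prob_task_hits_outlier(ios_per_task: int, outlier_rate: float) -> float:
    """Chance that at least one I/O in a task is an outlier (independence assumed)."""
    return 1.0 - (1.0 - outlier_rate) ** ios_per_task


OUTLIER_RATE = 1 / 1000  # the one-in-a-thousand case from the text

# Hypothetical task sizes, in I/Os per task.
for ios in (1, 100, 1000):
    p = prob_task_hits_outlier(ios, OUTLIER_RATE)
    print(f"{ios:>4} I/Os per task -> {p:.1%} of tasks see at least one outlier")
```

Under these assumptions, a rare event at the I/O level becomes a common event at the task level once a task needs hundreds or thousands of I/Os.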

So, what does all this mean?

Latency is king. It always has been!
Be thoughtful about how you configure the various data stores (volumes, file systems, buckets, etc.) for your applications. Focus on the service time for the I/O to be done as the way to accelerate your application – in today’s world of caching, tiering and SSDs there will almost certainly be enough IOPS.
Use a system that allows you to change the quality of service (QoS) settings for the data stores. The ability to change QoS non-disruptively lets you observe how the applications actually perform and adjust accordingly, leaving you much better informed than if you had to guess at deployment time.
Use data transformation and data efficiency functionality judiciously when response time predictability matters. Functions like deduplication, compression, encryption, RAID and erasure coding create complex resource contention and I/O multiplication within the system, i.e., they create more work in the system to be completed, thus impacting latency.
Application-aware storage can create policies to carefully and autonomically adjust internal resources to improve the latency of the I/O that matters most in improving overall application service time.
And finally, do your best to avoid leaving the office between 3 p.m. and 7 p.m. if you have to travel north on Interstate 680, or you will get to experience firsthand the misery of having lots of IOPS (cars) at very high queue depth (stopped in traffic).

Happy motoring!