I am currently working on a project where we are trying to figure out the storage architecture for one of our enterprise systems. One of the biggest challenges is working out how many spindles we need to support our OLTP system. I found a blog post by Bruce Spencer from IBM with an interesting formula for calculating the spindle count. Although the formula has some built-in assumptions, I found it interesting enough that I wanted to share it with you...

 

Workload               IOPS (I/O's per Second)   Data Transfer Rate (MB/sec)   Min # of Disk Drives to Support Workload
Random I/O (10k RPM)   125                       0.5                           n = (%R + f (%W))(tps)/125
Random I/O (15k RPM)   150                       0.5+                          n = (%R + f (%W))(tps)/150
Sequential I/O         2000                      50                            n = (MB/sec)/50

Where:

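%R = share of disk I/Os that are reads, written as a fraction (0.8 for 80%).
%W = share of disk I/Os that are writes, written as a fraction (%R + %W = 1).
f = write penalty: 1 for ordinary disks, 2 for mirrored disks, 4 for RAID 5 disks.
tps = total I/Os per second the workload needs (the iostat "tps" figure).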

Using the above formula, here's the minimum number of disks required to support a random I/O workload, at 1000 IOPS, 80% read, 20% write on 10K RPM disk drives.

Ordinary disks: (0.8 + 1*0.2)(1000 IOPS)/(125 IOPS/disk) = 8
Mirrored disks: (0.8 + 2*0.2)(1000 IOPS)/(125 IOPS/disk) = 9.6, rounded up to 10
RAID 5 disks: (0.8 + 4*0.2)(1000 IOPS)/(125 IOPS/disk) = 12.8, rounded up to 13
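
If you would rather not do the arithmetic by hand, the formulas above fit in a few lines of Python. This is only a minimal sketch: the function names, the dictionary keys ("10k", "raid5", and so on) and the rounding step are my own choices, while the per-disk figures and write penalties are taken straight from the table and definitions above.

```python
import math

# Rule-of-thumb IOPS per disk, copied from the table above (not measurements).
IOPS_PER_DISK = {"10k": 125, "15k": 150}

# Write penalty f from the definitions above.
WRITE_PENALTY = {"ordinary": 1, "mirrored": 2, "raid5": 4}


def min_disks_random(read_fraction, target_iops, rpm="10k", layout="ordinary"):
    """Minimum disks for random I/O: n = (%R + f * %W) * tps / IOPS per disk."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    f = WRITE_PENALTY[layout]
    disks = (read_fraction + f * write_fraction) * target_iops / IOPS_PER_DISK[rpm]
    # Round away floating-point noise before taking the ceiling, so an exact
    # result such as 8.0 is not bumped up to 9 by a stray 0.000000000000001.
    return math.ceil(round(disks, 9))


def min_disks_sequential(mb_per_sec, mb_per_sec_per_disk=50):
    """Minimum disks for sequential I/O: n = (MB/sec) / 50."""
    return math.ceil(round(mb_per_sec / mb_per_sec_per_disk, 9))


if __name__ == "__main__":
    # The worked example: 1000 IOPS, 80% reads, 10K RPM disks.
    for layout in ("ordinary", "mirrored", "raid5"):
        print(layout, min_disks_random(0.8, 1000, rpm="10k", layout=layout))
    # prints 8, 10 and 13 disks respectively
```

The result is always rounded up to a whole disk, since you cannot buy 9.6 spindles.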

 


Finally, I will throw my two cents in regarding this topic:
  • These formulas will give you a high-level estimate. They are based on assumptions, and you should not use them as the only source of information for your storage solution. If you are upgrading a current system, gather as much data as you can so you can establish a benchmark. Then you can make predictions based on how your company actually uses the system instead of some fictitious numbers produced by a vendor.
  • Do not fix software problems with hardware. If you have a batch process that takes 10 hours, is it really due to a poorly performing I/O subsystem or is it bad code?
  • If you need performance, think throughput and not capacity. To accommodate a system that requires a high amount of I/O, you will need to buy a large number of small disks rather than a few big ones.
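
To make the first point a bit more concrete, here is a rough sketch of how measured data could feed the formula instead of guessed inputs. Everything specific in it is an assumption on my part: the sample numbers are invented, the function name `workload_profile` is hypothetical, and sizing for the busiest interval rather than the average is just one possible choice. In practice the read and write rates would come from something like iostat on the existing system.

```python
import math

# Hypothetical (reads/sec, writes/sec) samples, one pair per monitoring interval.
# These numbers are made up for illustration only.
samples = [(620, 140), (810, 210), (700, 160), (905, 240)]


def workload_profile(samples):
    """Return (read fraction over all samples, peak total IOPS in any interval)."""
    total_reads = sum(reads for reads, _ in samples)
    total_writes = sum(writes for _, writes in samples)
    read_fraction = total_reads / (total_reads + total_writes)
    peak_iops = max(reads + writes for reads, writes in samples)
    return read_fraction, peak_iops


read_fraction, peak_iops = workload_profile(samples)

# Apply the random I/O formula for RAID 5 (f = 4) on 10K RPM disks (125 IOPS each),
# sizing for the busiest interval (an assumption, not part of the original formula).
f = 4
disks = math.ceil((read_fraction + f * (1 - read_fraction)) * peak_iops / 125)

print(f"read fraction: {read_fraction:.2f}, peak IOPS: {peak_iops}, disks: {disks}")
```

The more intervals you collect, and the closer they are to your real busy periods, the less the estimate depends on anyone's assumptions.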

Admittedly, I am not an expert on storage. It is something I am trying to learn more about, but I do not have a lot of first-hand experience with it. If you are an expert, I am very interested in hearing your opinions on this topic. Please comment!