Windows Azure - Disk Performance Surprises Explained
A few months ago, we blogged about some surprising metrics coming from Azure Disks. Notably, we found instances where Standard Storage was faster than Premium Storage and instances where Premium Storage was dreadfully slow (as in 2MB/s). A great guy from the Azure storage team reached out and helped me to understand better what was going on.
Here's my take on what's going on. Caveat: I'm not a storage expert, just a guy trying to make sense of this stuff.
First off - when measuring the disk throughput, it is going to be governed by the "transfer request size" and the amount of concurrency ("number of threads" in CrystalDiskMark or "Outstanding I/Os" in Iometer). Azure disks (and likely all cloud disks) are more heavily dependent on request size & concurrency to achieve high throughput than would be a local drive. We'll see how that plays out below.
Before we update the benchmarks, let's update/explain some of our findings from the original post.
1. In the initial post, CrystalDiskMark showed a dreadful 2.207MB/s write speed for the Premium Disk using a 4k transfer request size and 1 thread (this is the bottom right cell of the test) whereas the Standard Disk showed 61.82MB/s. The reason for this is that the Standard Disk was also the o/s disk and this disk has read/write caching enabled - so it was the write cache that "sped up" this test. As of yet, you cannot enable the write cache on data disks, and using the write cache does expose some risk of data loss. So, the conclusion that the Standard Disk is faster in some cases than the Premium Disk is flawed - it's the "Standard Disk with read/write caching" that may be faster.
2. Regardless of #1 above, we're still left with the question: "Why does Premium Storage only deliver 2.207MB/s write speed, especially when local SSD is roughly 40x faster (82.75MB/s) for the same test?" The answer turns out to be pretty straightforward. The cloud drive is limited to 500 IOPS per thread - so with a 4k transfer size and 1 thread, you get about 2MB/s (4k * 500/s = 2MB/s). If you change this to 2 threads, you get 4MB/s. This is the simplest example of how concurrency affects cloud drives. You could also get 4MB/s by doubling the transfer size from 4k to 8k.
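To make the arithmetic concrete, here is a minimal Python sketch of that relationship. The function name is my own invention, and it treats 1MB as 1,000KB to match the rounding used in this post; the 500 IOPS per thread figure is the one quoted above.

```python
def expected_throughput_mb_s(request_size_kb, threads, iops_per_thread=500):
    """Estimate throughput in MB/s for a cloud disk with a per-thread IOPS ceiling.

    Assumes every thread is able to reach the per-thread ceiling and that
    1 MB = 1000 KB (the rounding used in the post).
    """
    total_iops = threads * iops_per_thread
    return total_iops * request_size_kb / 1000


print(expected_throughput_mb_s(4, 1))  # 4k, 1 thread  -> 2.0
print(expected_throughput_mb_s(4, 2))  # 4k, 2 threads -> 4.0
print(expected_throughput_mb_s(8, 1))  # 8k, 1 thread  -> 4.0
```

Doubling either the thread count or the request size doubles the estimate, which is the behavior described above. The sketch ignores per-disk and per-VM limits.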
Now, let's update some benchmarks using Iometer (the Azure storage expert showed me this tool, which provides more explicit control over transfer size and concurrency).
I'm going to use a 4k transfer size, as this is my best guess as to what is likely for ElasticSearch (I'm still very unclear on how transfer size is affected in "real life" between the application stack and the o/s). This corresponds to the built-in "4 KiB; 0% Read; 0% random" "Access Specification" in Iometer - with 0% Read, I believe this means it's a write-only test. When I use the term "threads" below, it corresponds to the "# of Outstanding I/Os" in Iometer.
I'll just provide the numbers for the Premium Disk (P30 disk on DS12 Azure VM - 4 cores, 28GB RAM) compared to local SSD (4x 800GB SSD in Raid 10 - 6 cores, 64GB RAM). I realize the machine sizes are different and may skew the results (but hey, you've got to work with what you've got!).
The maximum IOPS for the DS12 (12,800) and the P30 disk (5,000) are taken from this article as are the maximum throughput for the DS12 (128MB/s) and the P30 (200MB/s).
Premium Storage - 4k, 1 Thread
Notes: the highlighted portions show the key metrics - IOPS (474) and Throughput (1.94MB/s). This is exactly what we expect - a single thread is limited to 500 IOPS and with a transfer size of 4k, we get the expected throughput (4k * 500 IOPS = 2MB/s).
Premium Storage - 4k, 8 Threads
Notes: again, we get the expected result. IOPS goes to ~4,000 (8 threads * 500 / thread = 4,000) and throughput scales accordingly.
Premium Storage - 4k, 16 Threads
Notes: here we get something interesting: the IOPS is not 16 * 500 = 8,000 but rather 5,000, which is the limit for a P30 disk. We can conclude that the maximum throughput on a single P30 disk with a 4k transfer size is 20MB/s (4k * 5,000 IOPS). This is a limit of the disk, not of the VM. In theory, you could stripe 3 P30 disks on a DS12 and increase the max IOPS to 12,800, which is the limit for the VM.
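The three 4k results can be reproduced with a small model that takes the lowest of the per-thread demand, the disk limit, and the VM limit. This is only a sketch: the function names are mine, the limits are the ones quoted in this post, and the striping case assumes load spreads perfectly evenly across disks, which I have not tested.

```python
IOPS_PER_THREAD = 500    # per-thread ceiling observed in these tests
P30_MAX_IOPS = 5_000     # per-disk limit quoted in the post
DS12_MAX_IOPS = 12_800   # per-VM limit quoted in the post


def expected_iops(threads, disks=1):
    """IOPS is capped by whichever limit is reached first.

    Assumes striped disks share the load evenly (an idealization).
    """
    demand = threads * IOPS_PER_THREAD
    return min(demand, disks * P30_MAX_IOPS, DS12_MAX_IOPS)


def throughput_mb_s(iops, request_size_kb):
    """Convert IOPS to MB/s, treating 1 MB as 1000 KB."""
    return iops * request_size_kb / 1000


for threads in (1, 8, 16):
    iops = expected_iops(threads)
    print(threads, iops, throughput_mb_s(iops, 4))
# 1 500 2.0
# 8 4000 16.0
# 16 5000 20.0

# Hypothetical: 3 striped P30 disks with enough threads to saturate the VM.
print(expected_iops(32, disks=3))  # 12800
```

The measured numbers (474 IOPS at 1 thread, ~4,000 at 8, 5,000 at 16) line up with this simple "lowest limit wins" model.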
Premium Storage - 32k, 16 Threads
For kicks, I ran this test which has a very high transfer size (IMO) and a very high degree of concurrency (16 threads). But, this does provide the maximum throughput as advertised on the DS12 (128MB/s).
Notes: here you can see that while we haven't reached the maximum IOPS for the P30 disk (5,000), we have reached the maximum throughput for the DS12 (128MB/s).
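One way to see which limit bites in each test is to express every limit as a throughput and take the smallest. The sketch below does that using only the figures quoted in this post; the function and label names are hypothetical, and 1MB is treated as 1,000KB.

```python
def bottleneck(request_size_kb, threads):
    """Return (limit name, MB/s) for the tightest limit in this simple model."""
    kb_to_mb = request_size_kb / 1000
    limits = {
        "per-thread IOPS (500/thread)": threads * 500 * kb_to_mb,
        "P30 IOPS (5,000)": 5_000 * kb_to_mb,
        "DS12 IOPS (12,800)": 12_800 * kb_to_mb,
        "P30 throughput (200MB/s)": 200.0,
        "DS12 throughput (128MB/s)": 128.0,
    }
    name = min(limits, key=limits.get)
    return name, limits[name]


print(bottleneck(4, 1))    # ('per-thread IOPS (500/thread)', 2.0)
print(bottleneck(4, 16))   # ('P30 IOPS (5,000)', 20.0)
print(bottleneck(32, 16))  # ('DS12 throughput (128MB/s)', 128.0)
```

At 32k, 128MB/s works out to about 4,000 IOPS, which is why the disk's 5,000 IOPS limit is never reached in this test.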
Local SSD - 4k, 1 Thread
Local SSD - 4K, 8 Threads
Local SSD - 4K, 16 Threads
Notes - Looks like we're getting close to the max IOPS for this disk array. Even when I move to 64 threads, it only drives IOPS to 26,000 (minor increase from the 24,685 in the 16 thread test).
Local SSD - 32k, 16 Threads
Notes - What? 1GB/s? Now that's absolutely blazing fast!
---------------------------------------------------------------
Conclusions:
This is a pretty complex scenario, dependent on #cores, Windows, ElasticSearch, Lucene, Java, document size etc. I feel pretty good that these tests capture a "typical" ElasticSearch scenario, but obviously there are a lot of variables involved. Regardless, for this test setup, bastardized though it may be, I feel OK in making this conclusion - and it matches my "empirical" evidence from running our applications on both DS12|Premium-Storage and physical machines with Local SSD.
SSD is significantly faster (5x to 30x) than Premium Storage, regardless of transfer size or concurrency. For "typical" ElasticSearch scenarios (see below) SSD is 3x to 5x faster than Premium Storage.
"Typical write" ElasticSearch scenario (4k transfer, 8 threads): SSD is 5x faster than Premium Storage (92MB/s vs 15MB/s). By default, ElasticSearch uses #processors as the number of bulk operation threads - this means only 4 threads on the DS12; a good optimization if running ElasticSearch on a DS12 may be to increase many or all of the thread pools.
"Typical read" ElasticSearch scenario (4k transfer, 8 threads): SSD is 3x faster than Premium Storage (281MB/s vs 67MB/s). [These tests are not shown above.]
Of course, everything is based on particular scenarios and workloads - there can be no steadfast "rules" about what environment is optimal for every scenario. Disclaimer aside, I hope this helps expand the understanding of disk speed, both on and off Azure.