
ESXi iSCSI/RAID/Disk Performance: Improving Through RAM Cache


When designing any kind of system, whether it be an ESXi lab or otherwise, disk performance can end up being a bottleneck. No matter how fast your processor is or how much RAM you have, if you have a ton of iowait and your system is stuck waiting in line to read or write data from your drive or volume, then your system will be slow. Modern systems in need of extreme IOPS are turning to SSD arrays and SSD cards for performance, but even with the drastic reduction in price since their introduction, their cost per gigabyte is still much higher than that of mechanical hard drives.

A good option for increasing performance on the cheap is a RAM cache. I’ve advocated for PrimoCache (formerly FancyCache) before on this blog, and have had plenty of comments from people using it in their labs with incredible results. However, I’ve never covered its performance directly in an article, and that’s what I’d like to do here.

Note that PrimoCache is currently free and in beta, with a provided 90-day license that, so far, has always been extended by the nice folks over at Romex Software. There has been no mention of the price once it exits beta, but it will be a commercial piece of software in the end. Also, it’s worth mentioning that this is beta software, and although I run it in my ESXi lab, with a good portion of that devoted to production use, using it in a full production environment wouldn’t be recommended.

SAN Hardware for Testing

iSCSI SAN Hardware

As part of another project, I built a custom SAN for testing purposes, and that’s what I will be using PrimoCache on. Its performance is equivalent to a low- to mid-range server platform, and it uses a hardware RAID card with 15K SAS drives. The testing platform will eventually be used for a number of different RAID levels, but it is currently configured in RAID 0, the fastest-performing RAID level, for a comparison of speeds with and without PrimoCache.

PrimoCache Installation and Settings

PrimoCache RAM Cache

PrimoCache has a standard installer, and installation is as simple as running it and following the prompts.

At the end of the installation, you will be prompted to reboot your system, after which PrimoCache is available for use. The amount of RAM assigned to each cache is set when you create it in the software, and can run up to the limit of your free RAM.

After installation, clicking the PrimoCache shortcut brings you to the primary screen of the software:

Primo Cache Main Screen

Once here, you will need to create a cache. Note that the target of the cache does not need to be a physical drive. It can be a volume: a presented RAID volume, a dynamic disk, a JBOD … anything that classifies as a volume with a drive letter inside Windows. The amount of memory you can use is limited only by the amount you have free in your system. To prevent paging, I would suggest keeping total usage (OS + programs + cache) to no more than 80% at idle.
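If you want to put a number on that 80% guideline, a back-of-the-envelope sketch in Python is below. The 80% threshold is just my rule of thumb from above, and the use of the third-party psutil library is my own choice; PrimoCache doesn’t enforce or calculate any of this.

```python
# Rough sketch: size a RAM cache so OS + programs + cache stay under
# ~80% of physical memory at idle. The 80% figure is the rule of thumb
# from this article, not anything PrimoCache requires.
import psutil  # third-party: pip install psutil

def suggested_cache_mb(target_utilization=0.80):
    mem = psutil.virtual_memory()
    total_mb = mem.total // (1024 * 1024)
    used_mb = (mem.total - mem.available) // (1024 * 1024)
    budget_mb = int(total_mb * target_utilization) - used_mb
    return max(budget_mb, 0)

if __name__ == "__main__":
    print(f"Suggested maximum cache size: {suggested_cache_mb()} MB")
```

With a budget in mind, I’m going to select the D: drive (my RAID volume) as my target drive: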

Primo Cache: Add a Cache

Once selected, I click Next and am taken to the cache configuration screen. Here I’ll be able to select the amount of memory (in MB) that I want to devote to my cache, and how I want that cache to be used: whether I’d like to improve read performance, write performance, or both. Write caching allows you to defer writes, holding them in RAM and writing them out later; it would only be prudent to use this option on systems with a battery backup/UPS, of course. You can also change the block size, which sets the granularity of the cache: smaller block sizes bring better performance at the cost of higher overhead for maintaining the cache. PrimoCache also allows the use of an SSD as a Level 2 cache, though I will not be using that in this instance.
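To get a feel for that block-size trade-off, here’s a quick illustrative calculation: the finer the granularity, the more blocks the cache has to track. The 32 bytes of bookkeeping per entry is purely my assumption for illustration, not PrimoCache’s actual figure.

```python
# Back-of-the-envelope view of the block-size trade-off: finer
# granularity means many more blocks to index. The 32 bytes/entry of
# bookkeeping is an illustrative guess, not PrimoCache's real cost.
CACHE_SIZE = 16 * 1024**3  # the 16GB cache configured below
ENTRY_BYTES = 32           # assumed metadata cost per cached block

for block_kb in (4, 16, 64, 256):
    blocks = CACHE_SIZE // (block_kb * 1024)
    overhead_mb = blocks * ENTRY_BYTES / 1024**2
    print(f"{block_kb:>3} KB blocks: {blocks:>10,} entries, "
          f"~{overhead_mb:,.0f} MB of bookkeeping")
```

At 4KB blocks the cache tracks over four million entries, which is why small block sizes cost CPU and memory overhead in exchange for their precision.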

Primo Cache: Configure Cache Parameters

I’ll select Custom (since I’ll be using both the read cache and deferred writes), assign 16GB of memory to the cache, leave the block size at 4KB since I have plenty of CPU power, check Enable Defer-Write, and leave it at the default 10-second latency.

Primo Cache: Cache Parameters Set

Once done, we click Start, and PrimoCache will assign the RAM to the cache (your available memory in Task Manager will drop by the amount you assigned) and show a success message.

Primo Cache: Successful Cache Creation

Clicking OK on the success message takes us back to the PrimoCache screen, where we can now see our existing cache, information about the volume it’s assigned to, detailed information about the cache itself, statistics on reads and writes, and a cache hit rate chart to track cache performance. There are also a number of icons for stopping, pausing, flushing, and other operations on the cache.

Primo Cache: Main Screen with Cache

PrimoCache Performance Testing

To give us a baseline to compare against, we’re going to stop the cache on the D: drive now that we’ve created it, and run some basic I/O and performance tests to get an idea of our performance without it. After each test, PrimoCache will be stopped and restarted so that every run begins with a fresh, empty cache. The following tests will be performed both with and without PrimoCache enabled. The only exception is HDTune Pro, which bypasses any cache; it is included to measure the baseline performance of the RAID array itself. The RAID card settings for the virtual drive are listed below:

Virtual Drive Settings

  • RAID Level: 0
  • Size: 544.875GB
  • Stripe Size: 64KB
  • Disk Cache Policy: Enable
  • Read Policy: Always Read Ahead
  • IO Policy: Cached IO
  • Current Write Policy: Write Back
  • Default Write Policy: Always Write Back
  • Current Access Policy: Read Write
RAID Virtual Drive Settings

RAID Benchmarking: Tests Run

  • HD Tune Pro: Read, Write and Random Access tests
  • CrystalDiskMark 3: 50MB, 100MB, 500MB, 1000MB, 2000MB and 4000MB test sizes, using sequential (1024K blocks) and random read/write tests at 512K, 4K, and 4K with a queue depth of 32.
  • IOMeter: 8K, 64K and 256K sizes at 100% read, 100% write, 75%/25% read/write, 50%/50% read/write, and 25%/75% read/write, with 10 workers for 30 minutes per test, covering pure sequential, pure random, and mixed access (a rough sketch of what such a mixed test does appears after this list).
  • ATTO Disk Benchmark: Transfer Size 0.5 to 8192KB, 256MB Total Length, 4 Queue Depth with Overlapped I/O (default settings)
  • Microsoft Exchange Server Jetstress Tool: Disk throughput tests at 100% drive capacity run for 2 hours, both with and without cache.
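For anyone curious what a mixed random read/write test actually does under the hood, here’s a crude, single-threaded Python sketch in the spirit of the IOMeter runs above. The file path, file size, block size, and duration are arbitrary placeholders, and unlike the real tools it doesn’t bypass OS caching or spawn multiple workers, so treat it as an illustration only.

```python
# Crude sketch of a mixed random read/write test, in the spirit of the
# IOMeter runs above. Path, sizes, and duration are placeholders; real
# tools bypass OS caching and use many workers, which this does not.
import os, random, time

PATH = "testfile.bin"
FILE_SIZE = 256 * 1024**2   # 256MB test file
BLOCK = 8192                # 8K blocks
READ_PCT = 75               # 75%/25% read/write mix
SECONDS = 10

# Pre-create the test file at full size.
with open(PATH, "wb") as f:
    f.truncate(FILE_SIZE)

ops = 0
buf = os.urandom(BLOCK)
deadline = time.time() + SECONDS
with open(PATH, "r+b") as f:
    while time.time() < deadline:
        # Seek to a random block-aligned offset, then read or write.
        f.seek(random.randrange(FILE_SIZE // BLOCK) * BLOCK)
        if random.randrange(100) < READ_PCT:
            f.read(BLOCK)       # read op
        else:
            f.write(buf)        # write op (the OS may buffer this)
        ops += 1

print(f"{ops / SECONDS:,.0f} IOPS ({READ_PCT}% read, {BLOCK // 1024}K blocks)")
os.remove(PATH)
```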

A Note About Benchmarking and Cache

Cache Performance

Having a large cache in the two- or three-digit GB range presents some interesting issues when testing performance, as does benchmarking disk use in the first place. Hardware RAID cards have algorithms that adapt to the workload they are under, so you may see better performance in the long run, and of course, real-world use is rarely a fixed 75/25, 60/40 or other static split over long periods of time. The best we can do is look for general performance trends.

Caches bring a whole new problem: when benchmarking, most of the time you’re going to have a 100% cache hit rate, which means you’re benchmarking pure RAM. For writes, this does translate to the real world, since PrimoCache uses write deferral, so all writes are effectively 100% cache hits. With a big enough cache (think server-level, if you had 192GB of RAM), you could easily absorb all writes, even from several ESXi nodes and lots of VMs.
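The mechanism is easy to picture. The sketch below is my own minimal model of a defer-write buffer, not PrimoCache’s implementation: writes land in RAM and complete instantly, and dirty blocks are flushed to the slow backing store once they age past the latency window (10 seconds in our configuration above).

```python
# Minimal model of a defer-write cache: writes complete against RAM
# instantly and are flushed to the slow backing store after a latency
# window. An illustration only, not PrimoCache's implementation.
import time

class DeferWriteCache:
    def __init__(self, backing, latency=10.0):
        self.backing = backing      # dict standing in for the disk
        self.latency = latency      # seconds to hold dirty blocks
        self.dirty = {}             # block -> (data, time written)

    def write(self, block, data):
        # Always a "cache hit": the caller returns immediately.
        self.dirty[block] = (data, time.time())

    def flush_expired(self):
        # Called periodically; writes out blocks older than the window.
        now = time.time()
        for block, (data, stamp) in list(self.dirty.items()):
            if now - stamp >= self.latency:
                self.backing[block] = data   # the slow disk write
                del self.dirty[block]

disk = {}
cache = DeferWriteCache(disk, latency=10.0)
cache.write(0, b"hello")   # returns instantly
cache.flush_expired()      # nothing old enough yet; disk still empty
```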

Read caching is a bit harder to estimate. Caches hold the most-used items, and thus read caches take a while to “build” toward a good hit rate. The read results below show what you could see if the data you needed was already cached. PrimoCache also sports the ability to use SSDs as a Level 2 cache, moving items off to them; this would dramatically increase your read cache hit rate over time, since you could theoretically have an L2 cache in the terabytes range. Take the read results with that in mind.
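That “building” behavior is what any least-recently-used style cache does: every block pays the disk penalty once, then serves from RAM until it’s evicted. Here’s a minimal LRU sketch of the warm-up effect; again, this is my own illustration, and PrimoCache’s actual replacement policy isn’t something I can speak to.

```python
# Minimal LRU read cache showing why hit rates "build" over time:
# every block misses once, then serves from RAM until evicted. An
# illustration only; PrimoCache's real policy isn't documented here.
from collections import OrderedDict

class LRUReadCache:
    def __init__(self, backing, capacity):
        self.backing, self.capacity = backing, capacity
        self.blocks = OrderedDict()     # block -> data, oldest first
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)              # recently used
            self.hits += 1
        else:
            self.blocks[block] = self.backing[block]    # slow disk read
            self.misses += 1
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)         # evict oldest
        return self.blocks[block]

disk = {n: bytes(8) for n in range(100)}
cache = LRUReadCache(disk, capacity=10)
for _ in range(3):
    for n in range(10):                 # a hot working set that fits
        cache.read(n)
print(f"hits={cache.hits} misses={cache.misses}")  # hits=20 misses=10
```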

One last note: I have been using PrimoCache on the RAID array that hosts my iSCSI target (using StarWind’s iSCSI software) and have seen huge improvements in my VMs’ responsiveness. Although I have not done any performance testing from a VM as of yet (it’s coming, it’s coming!), I can unequivocally endorse it.

HDTune Pro RAID Benchmark Results

The results below are run without the cache, since HDTune bypasses it; this gives an idea of the raw performance of the RAID volume. In this series, I also run some IOPS tests as well as latency tests to show raw access times.

HDTune RAID Benchmark: Read

HDTune RAID Benchmark: Random Access

HDTune RAID Benchmark: Extras


Crystal DiskMark3 RAID Benchmark Results

CrystalDiskMark 3 tests the read and write speed of a drive. Although this measures pure throughput rather than IOPS, it still gives us a good baseline of how our RAID array performs speed-wise. The tests were run at all of the size levels and across all of the performance tests, both with and without PrimoCache. CrystalDiskMark 3 runs each test five times and averages the result. All results are in MB/sec, including the vertical axis of the graph.

Crystal DiskMark RAID Performance Results

The most amazing results here are the increases in random reads and writes. These are the most punishing operations for a drive, since the head is forced to move, at random, all over the platter surface, putting seek times to a real test. Coming out of RAM cache, of course, there are no seek times, so both read and write performance skyrocket, with some results in the +30,000% range. This is where the cache results shine, and it would be incredible for high-write applications such as databases, if you had enough RAM cache. One thing to remember is that these are deferred writes, so if you fill up the cache enough, it could take minutes, if not hours, to write out all the data. A UPS/battery backup is a requirement in this case.
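To put that drain time in perspective, here’s a quick hedged calculation; the array write speeds below are illustrative placeholders, not measurements from these benchmarks, and real drain time depends heavily on how random the deferred writes are.

```python
# Rough drain-time math for a full defer-write cache. The array write
# speeds are illustrative placeholders, not results from these tests.
CACHE_GB = 16
for label, mb_per_s in (("sequential", 800), ("random", 40)):
    seconds = CACHE_GB * 1024 / mb_per_s
    print(f"{CACHE_GB} GB of dirty data at {mb_per_s} MB/s "
          f"({label}): ~{seconds / 60:.1f} minutes")
```

Even a modest 16GB cache full of random writes could keep the array busy for several minutes after the host thinks the data is safe, which is exactly why you want that UPS.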

IOMeter RAID Benchmark Results

IOMeter is one of the go-to benchmarks in the world of IOPS performance, and for good reason: it’s highly configurable, accurate, and can push a drive to its limits when properly used. I spent the most time with these results, testing almost every aspect of the RAID’s performance. Here I ran a full series of sequential tests with mixed read/write loads, as well as a full series of random access tests with mixed read/write loads. Although the most common block sizes are 4K, 8K and 64K, I decided to test at every block size, just to get a complete picture. All values below are in IOPS.

IOMeter RAID Performance Benchmark: IOPS

Although we don’t see the same out-of-the-ballpark gains here with PrimoCache on, you still see a remarkable percentage gain in IOPS with it enabled. The largest gains are in the middle of the curve on random writes, which I expected, considering all deferred writes are automatically a 100% hit rate unless the cache is full. Once again, RAM caching delivers solid gains.

ATTO RAID Benchmark Results

ATTO Disk Benchmark is another big name in benchmarking, and by default it measures transfer speeds across several different test sizes. I had some consistently odd results with ATTO at the 4096KB test size that I’m unsure about: even though PrimoCache would report 100% cache hits, CPU usage was low, and no other performance issues manifested themselves, I would get bad results in this range about 50% of the time. I didn’t see similar results in any other benchmarking utility, and attribute it to some quirk with ATTO. In the interest of transparency, I’m leaving the results in.

ATTO RAID Benchmark Results: No Cache

ATTO RAID Benchmark Results: PrimoCache Enabled

Conclusions on RAID and RAM Cache Performance

Clearly, RAM caches, and PrimoCache in particular, offer a huge performance gain when properly used. Although we’re only using a 16GB cache in these tests, I can see this scaling without issue into the hundreds of GBs on server-class hardware. PrimoCache, unfortunately, is still in beta, and there’s no way I could recommend it in a production environment until it officially reaches a release stage. However, for a home or lab environment, I whole-heartedly recommend it. For the past year, I’ve used every version that’s come out, and have not had a single reliability issue.

Please feel free to comment below, or post any questions, discussions, or observations in our ESXi Forum.

