Citrix Blogs

Reducing Shared Storage IO Over 90% with MCS Storage Optimization

This is part two in a series of posts on Machine Creation Services Storage Optimization (MCSIO).  For those not familiar with MCSIO, please read the first blog in the series, Introducing MCSIO Storage Optimization, to get an overview of the technology and architecture of MCSIO.

To help answer the question of how MCSIO can reduce shared storage IOPS, a series of tests was conducted on MCSIO configured with temporary memory and temporary disk caching. Although this blog focuses on that configuration, the results also give useful insight for those looking at other MCSIO configurations. The tests examine the impact of using this feature with RDS and VDI desktops and, in turn, demonstrate how it can help reduce shared storage IO.

Test Methodology

I have chosen to use LoginVSI, a tool that many of you are likely familiar with, to conduct a series of single server scalability tests using the knowledge worker workload, logging users in at a rate of 1 every 30 seconds.

Running this test serves two functions: it allows observation of system resource utilization under load, such as IOPS or CPU usage, and measurement of the user session experience as defined by LoginVSI. Each test was conducted on catalogs with increasing MCSIO temporary memory cache sizes. This allows us to characterize the impact of temporary memory on shared storage IO and temporary disk cache usage. The temporary memory cache size was increased until the temporary disk cache was unused and all data was cached in the temporary memory cache. Baseline tests were run on standard MCS for comparison.

Results for IOPS are displayed as the sum total IOPS for each test configuration.

Results of using MCSIO with RDS workers

These tests were conducted on Windows 2012 R2 RDS workers, following the defined methodology.

Tests with the default 256MB memory cache size show that write IOPS to shared MCS storage are effectively removed and offloaded to temporary memory and disk cache. On the 256MB configuration, there is a reduction in sum write IOPS of 39% on the temporary disk compared to the standard MCS configuration on shared storage. The larger the temporary memory cache configured for the provisioned machine catalog, the greater the reduction in IO load on the temporary cache disk. On the 4GB configuration, all temporary data is cached in temporary memory cache and the temporary cache disk is not used. This results in a total sum reduction in IOPS of ~93% on the standard shared MCS storage.

In the 256MB test the redirected sum read/write IOPS are 7% lower than the MCS baseline read/write IOPS. Why is this important?  The redirected IO shows two things:

This modest reduction indicates that with the test workload, temporary memory cache is quickly consumed. Therefore the bulk of the IOPS previously performed on standard MCS with shared storage are now directed to the temporary disk.

More temp memory cache = fewer r/w IOPS

To sum up, when using a small amount of temporary cache memory, you are effectively redirecting the majority of your desktops’ IO to your temporary disk cache storage. Additional tests show the importance of extra memory. Doubling the available temporary memory cache from 256MB to 512MB reduces sum read/write IOPS by 50%, and 1GB reduces them by 77%. At 4GB we are using just the temporary memory cache and not reading from or writing to the temporary cache disk.
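The figures above can be turned into a small lookup for sizing discussions. The following is a minimal sketch, not a Citrix tool: it assumes all four percentages are measured against the same standard MCS baseline (the post states this explicitly only for the 256MB result), it treats the 4GB result as a 100% reduction because the temporary cache disk was unused, and the function names and the baseline of 1000 IOPS in the usage example are hypothetical.

```python
# Reduction in temporary-disk read/write IOPS quoted in this post for the
# RDS tests, keyed by temporary memory cache size in MB.
# Assumption: every figure is relative to the standard MCS baseline.
# The 4096 entry is 100 because the temporary cache disk was not used.
RDS_REDUCTION_PCT = {256: 7, 512: 50, 1024: 77, 4096: 100}


def smallest_cache_for(target_pct, results=RDS_REDUCTION_PCT):
    """Return the smallest tested cache size (MB) that meets target_pct, or None."""
    for size_mb in sorted(results):
        if results[size_mb] >= target_pct:
            return size_mb
    return None


def remaining_disk_iops(baseline_iops, size_mb, results=RDS_REDUCTION_PCT):
    """Estimate the IOPS still reaching the temporary disk for a given baseline."""
    return baseline_iops * (100 - results[size_mb]) / 100


if __name__ == "__main__":
    # Hypothetical baseline of 1000 IOPS, used only to illustrate the arithmetic.
    print(smallest_cache_for(50))
    print(remaining_disk_iops(1000, 1024))
```

These values describe one workload on one test environment, so they are a starting point for your own measurements rather than a sizing rule.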

This result highlights something that should always be considered when configuring your environment: the balance between temporary memory and disk cache usage. This will affect decisions about the size, type and performance of the storage used for temporary disk caching, for example local hypervisor disks, NAS or SAN.

It is important to ensure that the temporary disk cache storage is able to handle the IO traffic being directed to it, otherwise it could result in temporary storage becoming a bottleneck to the performance and usability of the system. The results demonstrate that you are able to effectively use memory cache as a way of manipulating your temporary cache disk storage requirements. The greater the temporary memory cache, the lower the temporary disk storage space and IO requirements.

Another way of demonstrating this is through the perfmon counters installed by default on your RDS or VDI worker. They are a good way of understanding in-session temporary cache usage and, in turn, can be helpful with sizing your environment.

There are 28 counters in total; however the most useful show the utilization and size of the temporary caches and IO actions of the OS.

Using the “cache disk size” metrics, we are able to see how the amount of temporary memory cache assigned to the provisioned machine influences the size of the temporary cache disk.

As the temporary memory cache size is increased, we can see the temporary disk cache being utilized later on in the test when the memory cache starts to overflow.  This yields smaller temporary disk cache sizes.  The growth in the temporary cache disk size levels off during the test; this is expected behavior. Once MCSIO has cached enough of the operating system and user data, it is able to reuse and overwrite memory and disk cache leading to a slowdown in temporary disk growth.
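If the counters are logged to CSV (perfmon and typeperf can both write CSV logs), the peak cache disk size and the point at which the memory cache first overflows can be pulled out with a few lines of Python. This is a rough sketch under stated assumptions: the host name, the performance object name and the sample values below are placeholders invented for illustration, and only the "cache disk size" counter name comes from this post. Check the actual counter paths on your own worker.

```python
import csv
import io

# Hypothetical CSV export: the host, object name and values are placeholders.
SAMPLE = r'''"(PDH-CSV 4.0)","\\WORKER01\Example MCSIO Object\Cache disk size"
"10:00:00","0"
"10:00:30"," "
"10:01:00","512"
"10:01:30","2048"
'''


def summarize_counter(csv_text, counter_suffix):
    """Return (peak value, timestamp of first non-zero sample) for one counter."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Match on the end of the counter path so the host name does not matter.
    matches = [i for i, name in enumerate(header)
               if name.lower().endswith(counter_suffix.lower())]
    if not matches:
        raise KeyError(f"no counter ending with {counter_suffix!r}")
    col = matches[0]

    peak = 0.0
    first_nonzero = None
    for row in reader:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or missing samples
        if value > 0 and first_nonzero is None:
            first_nonzero = row[0]
        peak = max(peak, value)
    return peak, first_nonzero


if __name__ == "__main__":
    print(summarize_counter(SAMPLE, "cache disk size"))
```

A later first non-zero timestamp and a lower peak across runs would correspond to the behavior described above, where a larger memory cache delays and shrinks temporary disk usage.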

Results of using MCSIO on VDI workers

This set of tests was conducted on Windows 10 x86 VDI workers, following the defined test methodology.

For VDI desktops, a smaller amount of temporary memory cache makes a notable difference in behavior. Using the default memory cache size of 256MB, the write IOPS to standard MCS storage are reduced to noise level, a total reduction of 99%. We also see IOPS go from a sum of 1339 write IOPS on standard MCS storage to 20 read/write IOPS on temporary cache disk storage.
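As a quick check on the second pair of numbers, the percentage drop can be computed directly. This is only an arithmetic sketch with a hypothetical helper name; note that, as in the text, it compares write IOPS on shared storage with read/write IOPS on the temporary cache disk, so the two sides are not measuring exactly the same thing.

```python
def percent_reduction(before, after):
    """Return the percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    # 1339 write IOPS on standard MCS storage versus 20 read/write IOPS
    # on the temporary cache disk, both figures taken from this post.
    print(round(percent_reduction(1339, 20), 1))  # 98.5
```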

No specific mention of other system characteristics has been made throughout this blog, as nothing noteworthy stands out compared to the MCS baseline tests. Other host metrics for both the RDS and VDI tests showed no notable variation in CPU utilization. LoginVSI baseline scores indicate a similar user experience in all tests, regardless of temporary cache size.

Summary

This is the second in a series of blogs. Expect to see a third in this series, which will focus on recommendations and lessons learnt from investigating this feature.
