When deploying a Citrix environment, customers often ask about the best way to provide high availability (HA) for their user profiles. This blog post discusses the most common options, along with their pros and cons, so you can make this critical part of your environment more resilient.
Please note that this article covers only high availability considerations; we’ll tackle disaster recovery in a future post.
PaaS and hardware-based solutions
For public cloud deployments, the leading practice is to use the cloud vendor’s file server Platform as a Service (PaaS) offering. Azure has Azure Files, AWS has Amazon FSx, and Google Cloud Platform relies on third-party options. In addition to these cloud-specific offerings, NetApp PaaS offerings are available on all three clouds. These solutions offer high availability, reduced management overhead, and increased scalability. Ensure the chosen solution supports the SMB protocol (so Windows VDAs can access it), NTFS permissions (so authenticated users can access their own files), and the number of concurrent connections you require.
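The three checks above can be captured in a simple evaluation helper. This is a minimal Python sketch: `FileServiceCandidate`, `unmet_requirements`, and the connection figures in the example are hypothetical placeholders, so substitute the limits your vendor actually documents.

```python
from dataclasses import dataclass


@dataclass
class FileServiceCandidate:
    """Hypothetical description of a PaaS file service under evaluation."""
    name: str
    supports_smb: bool               # needed so Windows VDAs can reach the share
    supports_ntfs_permissions: bool  # needed so users can access only their own files
    max_concurrent_connections: int  # vendor-documented limit (placeholder here)


def unmet_requirements(candidate: FileServiceCandidate, required_connections: int) -> list[str]:
    """Return the profile-store requirements the candidate does not meet."""
    problems = []
    if not candidate.supports_smb:
        problems.append("SMB access for Windows VDAs")
    if not candidate.supports_ntfs_permissions:
        problems.append("NTFS permissions for authenticated users")
    if candidate.max_concurrent_connections < required_connections:
        problems.append(
            f"needs {required_connections} concurrent connections, "
            f"supports {candidate.max_concurrent_connections}"
        )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only, not real service limits.
    share = FileServiceCandidate("example-share", True, True, 2000)
    print(unmet_requirements(share, 5000))
```

An empty list means the candidate passes all three checks; anything else tells you which requirement to revisit.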
For non-public cloud deployments, a hardware-based solution such as a NAS or SAN is usually the best choice, because it typically offers better performance and high availability than software-based solutions.
If these choices aren’t available, read on for more options on customer-managed file servers.
Preferred alternatives
This section discusses different configurations of Windows File Server. Note that Windows Server edition and licensing may affect which features are available on Windows Server 2016, 2019, and 2022 (for example, Storage Spaces Direct requires Datacenter edition):
- Failover Cluster: This provides access to shared storage in an active/passive configuration. At any given time, only one cluster node has access to the user data. So while there is high availability at the Windows File Server level, there is still only one copy of the user data, because the storage is shared. Note that with Citrix Profile Management (CPM), the state of file locks might not be preserved if a server in a failover cluster fails. (See our documentation for more.) As a result, a failover at a critical moment could corrupt a user’s profile, which is hardly ideal.
- Scale Out File Server (SOFS): SOFS is similar to a failover cluster, but its nodes are active/active, so all file shares are available on all cluster nodes. Sounds promising, right? Unfortunately, SOFS is not recommended for file-based profiles or folder redirection shares because it performs poorly under workloads with many metadata operations, such as the constant opening, closing, and creating of files that a file-based user profile generates. However, it could be a good option for customers using a container-based profile solution (like CPM containers), since containers have significantly lower metadata requirements.
- Storage Replica: This performs unidirectional replication between volumes on standalone or clustered servers for HA or DR purposes. It uses block-level replication, avoiding the file-locking pitfall of alternatives such as DFS-R (discussed below). Microsoft mentions that Storage Replica can be used as a replacement for DFS-R. If the servers in the primary site are clustered, this solution also provides high availability at the Windows Server level. At the user data level, the data is replicated to the secondary site, which makes it appropriate for DR purposes. Please note that Storage Replica doesn’t allow users to access data in the secondary site while replication is ongoing.
- Storage Spaces Direct (S2D): This builds on Windows Server’s Storage Spaces technology, which essentially provides software RAID to protect data. S2D extends it across multiple Windows File Server nodes, each with its own independent storage, to effectively create a virtual SAN. As such, this solution provides high availability at both the Windows Server and user data levels. However, it uses 2.5 to 4 times more disk space than a traditional file server (depending on your resiliency method) and may not be supported by your backup vendor. Notably, of the Windows Server-based options, S2D is the one Microsoft lists alongside Azure Files and Azure NetApp Files as a storage option for FSLogix profile containers in Azure Virtual Desktop.
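To put the S2D overhead in concrete terms, here is a small Python sketch that applies the 2.5 to 4 times range quoted above to the capacity your profiles would need on a traditional file server. The function name is hypothetical, and the default multipliers are only the article’s range; the real figure depends on the resiliency method you choose.

```python
def s2d_raw_capacity_range_tb(usable_tb: float,
                              low_multiplier: float = 2.5,
                              high_multiplier: float = 4.0) -> tuple[float, float]:
    """Estimate the disk capacity S2D needs for a given amount of profile data.

    The default multipliers are the 2.5-4x range cited for S2D; pass your own
    values once you have picked a resiliency method.
    """
    if usable_tb <= 0:
        raise ValueError("usable_tb must be positive")
    return usable_tb * low_multiplier, usable_tb * high_multiplier


# Example: 10 TB of user profiles -> roughly 25 to 40 TB of disk.
print(s2d_raw_capacity_range_tb(10))
```

Running a quick estimate like this early helps flag whether S2D’s extra disk consumption fits your hardware budget before you commit to it.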
Additional alternatives
What other options are left? There are two that come to mind.
The first alternative is a standalone Windows file server. It’s easy to get one up and running, doesn’t require specialized expertise or knowledge, and is probably the fastest solution to deploy. Unfortunately, it’s also a single point of failure. But it could be a viable option if we think outside the box. (More on this later.)
The next easiest alternative is DFS-R. Many customers are familiar with this technology, since it’s been around for a long time. However, neither Microsoft nor Citrix supports using DFS-R for profiles in an active/active scenario, so the best you can do is deploy DFS-R in an active/passive manner. While this may meet some customers’ needs, DFS-R has drawbacks: it can replicate a file only after the file is closed, and it requires a manual failover process to remain in a supported configuration. As a result, when you fail over, you cannot guarantee that the data is up to date. For these reasons, we generally recommend against DFS-R.
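The effect of the close-before-replicate requirement can be illustrated with a deliberately simplified model. This Python sketch is not how DFS-R works internally; `ProfileFile`, `Share`, and the file names are hypothetical. It only shows that any file still held open by a user session (such as the `NTUSER.DAT` registry hive, which stays loaded while the user is logged on) is skipped, so the passive copy is stale if you fail over at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileFile:
    path: str
    version: int
    is_open: bool = False  # True while a user session holds the file open


@dataclass
class Share:
    files: dict[str, int] = field(default_factory=dict)  # path -> replicated version


def replicate_closed_files(source: list[ProfileFile], target: Share) -> list[str]:
    """Toy model of close-then-replicate: copy only files that are closed.

    Returns the paths that were skipped because they were still open.
    """
    skipped = []
    for f in source:
        if f.is_open:
            skipped.append(f.path)  # cannot replicate until the session closes it
        else:
            target.files[f.path] = f.version
    return skipped


def stale_after_failover(source: list[ProfileFile], target: Share) -> list[str]:
    """Paths whose copy on the target is missing or older than the source."""
    return [f.path for f in source if target.files.get(f.path, -1) < f.version]


if __name__ == "__main__":
    target = Share()
    session = [
        ProfileFile("AppData/app.ini", version=3),
        ProfileFile("NTUSER.DAT", version=7, is_open=True),  # registry hive stays open
    ]
    print(replicate_closed_files(session, target))  # ['NTUSER.DAT']
    print(stale_after_failover(session, target))    # ['NTUSER.DAT']
```

Because profile files such as the registry hive stay open for the whole session, the window in which the passive copy lags behind is not a rare edge case.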
Thinking outside the box
So far, we’ve discussed storage options for making the user profile highly available. But wait! There’s more. What about the technologies where high availability is built into the user profile solution rather than the file share solution?
FSLogix offers Cloud Cache, which can write user data to up to four locations. Note that Cloud Cache has other design considerations for Citrix environments that are outside the scope of this article, so plan carefully. Still, it gives FSLogix users a built-in HA option.
Additionally, as of CPM 2209, CPM containers support the “Replicate user stores” option as an HDX policy, which replicates the user store to multiple paths at each logon and logoff. A related policy, “Enable local caching for profile containers” (available as of CPM 2103), has design considerations similar to those of FSLogix Cloud Cache (also out of scope here).
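The general idea of writing one user store to several locations can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not how CPM or FSLogix implement replication: `replicate_to_stores` is a hypothetical helper, and `profile.vhdx` is just a stand-in file rather than a real virtual disk. The key design point is that a failure on one store does not stop the copy to the others.

```python
import shutil
import tempfile
from pathlib import Path


def replicate_to_stores(container: Path, store_dirs: list[Path]) -> dict[Path, bool]:
    """Copy one profile container file to every configured store.

    A failure on one store does not stop the others, so the profile
    survives as long as at least one copy lands.
    """
    results: dict[Path, bool] = {}
    for store in store_dirs:
        try:
            store.mkdir(parents=True, exist_ok=True)
            shutil.copy2(container, store / container.name)
            results[store] = True
        except OSError:
            results[store] = False  # e.g. an unreachable or invalid store path
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        container = root / "profile.vhdx"  # stand-in file, not a real VHDX
        container.write_bytes(b"profile data")
        stores = [root / "storeA", root / "storeB"]
        print(replicate_to_stores(container, stores))
```

Reporting per-store results, rather than failing on the first error, mirrors the HA goal: one unavailable path should not cost the user their profile.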
Either Cloud Cache or CPM’s user store replication can be combined with the storage solutions above if desired.
Finally, don’t forget that if your customer is entitled to Microsoft OneDrive for Business, Known Folder Move can synchronize the Documents, Pictures, and Desktop folders to OneDrive. When used with Files On-Demand, you can effectively remove these folders from user profiles, reducing your storage requirements (and providing cloud-native HA!). Just remember that Files On-Demand requires Windows 10 1709+ or Windows Server 2019+, and that you should follow Microsoft’s recommendations for setting up OneDrive in Citrix environments.
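A quick way to screen VDA images against the Files On-Demand requirement is to compare OS build numbers. Windows 10 version 1709 is build 16299 and Windows Server 2019 is build 17763; the helper name below is hypothetical, and how you collect each image’s build number (for example, from your inventory tooling) is up to you.

```python
# Minimum OS builds for OneDrive Files On-Demand:
# Windows 10 version 1709 is build 16299; Windows Server 2019 is build 17763.
MIN_CLIENT_BUILD = 16299
MIN_SERVER_BUILD = 17763


def supports_files_on_demand(is_server: bool, build: int) -> bool:
    """Return True if the OS build meets the Files On-Demand minimum."""
    minimum = MIN_SERVER_BUILD if is_server else MIN_CLIENT_BUILD
    return build >= minimum


if __name__ == "__main__":
    # Windows Server 2016 (build 14393) does not qualify; Server 2022 (20348) does.
    print(supports_files_on_demand(is_server=True, build=14393))  # False
    print(supports_files_on_demand(is_server=True, build=20348))  # True
```

Note that this means Windows Server 2016 VDAs cannot use Files On-Demand, which matters when planning the OneDrive approach for server-based desktops.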
Get started
In conclusion, lead with a PaaS or hardware-backed solution for resiliency. If that isn’t viable, consider S2D or Storage Replica on Windows File Server (or SOFS if you use a container-based profile solution). FSLogix and CPM containers also offer their own HA options, and OneDrive can reduce the overall storage requirements for user profiles. I’d love to hear about your field experience with any of these technologies, so please reach out to share!
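The recommendation order above can also be read as a small decision function. This Python sketch only summarizes this article’s guidance for the storage layer, with hypothetical parameter names; it does not account for licensing, cost, or the profile-level options such as Cloud Cache, CPM user store replication, or OneDrive.

```python
def recommend_profile_store(paas_or_hardware_available: bool,
                            container_based_profiles: bool) -> list[str]:
    """Return storage options in the order this article recommends them."""
    if paas_or_hardware_available:
        # First choice: a PaaS file service (public cloud) or a NAS/SAN.
        return ["PaaS file service or NAS/SAN"]
    options = ["Storage Spaces Direct", "Storage Replica"]
    if container_based_profiles:
        # SOFS handles container metadata load well, but not file-based profiles.
        options.append("Scale Out File Server")
    return options


if __name__ == "__main__":
    print(recommend_profile_store(paas_or_hardware_available=False,
                                  container_based_profiles=True))
```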
Ready to start planning your resilient CPM profile store? CPM includes a feature for migrating from FSLogix to CPM containers, which makes the switch easier for interested customers. And because CPM is configured by policy, it is straightforward to apply test policies to subsets of your users. Get started with CPM here.