Back in April 2018, I had the pleasure and privilege of presenting as an invited speaker at the CUGC Omaha Plains XL event on the experiences our working group had accumulated while testing the Raspberry Pi (RPI) as a thin client running Citrix Receiver.
Now, half a year later, a lot has changed, including the introduction of the Citrix Workspace hub, a number of enhancements in HDX, newer firmware updates from Citrix, management enhancements from Stratodesk, and the introduction of second-monitor modules from both ViewSonic and NComputing. To top it all off, the Raspberry Pi Foundation announced the RPI 3B+ model in March 2018, which offers a number of significant improvements over the 3B.
We have been busy performing more tests, and this is a good time to report on what we have experienced and learned in the meantime.
First of all, here is a quick summary of the main points from the presentation I gave in Omaha:
- The Raspberry Pi is a promising Citrix Receiver-based thin client.
- Allocate adequate resources to VMs. We ended up going from 4.0 to 4.3 GB of RAM.
- Stick, if possible, with 32-bit web browsers (to keep memory allocation down).
- High-definition video (1080p or above) pushes the limits.
- Audio-video synchronization (audio lag) for large-format, high FPS videos is an issue.
- Analog audio output is of marginal quality, so you need to use USB audio or a USB-to-analog converter.
- Optimize the VMs as much as possible.
The conclusion at the time was that this is a very interesting prospective option. The issues with audio-video synchronization in particular, however, made it harder to consider for deployments where video delivery is important, such as in our education environment. It was felt that this was not an insurmountable obstacle, and our research continued. Note that this investigation did not include any of the Citrix Workspace hub features for moving sessions between the RPI and an Android device; the focus was purely on the RPI's performance as a standalone thin-client workstation.
More Tests with More Options
The previous results were encouraging enough to motivate us to keep working at this. Citrix was kind enough to provide us with Citrix Workspace hub units from both ViewSonic and NComputing, including a Pi Zero-based secondary display adapter (SDA). Stratodesk licenses were included. We also purchased a Sabrent USB-to-3.5 mm audio converter for around $8, because the sound quality from the built-in 3.5 mm audio jack was not very good. In the hope that a Raspberry Pi 3B+ might improve performance, we also purchased the basic board, planning to see how compatible a swap might be with the stock 3B. Below is a picture of the various components we had to work with.
The test environment included XenServer 7.1 running on a Dell R720 with dual E5-2680 CPUs running at 2.80 GHz and a total of 40 hyperthreads (20 per physical CPU); XenDesktop 7.15; and a Windows 10 VM with 4 vCPUs and (ultimately) 4.3 GB of RAM. Various combinations of allocated vCPUs (up to 6) and RAM (up to 5 GB) were tested, and the 4 vCPU / 4.3 GB combination proved to be the best compromise.
The media tested were primarily video, because video provided the biggest challenge. Sources included MP4 files, YouTube videos in various formats, and Kaltura streaming videos. Performance monitoring and testing utilities included the native Windows performance monitor and Task Manager, the third-party utility Regshot, and the third-party performance-monitoring utilities GPUprofiler and RDAnalyzer. The latter two were indispensable in evaluating the overall performance of the client.
One of the first undertakings was to investigate how much better an RPI 3B+ might perform compared with a stock 3B. A “lobotomy” of sorts was performed on one of the units. Because the 3B+ retains the same mounting-hole spacing and roughly the same dimensions, it fit perfectly well within the same case. It also runs with the same power supply, which provides an output of 5 V at 2.5 A. The 3B+ actually draws slightly less power overall than the 3B (1.13 vs. 1.34 A).
But would the same firmware and drivers work? With bated breath, I plugged everything together and powered up the unit. Within a short time, I was very pleasantly rewarded with a successful boot and login screen, which turned out to also work just fine with no modifications necessary to any settings. In fact, I was able to seamlessly switch between using a unit with the 3B and one with the 3B+ with the same VM, requiring no changes whatsoever to the VM itself. Huzzah!
Testing the 3B and 3B+
Equipped with these units, we performed a series of tests; the results are presented in the table below. Each test pushed the limits of what the RPI was capable of. Tests included YouTube 1080p and 1080p60, as well as a Kaltura video, on both units and with both the single- and dual-monitor setups. The same video was viewed in each case to compare performance between the 3B and 3B+.
Surprisingly, there were only slight differences, none of which could be considered statistically significant. The one exception is the available bandwidth: the 3B+ has a different Ethernet chip (the LAN7515) that supports up to around three times the speed (around 300 Mbps) of what was attainable on the 3B (around 100 Mbps), and this is reflected in the table. The 1.4 GHz vs. 1.2 GHz clock speed should also provide a bit more processing power in the 3B+, which the results at least hinted at. The sample times were not particularly long, so these values are more snapshots of overall performance than long-run averages.
Most notably, we did not see any difference in the still-present video-audio synchronization lag. Furthermore, we saw that going from firmware version 2.40.3190 to either 2.40.3512 or 2.40.3780 caused the 3.5 mm analog jack audio output to go from mediocre to unusable. Fortunately, a tweak to the settings was able to rectify this; more about that later.
The second-monitor display unit did not seem to perform any differently with its own power supply than with power supplied over the same USB data connection.
Another issue that became evident: when forcing EDT as the preferred HDX mode, there was a big slowdown in video performance (about half the frame rate). Also, on the second monitor, moving frames around was not only very sluggish, but remnants of the frames were left behind as artifacts (this was limited to the second monitor). Expanding a frame to fill the entire screen and back again cleared these ‘droppings,’ but the usefulness of the second monitor was severely impaired.
Browsers and Their Impact
Let’s look at the impact of the Web browser. The illustration below shows tests with four leading browsers, each used to view the same video in three different formats (480p, 720p, and 1080p). The differences are quite apparent. Importantly, Firefox was the only browser with a 32-bit version installed; the rest were all 64-bit versions.
We discovered that with 64-bit browsers and only around 4 GB of memory allocated to the VM, “memory creep” took place in some cases as the browser kept consuming more and more memory. Before long, the VM had no memory left, typically bringing it to a frozen state. Even increasing the VM’s available virtual memory to 5 GB did not help.
What did help was restricting usage to a 32-bit browser (at least when viewing large-format images) and compromising on a memory increase from 4.0 to 4.3 GB. Importantly, memory utilization dropped back down again when the video was halted, something that did not happen once the VM’s available memory was exhausted. The illustration below shows that, even when all four CPUs are running at near capacity and the memory is almost all taken up, the system is able to recover when the video is stopped.
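If you want to watch for this memory creep yourself, Task Manager is the obvious place to look, but it can also be scripted. Below is a minimal sketch of how browser memory usage inside the VM could be logged over time; it assumes Python and the third-party psutil package are installed in the Windows 10 VM, and the process names listed are just common browser executables (adjust them for whichever browsers you are testing), not anything specific to our setup:

import time
import psutil  # third-party package: pip install psutil

# Browser processes to watch for "memory creep"; adjust the names as needed.
BROWSERS = {"firefox.exe", "chrome.exe", "iexplore.exe", "microsoftedge.exe"}

def browser_memory_mb():
    """Sum the resident memory of all running browser processes, in MB."""
    total = 0
    for proc in psutil.process_iter(["name", "memory_info"]):
        name = (proc.info["name"] or "").lower()
        mem = proc.info["memory_info"]
        if name in BROWSERS and mem:
            total += mem.rss
    return total / (1024 * 1024)

if __name__ == "__main__":
    # Log a sample every 30 seconds. Memory that keeps climbing while a video
    # plays and never falls back after it is stopped is the creep described above.
    while True:
        print(f"{time.strftime('%H:%M:%S')}  browser memory: {browser_memory_mb():.0f} MB")
        time.sleep(30)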
The Winning Combination
Even with Web browser issues better understood, the audio-video synchronization issue still persisted. Things were about to get much better, though. There are dozens of parameters that can be altered, and the challenge is to find the right combination. We played around with various HDX settings, and thanks to the on-the-fly ability to modify some parameters with RDAnalyzer, hit upon what seems to be the best HDX setting for the RPI, namely “Default Mode (Thinwire H.264 + Lossless Text” and “Image quality drops but CPU on client is able to support higher resolution and improves performance”).
These have corresponding registry entries of:
[HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\Graphics]
"Encoder"=dword:00000001
"EncodeSpeed"=dword:00000002
This made a huge difference in overall performance and allowed even full-frame videos to run close to 30 FPS, even though the RPI was working hard to keep up.
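If you would rather script those two registry values than set them by hand in regedit (or let RDAnalyzer change them on the fly), here is a minimal sketch; it assumes Python is available on the Windows 10 VM and that the script is run with administrative rights, and a plain .reg file import would of course work just as well:

import winreg

# The two values listed above; per the settings described in this article, they
# correspond to "Default Mode (Thinwire H.264 + Lossless Text)" with the
# speed-over-quality encoding preference. Run elevated; with 64-bit Python this
# writes to the 64-bit registry view, where the Citrix graphics key lives.
GRAPHICS_KEY = r"SOFTWARE\Citrix\Graphics"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, GRAPHICS_KEY, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "Encoder", 0, winreg.REG_DWORD, 1)
    winreg.SetValueEx(key, "EncodeSpeed", 0, winreg.REG_DWORD, 2)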
The audio quality problem turned out to be fixed by a modification to the “timer-base scheduling” parameter. This fixed not only the poor sound quality with the analog jack, but also HDMI audio output quality issues. While the claim is that digital audio quality may suffer a bit with this setting, to me the difference was imperceptible.
Of great importance, after much experimentation it was determined that the audio-video synchronization issue could be addressed quite well with a parameter called Audio Latency Control. Changing it from its default value had a remarkably positive impact.
Below are the main changes made using the Stratodesk management console, resulting in a configuration that has worked as well as anything we’ve seen so far:
- To optimize the audio-video synchronization: Connections -> Image (“Campus” in our case) -> Citrix Workspace App (used to be “Citrix Receiver”) -> Audio Latency Control. Set to “on” (default is “no setting”).
- If using the 3.5mm (1/8″) audio jack or HDMI audio: Management -> Audio -> PulseAudio -> Timer-Base Scheduling. Set to “off” (default is “no setting”).
- Adaptive transport fallback setting: Connections -> Image (“Campus” in our case) -> Citrix Workspace App (used to be “Citrix Receiver”) -> Adaptive Transport (HDX over UDP/EDT). Set to “Preferred (fallback).”
Optional (might help):
- Flow control, which might help reduce the CPU load some: Connections -> Image (“Campus” in our case) -> Citrix Workspace App (used to be “Citrix Receiver”) -> Performance Flow Control. Set to “on” (default is “no setting”).
Those are the “silver bullets” that seem to result in a configuration that works well with both single- and dual-monitor setups, produces reasonable video and audio quality, and keeps things pretty stable, even when running a half dozen or so concurrent applications.
Below is a sample shot of the two-monitor set-up. Note the “stats for nerds” YouTube option in the upper left corner, which can be useful for monitoring connectivity quality.
With 1080p60 and higher formats, you run into issues keeping up with those really high frame rates. Plus, with any native format over 1920×1080 pixels, the RPI has to work very hard to reformat the video to fit its own constraints; even with the frame rate restricted to 30 FPS, it’s still a struggle, as witnessed by the number of dropped frames in the 1080p video stats shown below.
With 4K video, frame rates typically drop to single digits, and significant pauses in delivery kick in quite frequently, even when dropping from full-frame playback to a smaller display window. This is hardly surprising, since a 4K frame carries four times as many pixels as a 1080p frame.
Conclusions
The results of the additional investigations have been very encouraging. While not perfect in some respects, the RPI has demonstrated that it is capable of fulfilling the needs and requirements of certain environments, including even some with fairly high demands on video delivery. Areas where this configuration would be a viable option include student labs, kiosks, data-entry workstations, and print-release stations. Even for basic graphics processing, the units perform quite well, and even more so with a GPU back end.
Issues arise with very high frame-rate or large-format images or video (resolutions above the RPI’s native 1920×1080 pixels), where more than two monitors are needed, and potentially where a large number of applications are open and running concurrently (not yet sufficiently tested). As HDX continues to evolve, some of the video performance will continue to improve, as will what newer generations of RPI units bring.
The difference between the 3B and 3B+ was surprisingly small. Given that units containing the 3B+ are likely to become the standard commercial offering in the near future, holding out a bit longer for them may be worth it. They also run cooler and have some other interesting features not available on the 3B. While we have not run into an Ethernet bandwidth bottleneck, having extra capacity on the USB bus shared between the Ethernet chip and the four USB 2.0 jacks certainly cannot hurt.
The RPI as a Citrix Workspace hub may well be an option for those looking for a budget thin client with a robust management and deployment interface and a large number of parameters that can be customized. The journey continues.
Acknowledgements
First and foremost, I would like to thank my co-worker, Timothy Cochran, for his contributions and involvement in our RPI investigations over the last several years. Additional help was provided by colleagues Kristopher Bullins and Keith Ballou. The support and encouragement from the folks on the Citrix Workspace hub and HDX teams and at Stratodesk, NComputing, and ViewSonic are gratefully acknowledged.