
Linux Gaming Tweaks - A small guide to unlock more performance (1)

My personal journey to unlock more performance on Linux - Part 1: Introduction

This is the start of a new series dedicated to the Linux gaming community. It is a bit of an oddball on my blog, as most of my other posts are written for a German audience and cover my other two passions: politics and the law. Nonetheless, PC gaming has been a hobby of mine since I was six years old, playing games on a Schneider 386 SX. Wow, time flies. Over the last couple of years I have learned quite a lot about Linux, switching between several distributions, learning about compilers and optimizing parts of a Linux distribution for a better gaming experience. I was recently asked on the Phoronix forums to share some of my findings publicly, and I am very glad to do so with a global audience. But keep in mind that I am neither a software nor a hardware engineer - I am a law professional who is passionate about computers. I dug deep into the documentation and compiled a lot of code, breaking my system more than once in the process. You have been warned: I may have made silly mistakes, and I would still not call myself an expert by any means. So do not shy away from sharing your thoughts if you come across something that can be improved even further. That is where open source software development shows its strength: the wisdom of the crowd. Future posts will cover more technical topics; the how-to part is not the focus today, with an exception for the impatient folks who want to get most of the performance gains quickly.

To summarize my findings: depending on the workload, there is potentially still a lot of free performance to squeeze out of your system. And who doesn't like to get more juice out of hardware you already own? To give you a concrete example, I was able to improve Company of Heroes 2's in-game benchmark from around 45 fps to 101 fps on the same hardware - more than double! For comparison, I get around 93 fps on an optimized Windows 11 install, so the game now runs significantly faster on Linux, even though it goes through DXVK and Proton, a layer that translates parts of the Windows stack to Linux. Such major gains are unheard of in the hardware space nowadays. The potential gains you are going to see depend on your particular hardware, of course, so I need to talk a bit about mine in this opening piece. Second, I want to go into detail on why I fell in love with EndeavourOS, which is based on Arch Linux.

Hardware matters - My CPU and GPU choices

First, let's talk about hardware. Of course hardware matters; tinkerers like me spend quite some time choosing the right gear. As I am very old-fashioned and do not need the latest and greatest, I favor price-to-performance for my needs over everything else, so I first went with a second-hand 12-core Intel Xeon E5-2678 v3 (Haswell-EP). It launched back in 2014 but still brings plenty of performance to the table. I recently upgraded to a special 18-core variant, the E5-2696 v3, to max out my system. You might ask why - aren't Xeons for servers only? No. While they might not be the best choice for a pure gaming system, they come with a lot of cores, and this particular model also has a clock frequency that is high enough for today's games and more than enough for older games that are not well multithreaded. Furthermore, while the single-core performance is on par with a first-generation AMD Ryzen CPU, the multi-core performance is not far from a 16-core first-generation AMD Threadripper 1950X. In terms of price-to-performance, you cannot beat that CPU for a system that sees a lot of compile jobs. And the more optimized your whole software stack gets, the less it matters that you have a frequency or slight IPC penalty.

This all sits on a Chinese X99 motherboard with 32 GB of DDR3 ECC memory. Of course nothing is left at stock on that system: the BIOS is heavily modified and not only makes use of a CPU hardware bug to unlock the turbo boost frequency on all cores, it also uses a stable undervolt of -55/-50/-50 mV to sustain a 3.2 GHz all-core frequency under full load, or around 3.4-3.6 GHz in games. I also unlocked some hidden menus for further customization. There are of course people who would rather invest in newer gear, and that's fine if you want something more modern. I can live with some sacrifices when targeting 1080p/1440p gaming, and I get my fun out of optimizing the system.

The strength of that particular CPU is that it not only has plenty of cores and threads; Haswell-EP also supports modern features and instruction sets, e.g. AVX2. Unfortunately, most of these modern instructions are not widely used by default or by games: most distributions need to be target-agnostic, so they have to restrain themselves to the least common denominator. Some software is intelligent enough to detect the capabilities of the CPU at runtime and make use of these newer instructions, but not all of it. That is therefore a whole area of optimization we want to explore, because on Linux you have the source code available and can compile it for your particular CPU, which can yield performance benefits and power savings.
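
As a quick illustration (a sketch, assuming a Linux system and an installed GCC), you can check which of these instruction sets your CPU advertises and what target GCC actually picks when you ask it to build for the local machine:

```shell
# List some SIMD-related flags the CPU advertises (Linux exposes them in /proc/cpuinfo)
grep -m1 -o -w -E 'sse4_2|avx|avx2|fma|bmi2' /proc/cpuinfo | sort -u

# Ask GCC what -march=native resolves to on this machine
gcc -march=native -Q --help=target 2>/dev/null | grep -- '-march=' \
  || echo 'gcc not available'
```

On a Haswell-EP system the first command should list avx2 and fma among others; a distribution binary built for plain x86-64 will not use them unless the software does its own runtime detection.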

The lower clock frequency of the Xeon compared to today's latest and greatest is certainly one weakness for gaming, as gaming workloads tend to favor fewer cores at a high frequency. There are also a couple of limitations that affect the Haswell architecture and my modded setup in particular. For example, the CPU energy management is more limited on Linux than on Windows, which can affect CPU wake-up latency negatively and lead to a higher overall power draw than on Windows. This comes down to the turbo boost hack described earlier, which relies on a driver injected into the UEFI to exploit the CPU bug; I need to manually set certain relaxed C-state settings in the UEFI to get a stable system. There are also CPU errata (HSD131/HSW131) which can cause some stuttering in games over a longer period due to various Machine Check Events, but this is just a minor, temporary inconvenience that goes away on its own. You just need to be aware of it when looking at the logs, and if you are as sensitive to latency as my trained eyes are.

Despite these shortcomings, why is it the perfect CPU for me? As some of my software optimizations involve compiling quite a lot of source code, I need all the multi-core performance I can get so that producing the optimized binaries does not eat too much of my time. Code compilation today is mostly a highly multi-threaded task, especially when using GCC's LTO or LLVM/Clang's ThinLTO option; the relatively new Mold linker also helps tremendously in some situations. If you have a quad-core CPU or anything less, you might be better served with the quick and easy tweaks to get the most out of your invested time. I will come back to this soon.
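
For orientation (a sketch; the exact invocations are mine, not prescribed anywhere, and Mold must be installed separately), this is roughly how those options show up in practice:

```shell
# Build parallelism scales with core count; query it with:
nproc 2>/dev/null || getconf _NPROCESSORS_ONLN

# Typical invocations (illustrative only):
#   make -j"$(nproc)"               # use every core for the build
#   gcc   -O2 -flto=auto  file.c    # GCC LTO, parallelised automatically
#   clang -O2 -flto=thin  file.c    # Clang's ThinLTO
#   gcc   -fuse-ld=mold ...         # link with Mold instead of the default linker
```

On an 18-core/36-thread Xeon like mine, that -j value is where the CPU earns its keep.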

The second important purchase decision is your graphics card (GPU for short). I got a great deal on AMD's Vega 56 back in 2019, and it still serves me well to this date. While not perfect, AMD has done a tremendous job over the past several years improving their open source driver support to match or exceed their Windows counterpart in terms of performance and stability. This means that AMD GPUs fit very well into the open source ecosystem, which shows especially with Valve sponsoring work on the Mesa graphics drivers (you might have heard of ACO, a gaming-oriented shader compiler inside Mesa that yields a better gaming experience). While performance is usually fine with Nvidia, you need to use their proprietary (closed-source) driver, which doesn't fit too well into the open source ecosystem. Nvidia has tried in the past to push unwanted standards down the stack and met so much resistance that they eventually backed down. This means you could still see some annoying bugs in the graphics stack while using an Nvidia GPU, especially when new kernels break their interfaces. As I haven't used an Nvidia GPU on Linux since Kepler, I cannot tell whether there are still major issues on newer driver or kernel releases, but you should be aware of that pitfall. A closed-source driver also means that you cannot compile it with more aggressive compiler flags, so it is one less opportunity for optimization in a performance-sensitive area.

My choice for a distribution

But back to the software side. From a performance-oriented viewpoint, two major developments have impacted the Linux ecosystem in recent times: the first was Intel's Clear Linux distribution, and the second remarkable innovation was the shaping of the x86-64 feature levels. With Clear Linux, Intel demonstrated how to squeeze additional performance out of the same hardware, with dramatic differences in some workloads. Unfortunately, Intel shifted their focus to the cloud and the Internet of Things and stepped back from their initial efforts to promote Clear Linux as a better Linux desktop alternative. Nevertheless, everyone can make use of their work, as they provide a GitHub repository with all of their patches, and one can peek at their build instructions for each package (spec file). This came in very handy for me.

The x86-64 feature levels are a means to make use of newer CPU instructions without alienating users of older CPUs too much. The levels are structured around the support of certain CPU features: v1 is the status quo, plain x86-64; SSE4.2 (among other instructions) is part of the baseline for x86-64-v2; the next higher level, x86-64-v3, requires AVX2; and the highest level, x86-64-v4, is of no relevance today, as it mandates AVX-512. Without these feature levels, it became increasingly difficult to please everyone. The user base was entrenched: users of vintage hardware complained about being left out or forced to upgrade whenever raising the minimum CPU requirements came up, while users with modern CPUs complained about being held back, as they couldn't use the full potential of their hardware. Older CPUs won't profit in any way from a performance standpoint, but their users can now keep their old systems running as long as their distro supports them. Users with newer CPUs, on the other hand, can unlock some of the performance that was previously left on the table, if the distribution offers one of the higher feature level builds. With the feature levels in place, distributions and developers can finally start to standardize on more modern x86 CPU features in a broader way. You only need to find a distro which offers such higher feature level builds today, and this is where my distro choice comes into the limelight.
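
As a quick check (assuming glibc 2.33 or newer; the loader path below is the usual one on x86-64 systems but can differ), the dynamic loader can tell you which feature levels your CPU satisfies:

```shell
# glibc's dynamic loader reports the x86-64 feature levels the CPU supports,
# e.g. "x86-64-v3 (supported, searched)"
/lib64/ld-linux-x86-64.so.2 --help 2>/dev/null | grep 'x86-64-v' \
  || echo 'loader query not available (older glibc or different loader path)'
```

A Haswell-EP chip should report x86-64-v3 as supported, which is exactly the level the user repository discussed below targets.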

Of course, there is more to a distro choice than just performance, and it comes down to your personal preferences. I am not religious about my choice; I chose EndeavourOS for pragmatic reasons, as it is a tinkerer's dream and quite easy to set up and manage. It is also very close to upstream Arch, which helps with compatibility with the x86-64-v3 user repository. Before that, I used Manjaro for some time, but Manjaro carries older packages and is some steps away from upstream Arch, which can lead to stability problems when using packages from the Arch User Repository (AUR) or the x86-64-v3 user repository. In the recent past I also used Kubuntu and openSUSE Tumbleweed for a while; neither is as quickly and easily customizable nor as performant, but both had their own benefits.

Some tricks which I will present later in this series can easily be adopted on other distributions, but some methods might not apply to them, and you are on your own when trying to get similar results, which can mean more work on your end. For everyone who doesn't want to spend a lot of time on this, the good news is that you can unlock some of the performance potential with EndeavourOS and the mentioned x86-64-v3 user repository for Arch Linux very quickly and easily: just follow the instructions on their page and you are good to go. Maybe that is one of the reasons why EndeavourOS now sits in second place on DistroWatch?! You will still need to compile a better kernel yourself, as the default kernel is just awful - compiling a modified kernel is the single most important step to unlock most of the performance gains. I will go into more detail on my kernel modifications in the next episode, as they deserve their own article. If you cannot wait until then, try it out yourself: get the sources of the XanMod kernel for your system and learn everything you need to get it running. You will be pleased with the results, and I will provide some tips in my next episode.
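
To give a rough idea of what "following the instructions" amounts to (an illustrative sketch only - the actual repository names, mirror URLs and keyring steps come from the repository's own page and are not reproduced here), the setup boils down to adding the optimized repos to /etc/pacman.conf above the stock Arch ones, so pacman prefers their packages:

```ini
# /etc/pacman.conf -- illustrative sketch, NOT copy-paste material.
# The real section names and Include files are given on the
# x86-64-v3 repository's page.

# [core-x86-64-v3]          <- optimized repo, listed first
# Include = ...

[core]
Include = /etc/pacman.d/mirrorlist
```

After that, a full system upgrade pulls in the rebuilt packages where available.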

But back to my distro choice. EndeavourOS is a great foundation which can be built upon for more advanced tuning, and it is easier to use and manage than Gentoo (which is also known for its performance-oriented crowd and its compile-everything-from-source philosophy). What stands out is the AUR, where you get recipes to build many software packages. The automatic dependency handling makes compiling from source easier, but be aware that these recipes are not perfect; sometimes you need to modify them yourself for the best experience, or to get them to compile at all. Still, for the majority of packages you set your custom compiler flags once in /etc/makepkg.conf and can then compile from source in a relatively easy way to squeeze more juice out of your software. Most major Linux distributions play it safe though, as they target the corporate market and the needs of enterprise and high-performance computing users. (A good example of this is Fedora, which is by default the slowest distribution I have come across; there is now a dedicated project, the Nobara Project, to improve the Fedora gaming experience.) Stability and security are very important in that sector; performance is not at the forefront. This can conflict with the low-latency requirements that matter for gaming workloads. As a good chunk of the money in the Linux sector is earned in that space, Linux gaming hasn't been much of a focus, and the market share in the overall PC gaming sector is still dominated by Microsoft's Windows. Valve's Steam Deck and other initiatives might change this in the long term, but Linux gamers must keep in mind that they are a scarce minority and are treated as such. That means they are not a first priority for hardware vendors, gaming software vendors and Linux distributions alike - at least not yet.
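
As an illustration of that one-time setup (the values are a sketch for a Haswell-class CPU, not a universal recommendation; -march=native is the simplest choice if the binaries only ever run on the machine that builds them), the relevant part of /etc/makepkg.conf looks roughly like this:

```shell
# /etc/makepkg.conf -- example tuning (sketch; adjust to your CPU and taste)
CFLAGS="-march=native -O2 -pipe -fno-plt"
CXXFLAGS="$CFLAGS"
LTOFLAGS="-flto=auto"        # used by packages that enable the lto option
MAKEFLAGS="-j$(nproc)"       # build with all cores
```

Every package you then build with makepkg or an AUR helper picks these flags up automatically, unless its recipe overrides them.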

Unfortunately, this leads to sacrifices: not every game runs well on Linux, or at all. Make sure to check on ProtonDB whether your favorite game works reasonably well on Linux before getting your hopes up for a better gaming experience. But generally speaking, Linux gaming has improved quite a lot thanks to Valve's efforts with Steam and their work on various parts of the Linux ecosystem. There are still plenty of rough edges; bear that in mind if you haven't tried Linux before. You still need to tweak config files and deal with questionable default choices. But from my own experience, a fully optimized Linux system provides a better gaming experience than Windows nowadays - at least on my hardware, in my favorite games.

That was it for today. Stay tuned for the next parts of this series, where I will write about compiler toolchains and how to squeeze more performance out of the Linux kernel.

Part 2: Tweaking the Linux Kernel

Part 3: Tweaking the Toolchain

Part 4: Compiling from Source
