
Linux Gaming Tweaks - A small guide to unlock more performance (2)

My personal journey to unlock more performance on Linux - Part 2: Tweaking the Linux Kernel

Welcome back to the second part of my Linux Gaming Tweaks series. If you missed the first part, start there to get a general overview and learn more about my hardware and Linux distribution choices. In this episode, I will cover the single most important item on my tuning list: tweaking the Linux Kernel. That means the Xanmod Kernel, the additional patches I carry around to unlock an even better gaming experience, tweaks to my Kernel configuration, my Kernel command line, and the compiler flags I use to compile my Kernel.

Unlike Windows, the Linux Kernel itself contains almost all of your hardware drivers (with notable exceptions, e.g. Nvidia's GPU driver). Hardware drivers are fundamental to getting your PC up and running, and changes in this area are very performance-sensitive. Beware that some tweaks can affect the stability, the security, or even the power usage of your system. Keep that in mind and do your due diligence before you end up with a non-booting or non-functional system. Having a known-good backup Kernel installed is highly recommended!

Use the Xanmod Kernel sources

The first tweak is quick and easy: just take the Xanmod Kernel sources as a base for further tuning. Xanmod already carries some optimizations and additional patches for a better gaming experience. For more details on their changes, take a look at their website. Does it make a difference? From my own testing I can say: yes, it does. There are also a couple of benchmarks on Phoronix that back this up with hard numbers. The good news for everyone on Ubuntu-based systems is that you can already get a better-than-default experience with the Xanmod binaries offered via their website or via Github. There is still room for further improvement, but if you don't want to invest too much of your time, just install these binaries on a compatible system and enjoy some of the benefits.

Customize the Xanmod Kernel by adding some additional patches

There are a couple of patchsets floating around that make Xanmod even better. Which ones? Just take a look at my Github repository. A couple of comments:

- ProjectC: an alternative CPU scheduler optimized for desktop/gaming workloads. It works even better if you set the nice value of each game.exe entry to the lowest possible value (which means the highest priority) via "htop"

- RCU changes from the Zen Kernel: these patches alone gave me a 5% gaming performance improvement

- Speculative Page Faults: these patches are already carried by several Android vendors and provide faster application startup

- low-latency patches: provide a better low-latency experience, which is important for the perceived smoothness on the desktop and in games

- the X86.patch file contains several different X86 patches from the Linux Kernel Mailing List which were reported to have a positive performance effect
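If you prefer not to click through htop for every game, the nice-value tip from the ProjectC entry can be scripted. What follows is only a minimal Python sketch under a few assumptions: the helper names find_pids and set_nice are made up for illustration, processes are matched by a plain substring of their command line as exposed in /proc, and lowering a nice value below 0 normally requires root.

```python
import os


def find_pids(pattern, proc_root="/proc"):
    """Return the PIDs whose command line contains `pattern` (hypothetical helper)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # arguments are separated by NUL bytes in /proc/<pid>/cmdline
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # the process exited or is not readable
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def set_nice(pid, value=-20):
    """Set the nice value of one process; -20 is the highest priority."""
    os.setpriority(os.PRIO_PROCESS, pid, value)
```

Run as root, calling set_nice(pid) for every pid returned by find_pids("game.exe") should have the same effect as changing the value by hand in htop. The pattern "game.exe" is just the placeholder from the list above.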

Tweaking the Kernel configuration: Slim it down to what you need and throw out the obsolete

But let's not stop there: you should also tweak your Kernel configuration to unlock an even better experience. I have to admit that configuring the Kernel is tedious and time-consuming work, especially if you want to understand what you are doing, because that requires research into what each option means and what impact a change might have. If you don't want to invest that work, that's fine - just use the defaults.

I also assume that you already know how to compile the Kernel from sources, including handling all the build dependencies. If you don't, consult the documentation of your distribution, as the methods to build the Kernel vary a lot between distributions. I cannot guarantee that the Xanmod sources work everywhere, and officially they support only Ubuntu and its derivatives. However, I've tested them on Fedora and openSUSE myself, so chances are high that you can get most of the benefits there as well.

On Arch-based distributions, "sudo pacman -S base-devel" should be enough to get the GCC compiler and the other tools needed for setting up a build environment; the rest of the Kernel-specific dependencies are handled by "makepkg" (you could also use alternatives, e.g. "yay", but I haven't tried them yet). Download the Xanmod Kernel PKGBUILD from the AUR and edit it to your liking. Make sure to set "makenconfig=y" in the PKGBUILD to enable a simple Kernel configuration menu which lets you edit the Kernel configuration. Also set your CPU architecture in the PKGBUILD and use the same compiler in your /etc/makepkg.conf file before building the Kernel; more on the specifics of compiler flags later. (Tip: Even though the linux-xanmod-edge package doesn't take its CFLAGS from makepkg.conf for the build, I have seen problems when the compiler used for the Linux build did not match the one specified in makepkg.conf.)

After editing the PKGBUILD, start the build process with "makepkg -si --cleanbuild --skippgpcheck --skipchecksums". This command installs all of the needed dependencies, skips the PGP and checksum verification of the sources, and installs the finished package. You should now see a very basic user interface with several menus; this is the place where you alter the Kernel configuration. Make sure to save it after your changes, then copy the hidden ".config" file which you find within /linux-xanmod-edge/src/linux* to the main directory (where the PKGBUILD file is located) and rename it to "myconfig". This way the altered config is kept the next time you compile the Kernel; otherwise the build falls back to a standard config and you have to start the whole process over.
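Because forgetting the copy step means redoing the whole configuration, it can be worth automating. The following Python sketch only illustrates the step described above: the function name save_kernel_config is made up, and it assumes that it is pointed at the directory containing the PKGBUILD and that the sources were unpacked below src/linux* as described.

```python
import glob
import os
import shutil


def save_kernel_config(pkg_dir, name="myconfig"):
    """Copy src/linux*/.config next to the PKGBUILD and return the new path."""
    matches = sorted(glob.glob(os.path.join(pkg_dir, "src", "linux*", ".config")))
    if not matches:
        raise FileNotFoundError("no .config found below src/linux*")
    target = os.path.join(pkg_dir, name)
    # if several source trees exist, the last one in sorted order is used
    shutil.copy2(matches[-1], target)
    return target
```

Calling save_kernel_config("linux-xanmod-edge") after saving in the configuration menu should leave a "myconfig" file next to the PKGBUILD.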

What alterations should you make? For beginners, I suggest following the Kernel configuration guide from Odi. While I would challenge a couple of his choices, it is in general a very good guide and a great start. As every PC is different, you are on your own when it comes to the details. But the mission is simple: slim the Kernel down to what you really need for your own hardware and throw out the obsolete or unneeded. That also means that you probably won't need features which are important for servers only. But don't throw out too much if you need some flexibility, e.g. if you want to deploy your Kernel config on more PCs with different hardware components. In general, slimming down the Kernel not only gives you a much smaller Kernel and saves you some build time, it also improves security by reducing the attack surface and helps to avoid potential problems from using more aggressive compiler flags.

If you want to take some inspiration from my Kernel config, which is heavily stripped down, just look at the linux-xanmod-edge directory on my Github repository. In short, I favor performance over security, hence I also disabled many of the security features. People might have different opinions here; that is certainly up for debate. My config will also most probably not work for you if your hardware differs from mine, as I heavily cut down hardware support as well. Hence I cannot recommend using my config at all, at least not blindly. Just take the default Xanmod config and tweak it to your liking.

Use more aggressive compiler flags

The final area of optimization I want to talk about today is the use of more aggressive compiler flags to get a more optimized Kernel build. By default, the Kernel uses very modest compiler flags. I found it to be safe to be a bit more aggressive. The rationale behind some of the flags used is to optimize for cache locality (using Graphite for GCC and Polly for LLVM/Clang). I also get rid of some security features as they produce less efficient code.

With the method used on Arch-based Linux, make sure you are still in the configuration screen while you do this, as that gives you the chance to edit the top-level Makefile and also arch/x86/Makefile. Both files are replaced automatically on every new Kernel build, hence you need to make the following alterations each time you build a new Kernel (if you are more clever than me, you might be able to produce a patch to do that automatically).

For the top-level Makefile and GCC:

(Tip: If you are using a modern text editor, just use Ctrl+F to find KBUILD_USERCFLAGS within the file.)

export KBUILD_USERCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes \
    -O3 -mtune=native -march=native -fno-semantic-interposition -falign-functions=32 -fipa-pta -flive-range-shrinkage -fno-math-errno -fno-trapping-math -mtls-dialect=gnu2 -feliminate-unused-debug-types -floop-nest-optimize -fgraphite-identity -fcf-protection=none -fdevirtualize-at-ltrans -mharden-sls=none -std=gnu18

export KBUILD_USERLDFLAGS := -Wl,-O3,-Bsymbolic-functions,--as-needed

KBUILD_HOSTCXXFLAGS := -O3 -mtune=native -march=native -fno-semantic-interposition -falign-functions=32 -fipa-pta -flive-range-shrinkage -fno-math-errno -fno-trapping-math -mtls-dialect=gnu2 -feliminate-unused-debug-types -floop-nest-optimize -fgraphite-identity -fcf-protection=none -fdevirtualize-at-ltrans -mharden-sls=none $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS)

KBUILD_HOSTLDFLAGS  := -Wl,-O3,-Bsymbolic-functions,--as-needed $(HOST_LFS_LDFLAGS) $(HOSTLDFLAGS)

KBUILD_HOSTLDLIBS   := -Wl,-O3,-Bsymbolic-functions,--as-needed $(HOST_LFS_LIBS) $(HOSTLDLIBS)

CFLAGS_MODULE   = -O3 -mtune=native -march=native -fno-semantic-interposition -falign-functions=32 -fipa-pta -flive-range-shrinkage -fno-math-errno -fno-trapping-math -mtls-dialect=gnu2 -feliminate-unused-debug-types -floop-nest-optimize -fgraphite-identity -fcf-protection=none -fdevirtualize-at-ltrans -mharden-sls=none -Wl,-O3,-Bsymbolic-functions,--as-needed

CFLAGS_KERNEL = -mtune=native -march=native -fno-semantic-interposition -falign-functions=32 -fipa-pta -flive-range-shrinkage -fno-math-errno -fno-trapping-math -mtls-dialect=gnu2 -feliminate-unused-debug-types -floop-nest-optimize -fgraphite-identity -fcf-protection=none -fdevirtualize-at-ltrans -mharden-sls=none -Wl,-O3,-Bsymbolic-functions,--as-needed

For the top-level Makefile and a custom LLVM/Clang with Polly:

export KBUILD_USERCFLAGS := -Wall -Wmissing-prototypes -Wstrict-prototypes \
    -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=36 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-loopfusion-greedy -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -fcf-protection=none -std=gnu18

export KBUILD_USERLDFLAGS := -Wl,-O3,-Bsymbolic-functions,--as-needed

KBUILD_HOSTCXXFLAGS := -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=36 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-loopfusion-greedy -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -fcf-protection=none $(HOST_LFS_CFLAGS) $(HOSTCXXFLAGS)

KBUILD_HOSTLDFLAGS  := -Wl,-O3,-Bsymbolic-functions,--as-needed $(HOST_LFS_LDFLAGS) $(HOSTLDFLAGS)

KBUILD_HOSTLDLIBS   := -Wl,-O3,-Bsymbolic-functions,--as-needed $(HOST_LFS_LIBS) $(HOSTLDLIBS)

CFLAGS_MODULE   = -O3 -march=native -mtune=native -mllvm -polly -mllvm -polly-vectorizer=stripmine -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-loopfusion-greedy -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -fcf-protection=none -Wl,-O3,-Bsymbolic-functions,--as-needed

CFLAGS_KERNEL = -march=native -mtune=native -mllvm -polly -mllvm -polly-parallel -fopenmp -mllvm -polly-vectorizer=stripmine -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=36 -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1 -mllvm -polly-ast-use-context -mllvm -polly-invariant-load-hoisting -mllvm -polly-loopfusion-greedy -mllvm -polly-run-inliner -mllvm -polly-run-dce -fno-math-errno -fno-trapping-math -falign-functions=32 -fno-semantic-interposition -fcf-protection=none -Wl,-O3,-Bsymbolic-functions,--as-needed
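Since the Makefile is reset on every build, re-applying edits like the ones above by hand gets old quickly. As a rough idea of how this could be automated, here is a small Python sketch. It is an assumption-laden illustration, not something I ship in my repository: the function name override_make_var is made up, and it only handles simple assignments (optionally prefixed with "export" and continued with a trailing backslash), which is the shape of the variables listed above.

```python
import re


def override_make_var(makefile_text, name, value):
    """Replace the first assignment to `name`, including any continuation
    lines, while keeping the original prefix and assignment operator."""
    pattern = re.compile(
        r"^(\s*(?:export\s+)?" + re.escape(name) + r"\s*(?::=|\+=|\?=|=))"
    )
    lines = makefile_text.split("\n")
    out = []
    i = 0
    replaced = False
    while i < len(lines):
        match = None if replaced else pattern.match(lines[i])
        if match is None:
            out.append(lines[i])
            i += 1
            continue
        # skip the continuation lines that belong to the old value
        while lines[i].rstrip().endswith("\\") and i + 1 < len(lines):
            i += 1
        i += 1
        out.append(match.group(1) + " " + value)
        replaced = True
    if not replaced:
        raise KeyError(name + " is not assigned in this Makefile")
    return "\n".join(out)
```

The idea would be to read the top-level Makefile, call the function once per variable with the flag string of your choice, and write the result back before the build continues. Whether this is more robust than a proper patch file is debatable, as a patch fails loudly when the upstream Makefile changes.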

For the arch/x86/Makefile (for both GCC and LLVM/Clang):

Delete the following entries near the top of the file to get rid of Retpoline:

ifdef CONFIG_CC_IS_GCC
RETPOLINE_CFLAGS := $(call cc-option,-mindirect-branch=thunk-extern -mindirect-branch-register)
RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch-cs-prefix)
RETPOLINE_VDSO_CFLAGS := $(call cc-option,-mindirect-branch=thunk-inline -mindirect-branch-register)
endif
ifdef CONFIG_CC_IS_CLANG
RETPOLINE_CFLAGS := -mretpoline-external-thunk
RETPOLINE_VDSO_CFLAGS := -mretpoline
endif
export RETPOLINE_CFLAGS
export RETPOLINE_VDSO_CFLAGS
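Deleting these lines can also be scripted. The sketch below is one possible way to do it in Python and rests on assumptions: the function name strip_retpoline is made up, and it expects the block to look like the one shown above (compiler-specific ifdef blocks that contain nothing but RETPOLINE_ assignments, followed by the two export lines). Anything that does not match that shape is left untouched.

```python
def strip_retpoline(makefile_text):
    """Drop the RETPOLINE_* definitions and their compiler-specific ifdef
    wrappers; every other line is passed through unchanged."""
    out = []
    block = None  # lines of a candidate ifdef block that is being buffered
    for line in makefile_text.split("\n"):
        stripped = line.strip()
        if block is not None:
            block.append(line)
            if stripped == "endif":
                body = [item.strip() for item in block[1:-1] if item.strip()]
                only_retpoline = bool(body) and all(
                    item.startswith("RETPOLINE_") for item in body
                )
                if not only_retpoline:
                    out.extend(block)  # unrelated block, keep it
                block = None
            elif stripped.startswith(("ifdef", "ifndef", "ifeq", "ifneq")):
                out.extend(block)  # nested conditional, keep everything as is
                block = None
            continue
        if stripped in ("ifdef CONFIG_CC_IS_GCC", "ifdef CONFIG_CC_IS_CLANG"):
            block = [line]
        elif stripped.startswith("export RETPOLINE_"):
            continue
        else:
            out.append(line)
    if block is not None:
        out.extend(block)  # unterminated block, keep it
    return "\n".join(out)
```

Other lines in the file that merely reference $(RETPOLINE_CFLAGS) are not touched by this sketch. With the definitions gone, that variable simply expands to nothing.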

My command line

You can alter the Kernel behavior by modifying the Kernel command line entry (GRUB_CMDLINE_LINUX_DEFAULT) in /etc/default/grub. Just remember to run "sudo grub-mkconfig -o /boot/grub/grub.cfg" afterwards to update GRUB with the new entries.

This is what I currently use: "nowatchdog nvme_load=YES amdgpu.noretry=1 amdgpu.deep_color=1 amdgpu.audio=0 amdgpu.mes=1 amdgpu.mes_kiq=1 quiet mitigations=off noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mmio_stale_data=off pcie_aspm=off pcie_acs_override=downstream amdgpu.ppfeaturemask=0xffffffff udev.log_priority=3 intel_iommu=on mce=off nohz_full=2-35 isolcpus=0,1 irqaffinity=0,1 default_hugepagesz=2M hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=1024"
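If you would rather add a few of these entries step by step than paste the whole line, the edit to /etc/default/grub can be done programmatically. This is a hedged Python sketch: the function name add_kernel_params is made up, and it assumes the file contains a single-line GRUB_CMDLINE_LINUX_DEFAULT="..." entry, which is the usual layout but not guaranteed on every distribution.

```python
import re


def add_kernel_params(grub_text, params, key="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Append the parameters that are not yet present to the `key` line."""
    pattern = re.compile(
        r"^(" + re.escape(key) + r"=)([\"']?)(.*?)\2[ \t]*$", re.MULTILINE
    )
    match = pattern.search(grub_text)
    if match is None:
        raise KeyError(key + " not found")
    current = match.group(3).split()
    # keep the existing order and only add what is missing
    merged = current + [p for p in params if p not in current]
    new_line = '{}"{}"'.format(match.group(1), " ".join(merged))
    return grub_text[: match.start()] + new_line + grub_text[match.end():]
```

The function only returns the new file content. Writing it back (as root) and running grub-mkconfig as described above is still up to you. Note that it compares whole entries, so order-sensitive pairs such as "hugepagesz=1G hugepages=4" should be passed in the order you want them to appear.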

The amdgpu-related entries are for my GPU; they enable certain desirable features which are off by default. "mitigations=off" disables the Spectre/Meltdown mitigations and regains some performance. The other mitigation-related entries do the same for the remaining security flaws. I also enable the Intel IOMMU and reserve huge pages in two different sizes, which allows for some benefits in memory management. While they are registered in the system, I have to admit that I am still unsure whether more configuration is needed to make use of them or whether they are used by default now. As I also use Transparent Huge Pages, I do not know yet whether having both of these together brings a benefit or whether reserved huge pages have become obsolete. I assume the former. The ASPM entry is just a workaround for a quirky Intel network card; you shouldn't need it. The same goes for "mce=off", as my Haswell-EP otherwise spams the log with Machine Check Errors due to a CPU bug.

One other interesting topic is the NO_HZ_FULL timer, a still experimental Kernel feature which I use; you can read more about that here. The basic idea is to leave the housekeeping for all CPUs to one core (with "nohz_full=2-35", CPUs 0 and 1 in my case) so that the remaining CPUs are interrupted less often and get better latency. That works surprisingly well for gaming, as that is a latency-sensitive workload.
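To get a feeling for what the hugepage and nohz_full entries actually ask for, the numbers can be worked out from the command line itself. The following Python sketch is purely illustrative: both helper names are made up, the parser only understands the simple "N" and "A-B" CPU list forms, and it treats every "hugepages=" entry as belonging to the most recent "hugepagesz=" before it, which is a simplification of what the Kernel does.

```python
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "2-35" into a list of ints."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def reserved_hugepage_bytes(cmdline):
    """Sum up the memory requested via hugepagesz=/hugepages= pairs."""
    default = 2 * 2**20  # assumed default huge page size (2 MiB on x86-64)
    size = None
    total = 0
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "default_hugepagesz":
            default = int(value[:-1]) * UNITS[value[-1].upper()]
        elif key == "hugepagesz":
            size = int(value[:-1]) * UNITS[value[-1].upper()]
        elif key == "hugepages":
            # simplification: use the last hugepagesz seen, else the default
            total += int(value) * (size if size is not None else default)
    return total
```

For my command line this works out to 4 x 1 GiB plus 1024 x 2 MiB, so 6 GiB of RAM are set aside at boot, and "nohz_full=2-35" covers 34 CPUs. As far as I understand it, comparing HugePages_Total with HugePages_Free in /proc/meminfo is one way to see whether anything is really using the reserved pages of the default size.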

Closing Remarks

We have explored several different areas of Kernel tuning today, and this post describes all the details of my personal Kernel modifications. Maybe you found some new ideas to play around with?! That would be great! At least that was the goal of this post. These Kernel tweaks yielded an increase from 36 fps (avg) to 100 fps (avg) in the in-game benchmark of Company of Heroes 2 (for reference, I get around 93 fps on an optimized Windows 11). I also used a customized toolchain to get there, which might have been an additional source of the improvements. I will dig deeper into optimizing the GCC and LLVM toolchain in the next episode. I'd like to thank all of the engineers for their hard work and also the volunteers who help to make Linux even greater.

Part 1: Introduction

Part 3: Tweaking the Toolchain

Part 4: Compiling from Source
