My personal journey to unlock more performance on Linux - Part 3: Tweaking the Toolchain
I am glad that you came by again to read the third part of my Linux Gaming Tweaks series. If you missed the first part, head over here to get a general overview and learn more about my hardware and Linux distribution choice, or visit the second part where I cover the Linux Kernel. In this episode, it is time to dive deep into compiler toolchains and everything related to them. Since we have the source code for many programs and libraries on Linux, even for the compilers themselves, we can try to be more clever (or more risk tolerant) than the programmer or package maintainer and use more aggressive compiler flags to squeeze more performance out of the given source code. That is what we will explore today.
A deep dive into unknown territory - the C/C++ Toolchain
On Linux, there are basically two C/C++ compiler toolchains of importance for us, GCC and LLVM/Clang. Both can be used to compile source code into executable binary programs. Think of both as the necessary tools to build your favorite programs or libraries which you use daily. Both come with their own set of related tools to accomplish that job. They can also be tweaked to produce better performing programs when configured with additional options (compiler flags) that tell the compiler which optimization level or which other clever tricks it should apply to the source code.
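To give you an idea of what such flags look like, here is a generic example (the file name is just a placeholder, not taken from any real package) that asks GCC to optimize aggressively, tune for the CPU it runs on and enable Link Time Optimization:

gcc -O3 -march=native -flto -o myprogram myprogram.c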
Both toolchains have their own strengths and weaknesses. GCC is the most compatible one, as it has been the default compiler on Linux since its inception. LLVM/Clang on the other hand is a relatively young project with a more modern code base, and more companies are engaged in it nowadays due to its more permissive license. Even though I am a law professional, I want to skip the licensing discussion altogether, as you are probably only interested in performance tweaks and don't care about the history of both compilers. But knowing a little about their background gives you some insight into why certain things are simply different in both worlds, which also has implications for your usage of them.
As a novice, I had the naive expectation that both compiler toolchains are perfectly interchangeable; after all, they both serve the same purpose and produce ABI-compatible binaries. While that is true to some degree, I was very wrong to assume that both would be equally compatible with every code base I throw at them. In a nutshell, C/C++ as a programming language is a very complex system with many deficiencies, and each compiler is free to implement certain features in a different way that programmers might or might not abuse or rely on in their code base. The result can be that only one compiler toolchain is able to compile their program. And while many programs are compiler agnostic, there are some that are not, e.g. the Linux Kernel only gained support for LLVM/Clang recently. That means you will in general have more luck using GCC as your standard compiler on Linux, but I also came across various projects that are LLVM/Clang-exclusive. Just keep that problem in mind, as it might be a reason that the build process breaks. Another reason for build breakage is that more aggressive compiler flags are not the focus of testing, neither on the compiler side nor on the software side. Consider yourself warned by now that not everything works out-of-the-box as it should in an ideal world. This is especially true for experimental toolchain builds which are still in development.
Build your own C/C++ Toolchain from Source
Tweaking and building both toolchains from source is a great endeavour to see what can be achieved with some effort, but it is also a very tedious process: they both take a very long time to compile even on a fast CPU with many cores, and they are a fundamental part of your system where you do not want to see any errors, as such errors might render your system unusable. Most people are probably better off using the pre-built binaries from their distribution as a starting point to tinker around with. Of all the tweaks I wrote about in this series, building your own toolchain from source is the hardest and most time-consuming task. On the other hand, it not only provides you with better performing binaries and faster compilation times, it is also a great learning experience. Hence, if you have plenty of time to sink into it and are eager to compile a lot of code, you should try it out.
Interested Arch users might want to take a look at my GitHub repository. The toolchain folder contains everything you need to build an optimized GCC and LLVM toolchain on Arch-based distributions. I also included some instructions and my build flags for each package so you can easily reproduce my procedure. I took the liberty of altering these packages to my liking (e.g. language support and some subprojects got axed), but the packages should still work for most standard users. You can either take these experimental PKGBUILDs as is or customize them, or you can take them as a source of inspiration and edit the official PKGBUILDs with some of my suggestions if you want to try out some ideas on a productive system. While I did some research on my changes and took some inspiration from Clear Linux and Allan McRae's alternative GCC toolchain, I only know that these packages work for me and my purposes; your mileage may vary. You also need to take a look at the PKGBUILD anyway, to alter the paths for the applied patches or to adjust some settings for your CPU. As I cut some corners regarding the error checks to save a lot of time, you should either use known-working compiler flags to play it safe or run these checks to verify that your toolchain works as expected.
Tweaks for GCC
For building GCC, the build order is important: linux-api-headers > glibc > binutils > gcc > glibc > binutils > gcc
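Assuming every package lives in its own folder with the corresponding PKGBUILD inside (a layout I only assume here for illustration), one way to walk through that order is a small shell loop that builds and installs each package before the next one starts:

# -s installs missing build dependencies, -f overwrites an already built package, -i installs the result
for pkg in linux-api-headers glibc binutils gcc glibc binutils gcc; do
    (cd "$pkg" && makepkg -sfi --noconfirm) || break
done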
With that out of the way, let's take a look at my customized GCC PKGBUILD. You will notice that I added three performance-sensitive patches from Clear Linux and altered some options; the important ones are:
--with-tune=haswell \
--with-arch=haswell \
--with-glibc-version=2.35 \
--with-build-config=bootstrap-lto \
The first two tell GCC to optimize for Intel's Haswell CPU architecture, which is useful if you are planning to use the toolchain with that particular CPU or on a compatible architecture. This is also specified for the stage 1 compiler later in the file (during the GCC build process several compilers are built and thrown away, hence only modest compiler flags are used for the first stage to speed up the process). The Glibc entry is specified to tailor GCC to this specific Glibc version.
The last option selects the "bootstrap-lto" build configuration, which means that several stages of the compiler are built, with some parts using Link Time Optimization (LTO). As I also specified the make target "profiledbootstrap" later in the file, profile-guided optimization (PGO) is used as well. Be aware that with these options the build process takes a lot of time to finish, even more so when including the checks.
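Put together, a heavily trimmed sketch of such a configure call could look like the one below; the prefix, the languages line and the separate build directory are generic assumptions on my part, and everything else from my PKGBUILD is omitted:

../gcc/configure --prefix=/usr \
    --enable-languages=c,c++ \
    --with-tune=haswell \
    --with-arch=haswell \
    --with-glibc-version=2.35 \
    --with-build-config=bootstrap-lto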
Another major change was to add the following two CXXFLAGS variables (CXXFLAGS apply to C++ code just as CFLAGS apply to C code):
BOOT_CXXFLAGS="$CFLAGS" \
CXXFLAGS_FOR_TARGET="$CXXFLAGS" \
The idea for this came when analyzing both the Clear Linux spec file and Allan McRae's PKGBUILD. It seems to me that on Arch Linux these entries were missed in the original PKGBUILD, potentially leaving less-optimized default values in place. While that might not matter much for the original distribution package, it might matter for us, as we typically use more aggressive compiler options than the default and want to apply them to the C++ code within GCC, too.
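These two variables typically go onto the make command line that triggers the profiled bootstrap. The following is only a sketch of how that can look; the build directory name and the other variables shown here are placeholders, not a copy of my PKGBUILD:

make -C gcc-build \
    BOOT_CFLAGS="$CFLAGS" \
    BOOT_CXXFLAGS="$CFLAGS" \
    CFLAGS_FOR_TARGET="$CFLAGS" \
    CXXFLAGS_FOR_TARGET="$CXXFLAGS" \
    profiledbootstrap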
Tweaks for Glibc
The second important package which I want to discuss here is Glibc, the standard C library. This package is a dependency of most other software you use and the default C library on almost all relevant Linux distributions. It should always be rebuilt as part of a new GCC build. This is also a performance- and security-sensitive package at the core of your system. As you can see from my GitHub repository, I carry a few of Clear Linux's patches on top, and in the PKGBUILD you will find:
--disable-cet \
--enable-kernel=5.17 \
The first option disables a security feature (CET), which I also disabled in GCC and Binutils and which you might want to enable instead; the second specifies the minimum Linux Kernel version Glibc has to support, which lets it rely on newer kernel interfaces instead of carrying compatibility code for older kernels. Be sure to use more modest CFLAGS here; just look at the front page of my GitHub repo to find the ones I use.
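For reference, a stripped-down sketch of how these options end up in a Glibc configure call could look like this; the prefix and the separate build directory are generic assumptions, and all other options as well as the more modest CFLAGS are omitted:

../glibc/configure --prefix=/usr \
    --disable-cet \
    --enable-kernel=5.17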
Tweaks for LLVM/Clang
LLVM/Clang is by far an easier beast to tame. Unlike GCC, you don't need to build several interdependent packages in a specific order to get it to work. It uses CMake as a more modern build system and the process is fairly easy to master. The only major downside is that the default Arch PKGBUILD needed more surgery to get a decent build for my taste. If you compare my PKGBUILD with the default one, you will see what I mean. Also be aware that some packages of your installation depend on a specific LLVM version, hence if you build LLVM-git, you need to rebuild those dependent packages to link against the new LLVM version, too. This can also be the source of some trouble. The graphics project Mesa is the most prominent example, as I noticed that a new LLVM version can cause issues or build breakage when compiling Mesa. Usually such issues are fixed or worked around within a couple of days, but you still need to consider this fallout, as riding LLVM-git might break your system elsewhere sooner or later.
The default PKGBUILD carries a lot of baggage around which at least I don't find useful for pure C/C++ compilation on a standard x86 system. If you have other needs, you can alter it to your liking, but I like it to be as slim as possible, as every additional subproject costs compilation time or might be a source of build breakage.
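To illustrate what "slim" can mean in practice, a minimal CMake configuration for a pure C/C++ toolchain on x86 could look roughly like the following. This is only a generic sketch and not a copy of my PKGBUILD, which sets quite a few more options:

cmake -S llvm -B build -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang;lld" \
    -DLLVM_TARGETS_TO_BUILD=X86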
I also carry some patches around for LLVM; noteworthy is the "haswell.patch", which you should adapt to your CPU architecture if you want to optimize for your own system. This one was also taken from Clear Linux, just like the rest, but altered by me, as Clear Linux defaults to Westmere.
That was it for today. If all went well, you should now have a highly optimized LLVM and GCC toolchain on your PC with much better build times than before. In the next episode I want to talk about KWinFT, a drop-in replacement for the default window manager of KDE Plasma, and possibly a couple of other projects which are worth compiling from source. As this post is already quite long, I will explain my choice of CFLAGS early in the next episode.