Last modified: 2024-01-21 09:34
When you run a program that was built for a later generation or different flavor of x86 CPU than the one you have, eventually the CPU hits an instruction that it doesn't know. At that point it raises the invalid opcode hardware exception. The Linux kernel handles this exception by delivering the illegal instruction signal to the offending process. A command line user sees an "illegal instruction" error message and the program crashes. On a supposedly user-friendly desktop, error messages go into a black hole and the user sees only the unexplained crash.
Although the examples given in this page are for older generation hardware, it is just as easy to get an illegal instruction error on a 64-bit CPU as on a 32-bit one. You just fail on a different instruction set; at the moment, probably some fragment of AVX512.
Figure 1 shows significant instruction sets added to x86 CPUs in the range from MMX to SSE2. The names of instruction sets are shown plainly while the names of CPU types are shown in parentheses. The instruction sets supported by a CPU type include all of those named on a path upward to the top node in the lattice. So, Pentium (at the top) supports none of the added instruction sets while Athlon 64 (at the bottom) supports all of them.
i586, i686, P2, P3, and P4 are the commonly used nicknames for Pentium, Pentium Pro, Pentium 2, Pentium 3, and Pentium 4 CPUs respectively.
The -march switch enables added instruction sets like SSE and 3DNow! consistent with the CPU type. Added instruction sets can also be enabled individually with switches like -msse. Tables 1 and 2 summarize GCC 12.2 documentation for the MMX to SSE2 range.
As a general rule, when the GCC command line contains contradictory switches, the later switch takes precedence. My expectation, then, was that -msse -march=i586 would be equivalent to just -march=i586 and would not generate SSE code. Instead, GCC merges the i586 instruction set with the SSE instruction set and generates code that won't even run on a Pentium 2 or Athlon, much less a Pentium.
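One way to observe this merging without compiling anything is to look at the macros GCC predefines for a given set of switches. The sketch below assumes a GCC that can target 32-bit x86; the helper name isa_macros is made up for illustration, while -dM -E and the macro names such as `__SSE__` are standard GCC.

```shell
# isa_macros (hypothetical helper): filter a GCC macro dump on stdin down to
# the macros that indicate the MMX to SSE2 instruction sets are enabled.
isa_macros() {
    grep -E '^#define __(MMX|SSE|SSE2|3dNOW|3dNOW_A)__ '
}

# Usage (assumes GCC with 32-bit x86 support; -m32 is harmless on a
# native 32-bit system):
#   gcc -m32 -march=i586 -dM -E - </dev/null | isa_macros
#   gcc -m32 -msse -march=i586 -dM -E - </dev/null | isa_macros
```

If GCC merges the switches as described, the second command should list `__SSE__` even though -march=i586 comes last on the command line.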
Table 1. CPU types and the corresponding GCC options

CPU | GCC option |
---|---|
Pentium | -march=i586 or -march=pentium |
Pentium MMX | -march=pentium-mmx |
Pentium Pro | -march=i686 or -march=pentiumpro |
P2 Deschutes | -march=pentium2 |
Pentium 3 | -march=pentium3 |
Pentium 4 | -march=pentium4 |
K6 | -march=k6 |
K6-2 | -march=k6-2 |
K6-2+ | None |
Athlon | -march=athlon |
Athlon XP | -march=athlon-xp |
Athlon 64 | -march=k8 or -march=athlon64 |

Table 2. Instruction sets and the corresponding GCC options

Instruction set | GCC option |
---|---|
MMX | -mmmx |
CMOV | None (determined by -march ) |
FXSR | -mfxsr |
Extended MMX | Subset of -msse and -m3dnowa * |
SSE | -msse |
SSE2 | -msse2 |
3DNow! | -m3dnow |
Extended 3DNow! | Subset of -m3dnowa * |
* The 'a' in -m3dnowa presumably stands for Athlon. -m3dnowa enables the 5 instructions of Extended 3DNow! plus the 19 instructions of Extended MMX. Extended MMX is also included in SSE.
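To compare a given machine against these instruction sets, the flags line of /proc/cpuinfo can be checked. Linux reports the sets above as mmx, cmov, fxsr, mmxext, sse, sse2, 3dnow, and 3dnowext. The following is a minimal sketch; the function name has_cpu_flags is hypothetical.

```shell
# has_cpu_flags FLAGS_STRING FLAG... (hypothetical helper)
# Succeeds only if every FLAG occurs as a whole word in FLAGS_STRING.
# Prints the first missing flag and fails otherwise.
has_cpu_flags() {
    flags=" $1 "          # pad so that every flag is space-delimited
    shift
    for f in "$@"; do
        case $flags in
            *" $f "*) ;;  # present, keep checking
            *) echo "missing: $f"; return 1 ;;
        esac
    done
}

# Usage on a live system (assumes Linux with /proc mounted):
#   cpu=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   has_cpu_flags "$cpu" mmx sse sse2 && echo "SSE2 code should run here"
```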
Most packages in the 32-bit version of Slackware 15.0 are built with -march=i586 and specify i586 in the package name. A few are built for i686 instead. But the upstream configuration and build scripts for many packages add -msse, -msse2 or suchlike to the GCC command line. Slackware's SlackBuild scripts don't remove or cancel out those switches, so packages that are nominally built for the i586 arch actually require a Pentium 3 or Pentium 4 to run.
Some packages furthermore include SSE or SSE2 assembly code. To produce an i586-compatible build for these, it's necessary to use package-specific configuration options to disable SSE and/or asm code, and Slackware's SlackBuild scripts don't do that.
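A rough way to screen an installed binary or library for SSE code is to count the disassembly lines that reference XMM registers. This is only a heuristic sketch with a made-up helper name: it misses SSE instructions that name no XMM register, and it also counts code paths that the program might only select at run time after checking the CPU.

```shell
# count_xmm_lines (hypothetical helper): read objdump -d output on stdin and
# print how many lines mention an XMM register (%xmm0..%xmm7 in 32-bit code).
count_xmm_lines() {
    grep -c '%xmm[0-7]'
}

# Usage (the path is a placeholder):
#   objdump -d /path/to/library.so | count_xmm_lines
```

A count of 0 suggests, but does not prove, that the file is free of SSE code.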
Finally, Rust has a unique build system with a unique problem resulting from somebody's decision to enable SSE2 in builds targeting i686. The release notes for Rust version 1.10.0 (2016-07-07) say "This release includes std binaries for the i586-unknown-linux-gnu, i686-unknown-linux-musl, and armv7-linux-androideabi targets. The i586 target is for old x86 hardware without SSE2...." Since Slackware uses the i686 target for Rust, not only does rustc itself fail to run on the target architecture, every other package containing Rust code becomes contaminated with SSE2 as a second-order effect.
Following is a summary of the remediations I had to make for Slackware 15.0 to be fit for purpose on an Athlon T-Bird CPU. Since there are some packages that I never use and some that I always replace with my own build, this is not a complete list of every affected package.
Package | Who needs it | i586 SlackBuild fix | T-Bird tuning | Notes |
---|---|---|---|---|
Mesa | Every GL app | Add -Dsse2=false to meson setup switches | SLKCFLAGS="-O3 -march=athlon-tbird" | For Nvidia GL to work, you must build (or rebuild) the Nvidia driver after replacing the Mesa package with a non-SSE version. |
Qt5 | KDE | Add -no-sse2 -no-sse3 -no-ssse3 -no-sse4.1 -no-sse4.2 -no-avx -no-avx2 -no-avx512 to configure switches | SLKCFLAGS="-O3 -march=athlon-tbird" | |
SDL2 | Games, DOSBox-X, ffplay | Add --disable-mmx --disable-3dnow --disable-sse --disable-ssemath --disable-sse2 --disable-sse3 to configure switches | --enable-mmx --enable-3dnow and SLKCFLAGS="-O3 -march=athlon-tbird" | |
Rust | Emacs (indirectly) | Set ARCH=i586 and add docs = false in the [build] section of the build configuration | Not applicable | It's a blivet. Disabling docs avoids a stupid FTB. |
Librsvg | Emacs | Rebuild after replacing Rust | Not applicable | SSE2 contamination from Rust |
OpenAL | GStreamer, ffplay | Add -DALSOFT_CPUEXT_SSE=OFF -DALSOFT_CPUEXT_SSE2=OFF -DALSOFT_CPUEXT_SSE3=OFF -DALSOFT_CPUEXT_SSE4_1=OFF -DALSOFT_REQUIRE_SSE=OFF -DALSOFT_REQUIRE_SSE2=OFF -DALSOFT_REQUIRE_SSE3=OFF -DALSOFT_REQUIRE_SSE4_1=OFF to cmake switches | SLKCFLAGS="-O3 -march=athlon-tbird" | |
OpenCV | GStreamer | Add -DCPU_BASELINE= -DCPU_DISPATCH= to cmake switches | SLKCFLAGS="-O3 -march=athlon-tbird" | After replacing OpenAL and OpenCV, wipe out ~/.cache/gstreamer-1.0 and run gst-inspect-1.0 to rescan plugins |
Here are corresponding patches for the SlackBuilds (i586 not T-Bird).
When rustc crashed, the kernel logged this at level info:
Jan 22 20:37:29 abit kernel: traps: rustc[1642] trap invalid opcode ip:b36e4400 sp:bfe36230 error:0 in libstd-992072e65e1dcf67.so[b3639000+1bf000]
Retrieve the offending instruction:
bash-5.1$ printf "%x\n" $((0xb36e4400 - 0xb3639000))
ab400
bash-5.1$ objdump -d /usr/lib/libstd-992072e65e1dcf67.so | fgrep ab400
ab400: 0f 57 c0 xorps %xmm0,%xmm0
XORPS is an SSE instruction and XMM0 is an SSE register.
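The subtraction is the instruction pointer (ip) minus the base address of the mapped library, which is the first number inside the trailing brackets of the log line. A small helper can pull both values out of the log line and do the arithmetic. This is a sketch under the assumption that the line has the same layout as the one above; the function name trap_offset is hypothetical.

```shell
# trap_offset LOG_LINE (hypothetical helper)
# Print, in hex, the offset of the faulting instruction within the mapped
# object named in a kernel "traps:" log line.
trap_offset() {
    # ip:<hex> is the faulting instruction pointer.
    ip=$(printf '%s\n' "$1" | sed -n 's/.* ip:\([0-9a-f]*\) .*/\1/p')
    # [<hex>+<hex>] after the object name is the mapping base and size.
    base=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*].*/\1/p')
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '%x\n' $(( 0x$ip - 0x$base ))
}

# Usage:
#   trap_offset "$(dmesg | grep 'trap invalid opcode' | tail -n 1)"
```

The printed offset is the value to search for in the objdump output.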