Extreme I/O slowdown with PAE kernel

Last modified: Mon Feb 19 11:25:19 EST 2018

The problem

In December 2014 I migrated from a 32-bit PC with 4 GiB of RAM to a 64-bit PC with 32 GiB of RAM.  I ran 64-bit Slackware for routine use but installed 32-bit Slackware alongside it for the convenience of quickly compiling software to run on slower 32-bit PCs.  A 32-bit kernel can use up to 64 GiB of RAM thanks to the Physical Address Extension (PAE) CPU feature, which Linux kernels can optionally enable.

Eventually I noticed an extreme I/O slowdown happening only in 32-bit mode, and only on the new PC.  It would run normally for a while, but if I ran a big compile or tried to unpack a Slackware installation, the write bandwidth at some point would slow to a trickle, with no obvious indicators of where the bottleneck was.

Bug reports

As of 2016-08-25 none of these bug reports had received a proper fix.  Workarounds included:

Blame

Most likely, the bad behavior is being triggered by a shortage of lowmem that results primarily from having 32 GiB of RAM.  By default, 32-bit kernels allocate only 1 GiB of address space for the kernel and all of the data that it must keep available in lowmem, leaving 3 GiB of address space for user processes.  Larger amounts of RAM cause more of the 1 GiB to be used up by overhead.
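One way to watch for a lowmem shortage is to read the LowTotal and LowFree fields of /proc/meminfo, which 32-bit kernels built with highmem support report.  The following is a minimal sketch, not a diagnostic tool: the function names are hypothetical, and the fields are simply absent on 64-bit kernels, in which case the summary is None.

```python
import os


def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields, e.g. {'LowFree': 123456} (values in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def lowmem_summary(fields):
    """Summarize lowmem usage, or return None if the kernel reports no lowmem fields."""
    if "LowTotal" not in fields or "LowFree" not in fields:
        return None  # 64-bit kernel, or 32-bit kernel built without highmem
    total = fields["LowTotal"]
    free = fields["LowFree"]
    return {
        "low_total_mib": total / 1024,
        "low_free_mib": free / 1024,
        "low_used_pct": 100.0 * (total - free) / total if total else 0.0,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        summary = lowmem_summary(parse_meminfo(f.read()))
    print(summary if summary else "no LowTotal/LowFree fields on this kernel")
```

Running this before and during a big compile should show whether lowmem is being used up while plenty of highmem remains free.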

It has been recognized since 2003 if not earlier that the 1 GiB lowmem allocation is insufficient to manage 32 GiB of RAM.  Ingo Molnar wrote:

But as the amount of RAM increases, the 3/1 split becomes a real bottleneck.  Despite highmem being utilized by a number of large-size caches, one of the most crucial data structures, the mem_map[], is allocated out of the 1 GB kernel VM.  With 32 GB of RAM the remaining 0.5 GB lowmem area is quite limited and only represents 1.5% of all RAM.  Various common workloads exhaust the lowmem area and create artificial bottlenecks.  With 64 GB RAM, the mem_map[] alone takes up nearly 1 GB of RAM, making the kernel unable to boot.
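The arithmetic behind those figures can be sketched as follows.  The kernel keeps one struct page entry in mem_map[] for every physical page frame, so the array grows linearly with RAM.  The size of struct page depends on kernel version and configuration; the 32-byte and 64-byte values below are assumptions chosen only to bracket the numbers in the quote, not measurements.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def mem_map_bytes(ram_bytes, page_size=4096, struct_page_bytes=32):
    """Size of the page array: one struct page per physical page frame.

    struct_page_bytes is an assumed value; the real size varies by kernel.
    """
    return (ram_bytes // page_size) * struct_page_bytes


if __name__ == "__main__":
    for ram_gib in (4, 8, 32, 64):
        for sp in (32, 64):  # assumed lower and upper bounds for struct page
            size = mem_map_bytes(ram_gib * GIB, struct_page_bytes=sp)
            print(f"{ram_gib:2d} GiB RAM, {sp}-byte struct page: "
                  f"mem_map = {size // MIB} MiB")
```

Whatever the exact struct page size, the array has to fit in the same 1 GiB of kernel address space as everything else the kernel keeps in lowmem.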

In practice, the limit before problems set in has proven to be only 8 GiB.  This limit, and kernel devs' disinclination to remedy it, is documented obscurely by the final sentence of the undated Documentation/vm/highmem.txt file in the kernel source tree (4.6.7):

The general recommendation is that you don't use more than 8GiB on a 32-bit machine—although more might work for you and your workload, you're pretty much on your own—don't expect kernel developers to really care much if things come apart.

The benevolent dictator himself, Linus Torvalds, acknowledged the problem in a rant in 2007, in which he said (among other things):

PAE was a total and utter disaster.  ...  Directory caches, inodes, etc. couldn't use it, and in general it meant that under Linux, if you had more than 4GB of physical memory, you generally ran into problems (since only 25% of memory was available for normal kernel stuff—the rest had to be addressed through small holes in the tiny virtual address space).

It is not clear whether the I/O slowdown is actually a regression or merely a new symptom of the same old problem.  It is possible that the slowdown was always there, but some version of the kernel started using more lowmem, causing the slowdown to be triggered more often.  Or, maybe just the mode of failure changed.

Solution

Ingo's 4G/4G patch provided a full 4 GiB address space to both kernel and user at the cost of marginal performance overhead.  It solved real problems and was deployed in Red Hat Enterprise Linux, but it was never merged upstream.  Most people then migrated to 64-bit, leaving just a few of us "weirdos" still suffering from the unsolved PAE problem.

Although there is no 4G/4G patch for a modern kernel, it is possible to give the kernel more lowmem address space by taking some away from user space.  The following description is accurate for kernel version 4.6.7.

Original configuration:

Changes to adjust the memory split:

The help text for "Memory split" reads:

Select the desired split between kernel and user memory.

If the address range available to the kernel is less than the physical memory installed, the remaining memory will be available as "high memory".  Accessing high memory is a little more costly than low memory, as it needs to be mapped into the kernel first.  Note that increasing the kernel address space limits the range available to user programs, making the address space there tighter.  Selecting anything other than the default 3G/1G split will also likely make your kernel incompatible with binary-only kernel modules.

If you are not absolutely sure what you are doing, leave this option alone!

Changing the memory split to 2G/2G fixed the problem for me, and the Nvidia driver still works.
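To confirm which split a given kernel was built with, one can look at CONFIG_PAGE_OFFSET in its .config: user space gets the addresses below that value and the kernel gets the rest of the 32-bit address space.  The sketch below assumes the values I recall from arch/x86/Kconfig (0xC0000000 for the default 3G/1G split, 0x80000000 for 2G/2G); check them against your own kernel source.  The function names are hypothetical.

```python
import re

ADDRESS_SPACE = 1 << 32  # size of a 32-bit virtual address space
GIB = 1024 ** 3


def split_from_page_offset(page_offset):
    """Return (user_bytes, kernel_bytes) for a given PAGE_OFFSET."""
    return page_offset, ADDRESS_SPACE - page_offset


def page_offset_from_config(config_text):
    """Extract CONFIG_PAGE_OFFSET from the text of a kernel .config, or None."""
    m = re.search(r"^CONFIG_PAGE_OFFSET=(0x[0-9A-Fa-f]+)$",
                  config_text, re.MULTILINE)
    return int(m.group(1), 16) if m else None


if __name__ == "__main__":
    # Assumed PAGE_OFFSET values for the two splits discussed in the text.
    for label, offset in (("3G/1G", 0xC0000000), ("2G/2G", 0x80000000)):
        user, kernel = split_from_page_offset(offset)
        print(f"{label}: user {user / GIB:.0f} GiB, kernel {kernel / GIB:.0f} GiB")
```

Not all of the kernel's share is usable as lowmem, since part of it is set aside for other mappings, so the LowTotal figure in /proc/meminfo will be somewhat smaller than the kernel side of the split.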

See also:  Running out of LowMem with Ubuntu PAE Kernel and 32GB of RAM.

Good luck.

P.S.:  A PAE-killing kernel fault was fixed in 4.15.2:

commit 62c00e6122a6b5aa7b1350023967a2d7a12b54c9
Author: William Grant <william.grant@canonical.com>
Date:   Tue Jan 30 22:22:55 2018 +1100

    x86/mm: Fix overlap of i386 CPU_ENTRY_AREA with FIX_BTMAP
    
    commit 55f49fcb879fbeebf2a8c1ac7c9e6d90df55f798
    
    Since commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the
    fixmap"), i386's CPU_ENTRY_AREA has been mapped to the memory area just
    below FIXADDR_START. But already immediately before FIXADDR_START is the
    FIX_BTMAP area, which means that early_ioremap can collide with the entry
    area.
    
    It's especially bad on PAE where FIX_BTMAP_BEGIN gets aligned to exactly
    match CPU_ENTRY_AREA_BASE, so the first early_ioremap slot clobbers the
    IDT and causes interrupts during early boot to reset the system.
    
    The overlap wasn't a problem before the CPU entry area was introduced,
    as the fixmap has classically been preceded by the pkmap or vmalloc
    areas, neither of which is used until early_ioremap is out of the
    picture.
    
    Relocate CPU_ENTRY_AREA to below FIX_BTMAP, not just below the permanent
    fixmap area.
    
    Fixes: commit 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
    Signed-off-by: William Grant <william.grant@canonical.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    Link: https://lkml.kernel.org/r/7041d181-a019-e8b9-4e4e-48215f841e2c@canonical.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
