Migrating to 64k page size kernels - detecting dependencies

Hi all,
As we’ve been working with developers to understand the performance impact and benefits of using 64k pages on Ampere (and other aarch64) platforms, we keep encountering libraries and code with hardcoded page-size assumptions. One example is this jemalloc issue: Make `--with-lg-page=16` the default option on ARM64 architecture · Issue #2639 · jemalloc/jemalloc · GitHub

I’m wondering if folks here have a way to scan code to look for potential issues migrating from 4k to 64k page size. I’m guessing a combination of static scans and runtime assessments are needed, but devs on this community might know of utilities or other methods to do this.
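The only static check I’ve come up with so far is a (very noisy) grep for hardcoded page-size constants - the patterns and the src/ path below are just guesses, not a vetted list:

$ grep -rnE '4096|0x1000|PAGE_SHIFT|getpagesize|_SC_PAGESIZE' --include='*.c' --include='*.h' src/ | head

but that flags plenty of harmless code, which is why I’m hoping someone here knows of better tooling.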

Thanks,
Naren


Utilizing nix might help, since packages can be found declaratively: nixpkgs has about 140k packages, many of them with multiple versions and configuration variants. I’m already aware of several packages that are known to have issues, and with nix it’s possible to find what depends on jemalloc.
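For example, a rough first pass over a nixpkgs checkout (the grep is only an approximation of a real reverse-dependency query, and the paths are illustrative):

$ git clone --depth 1 https://github.com/NixOS/nixpkgs.git
$ grep -rl 'jemalloc' nixpkgs/pkgs --include='*.nix' | wc -l

Anything that shows up there is a candidate to rebuild and test on a 64k kernel.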

Sorry, I wasn’t clear with my ask - I’m looking for a way to scan existing code to determine whether it may have problems running on kernels with 64k page size, like in the jemalloc example.

Essentially, how could I have caught the jemalloc issue, preferably without running the code and having it crash?

Is scanning code preferred, or just checking the dependencies? I don’t know of any code analysis tools that could determine whether a package has a page-size issue.

The GNU linker (invoked via gcc with -Wl) has two options to specify the alignment requirement: -z max-page-size and -z common-page-size.
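For example, to request 64K alignment explicitly at link time (app.c is just a placeholder):

gcc app.c -o app -Wl,-z,max-page-size=65536 -Wl,-z,common-page-size=65536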

And we can check the alignment of a binary or .so using readelf. For example:

# readelf -Wl /usr/bin/aria2c |grep LOAD
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x001440 0x001440 R E 0x10000
  LOAD           0x00fcd0 0x000000000001fcd0 0x000000000001fcd0 0x000350 0x000358 RW  0x10000

The 0x10000 in the last column (the segment alignment) indicates this binary is 64K aligned.
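Based on that, a rough one-liner to sweep a directory for binaries whose LOAD segments are aligned below 64K (GNU awk assumed for strtonum; /usr/bin is just an example path):

# for f in /usr/bin/*; do readelf -Wl "$f" 2>/dev/null | gawk -v f="$f" '/^ *LOAD/ && strtonum($NF) < 65536 { print f; exit }'; done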

If we build an application with explicit max-page-size=4096:

gcc -O3 -Wl,-z,max-page-size=4096 -fopenmp -DSTREAM_ARRAY_SIZE=41943040 -DNTIMES=100 stream.c -o stream_4k

Then check the alignment:

# readelf -Wl stream_4k |grep LOAD
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x002a28 0x002a28 R E 0x1000
  LOAD           0x002cf8 0x0000000000003cf8 0x0000000000003cf8 0x000338 0x3c000390 RW  0x1000

And this binary will crash on a system running a 64k page-size kernel:

# getconf PAGE_SIZE
65536

# ./stream_4k
Segmentation fault

Hope this can help to identify potential issues.


Thanks @David.Zeng, I didn’t know that. For others wanting to test, stream.c can be downloaded via wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c

@naren For JEMalloc, @David.Zeng told me offline how to scan for problems if the JEMalloc library isn’t stripped.

$ git clone https://github.com/jemalloc/jemalloc; cd jemalloc/; ./autogen.sh
$ ./configure --with-lg-page=12 # sets jemalloc's assumed page size as a base-2 log (2^12 = 4K)
$ make -j >& make.log; echo $?
$ strings lib/libjemalloc.so | grep ^LG_PAGE
LG_PAGE 12
$ strings lib/libjemalloc.a | grep ^LG_PAGE | uniq
LG_PAGE 12

Default configuration:

$ ./configure >& configure-default.log
$ make -j >& make.log; echo $?
$ strings lib/libjemalloc.so | grep ^LG_PAGE
LG_PAGE 16
$ strings lib/libjemalloc.a | grep ^LG_PAGE | uniq
LG_PAGE 16

This only works for JEMalloc because JEMalloc #define’s LG_PAGE and that value ends up written into the object files and the static & shared jemalloc libraries:
$ grep LG_PAGE jemalloc/include/jemalloc/internal/jemalloc_internal_defs.h
/* One page is 2^LG_PAGE bytes. */
#define LG_PAGE 16
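For what it’s worth, the same check should work on a distro-installed copy as long as the library isn’t stripped (the library path below is just an example):

$ strings /usr/lib/aarch64-linux-gnu/libjemalloc.so.2 | grep '^LG_PAGE'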


Yup - LG_PAGE is the base-2 log of JEMalloc’s internal page size: 2^12 = 4K, 2^14 = 16K, and 2^16 = 64K. If you set LG_PAGE=16, you are still allocating 64K chunks of memory in JEMalloc; you’re just allocating 16, 4, or 1 kernel pages at a time, depending on whether the underlying kernel page size is 4K, 16K, or 64K.
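A quick arithmetic check of those numbers (plain shell math, nothing jemalloc-specific): a 64K JEMalloc page spans 16, 4, or 1 kernel pages on 4K, 16K, and 64K kernels respectively.

$ echo $(( (1 << 16) / 4096 )) $(( (1 << 16) / 16384 )) $(( (1 << 16) / 65536 ))
16 4 1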