Performance Variability on Ampere Altra Under Different Kernel Versions

Hello

I’ve been running workloads on an Ampere Altra system and noticed significant performance variability when switching between Linux kernel versions. Some versions provide stable, predictable performance, while others exhibit unexpected slowdowns or inconsistencies, particularly in multi-threaded workloads. :innocent:

This is especially noticeable in CPU-bound applications where task scheduling and NUMA optimizations play a crucial role. :slightly_smiling_face:

I suspect that certain kernel optimizations or regressions might be affecting how the Ampere Altra handles scheduling, memory management, or power efficiency. :upside_down_face:

I’ve tried adjusting CPU governor settings and tuning kernel parameters, but the differences persist. It would be helpful to know whether specific kernel versions are best optimized for Ampere Altra, or whether any patches have addressed similar issues. :thinking: I also checked the General Discussion - Ampere arm64 Developer Community MLOps guide related to this and found it quite informative.

Has anyone else observed similar performance differences? Are there recommended kernel configurations, patches, or specific distributions that provide the best experience for Ampere-based workloads? :thinking: Any insights into tuning the kernel for better performance on Ampere Altra would be greatly appreciated.

Thank you!! :slightly_smiling_face:

Have you considered sharing which kernel/distro versions you found fast/slow?

Maybe someone can then say ‘ah, they do not have XYZ enabled’ or something?
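
If it helps, something like the following could be run on each kernel and the output pasted here. It is only a rough sketch: the choice of fields (kernel release, distro name, CPU count, cpu0 governor, transparent hugepage mode) is my assumption about what might be relevant, and the function names are made up for illustration. The sysfs files may not exist on every system, in which case the value is simply `None`.

```python
import os
import platform


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def read_os_release(path="/etc/os-release"):
    """Parse KEY=value pairs from os-release; returns {} if unavailable."""
    fields = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                fields[key] = value.strip('"')
    except OSError:
        pass
    return fields


def collect_system_info():
    """Gather a few facts worth posting next to any benchmark number."""
    uname = platform.uname()
    return {
        "kernel": uname.release,
        "arch": uname.machine,
        "distro": read_os_release().get("PRETTY_NAME", "unknown"),
        "cpus": os.cpu_count(),
        # Governor of cpu0 only; other cores may be configured differently.
        "governor_cpu0": read_first_line(
            "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
        ),
        "thp": read_first_line("/sys/kernel/mm/transparent_hugepage/enabled"),
    }


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```

Posting this output for one "fast" and one "slow" kernel, together with the benchmark numbers, would make the comparison much easier to reason about.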

It’s hard to be too specific without data. Generally, I’d suggest kernel 6.2 or later, built with GCC 11 or later.
CCIX between two sockets is not the fastest interconnect, so I always recommend that users pin their workloads/containers/VMs to cores.
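
As a minimal sketch of the pinning idea, assuming a Linux host and a Python workload: the script below restricts the current process to the CPUs of one NUMA node using `os.sched_setaffinity`. The helper names (`parse_cpulist`, `node_cpus`, `pin_to_cpus`) are hypothetical, and treating "NUMA node 0" as "socket 0" is an assumption that depends on how the firmware exposes the topology. For containers or VMs you would use the runtime's own CPU set options instead.

```python
import os


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-3,8,10-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Return the CPUs of a NUMA node, or an empty set if the node is absent."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return set()


def pin_to_cpus(cpus):
    """Pin the calling process to the given CPUs (Linux only)."""
    allowed = os.sched_getaffinity(0)
    target = set(cpus) & allowed  # never request CPUs we are not allowed to use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    cpus = node_cpus(0)
    if cpus:
        print("pinned to:", sorted(pin_to_cpus(cpus)))
    else:
        print("no NUMA node 0 information found; affinity unchanged")
```

Threads and child processes started after the call inherit the affinity mask, so the pinning should happen before the workload spawns its workers.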

LinuxKI (Home · HewlettPackard/LinuxKI Wiki · GitHub) is a great tool for getting to root causes if you have the time to delve into its capabilities.

Performance differences between kernel versions happen on both ARM-based and x86 systems; random number generator throughput is one example.
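
A quick way to compare this across kernels might look like the sketch below. It assumes that `os.urandom` (which draws from the kernel's random source) is a reasonable stand-in for "random number generator throughput"; the function name and the default sizes are arbitrary choices for illustration, not a standard benchmark.

```python
import os
import time


def urandom_throughput_mib_s(total_bytes=64 * 1024 * 1024, chunk=1024 * 1024):
    """Read total_bytes from os.urandom in chunks and return MiB per second."""
    remaining = total_bytes
    start = time.perf_counter()
    while remaining > 0:
        n = min(chunk, remaining)
        os.urandom(n)  # data is discarded; only the time taken matters
        remaining -= n
    elapsed = time.perf_counter() - start
    return (total_bytes / (1024 * 1024)) / elapsed


if __name__ == "__main__":
    # Repeat a few times; single runs can be noisy.
    for run in range(3):
        print(f"run {run}: {urandom_throughput_mib_s():.1f} MiB/s")
```

Running it on each kernel under the same governor and CPU pinning gives a single number that is easy to compare and post.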

There are also tons of optimizations paired with specific kinds of workloads. I suggest you go deeper into a specific workload, for example, AI model inference,…