Weekend Read: The First 10 Questions to Answer While Running on Ampere Altra-Based Instances

For this Weekend Read, we thought we would start from the beginning with The First 10 Questions to Answer While Running on Ampere Altra-Based Instances.

You are running your application on a new cloud instance or a server (or SUT, a system under test) and you notice there is a performance issue. Or you would like to ensure you are getting the best performance, given the system resources at your disposal. This document discusses some basic questions you should ask and ways to answer those questions.

Check it out and let us know what you think and what topics you would like to see in the future. Questions/Comments? Put them below.


To top and htop I would add glances, which has a very nice display of lots of runtime information.

If you are running into NUMA issues, consider CPU pinning, which forces a process to run in a specific NUMA domain. If you are fortunate, your runtime has built-in support for that.
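On Linux you can do basic pinning from inside the process with `os.sched_setaffinity`. A minimal sketch; the core set for NUMA node 0 below is an assumed example topology, so check `lscpu` or `numactl --hardware` for your machine's real node-to-core mapping:

```python
import os

# Cores assumed to belong to NUMA node 0 -- a hypothetical example;
# verify the real mapping with `numactl --hardware` or `lscpu`.
NODE0_CORES = {0, 1, 2, 3}

# Intersect with the currently allowed set so this also works on
# machines with fewer cores or an already-restricted cpuset.
allowed = os.sched_getaffinity(0) & NODE0_CORES

if allowed:
    # pid 0 means the calling process: restrict it to node-0 cores.
    os.sched_setaffinity(0, allowed)

print("running on cores:", sorted(os.sched_getaffinity(0)))
```

From the shell, `taskset -c 0-3 ./app` or `numactl --cpunodebind=0 --membind=0 ./app` achieve the same thing, with numactl also keeping memory allocations on the local node.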

A whole lot of job systems that you find in the real world (compilers, build systems, CI/CD operations) have single-threaded operations that will unreasonably slow down your very fast machine by stubbornly using only 1 core and leaving all the rest idle. The fundamental issue is Amdahl’s Law, which says that the speedup of a parallel system is limited by the fraction of work that must run serially. If your makefile, for example, has a hardcoded MAXCORES that’s low, you have a fast system sitting idle.
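Amdahl’s Law can be written as speedup = 1 / ((1 − p) + p/n), where p is the parallelizable fraction of the work and n is the core count. A quick sketch of why piling on cores stops paying off once a serial bottleneck exists:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when part of the work is serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even with 90% of the work parallelizable, the serial 10% caps the
# speedup below 10x no matter how many cores you add.
for n in (1, 4, 16, 64):
    print(f"{n:3d} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

This is why a single hardcoded single-threaded stage in a build pipeline can dominate the wall-clock time on a many-core Altra system.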

I did a writeup a while back with some of these, based in part on my efforts to get acceptable performance out of an earlier generation of systems (before Ampere) where there were tons and tons of very slow cores. Good news for the Ampere side is that single-core perf on these systems is very respectable.


Another reason to consider pinning, aside from NUMA, is retaining L2 cache contents that took cycles to populate. Pinning avoids the latency of the caches refilling as the scheduler moves a thread around the cores.

Tool #11 would be LinuxKI from HPE/Mark Ray. I just heard about it last week but will be putting it to use.

It is designed to identify performance issues beyond what typical performance metrics reveal, leading to faster root-cause analysis for many problems. LinuxKI is a kernel tracing toolkit designed to answer two primary questions about the system:
If it’s running, what’s it doing?
If it’s waiting, what’s it waiting on?
