Weekend Read: The First 10 Questions to Answer While Running on Ampere Altra-Based Instances

For this Weekend Read, we thought we would start from the beginning with The First 10 Questions to Answer While Running on Ampere Altra-Based Instances.

You are running your application on a new cloud instance or a server (or SUT, a system under test) and you notice there is a performance issue. Or you would like to ensure you are getting the best performance, given the system resources at your disposal. This document discusses some basic questions you should ask and ways to answer those questions.

Check it out and let us know what you think and what topics you would like to see in the future. Questions/Comments? Put them below.


To top and htop I would add glances, which has a very nice display of lots of runtime information.

If you are running into NUMA issues, consider CPU pinning, which forces a process to run in a specific NUMA domain. If you are fortunate, your runtime has built-in support for that.
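On Linux you can do basic pinning from inside the process with `os.sched_setaffinity`. A minimal sketch; the core set for NUMA node 0 below is an assumed example topology, so check `lscpu` or `numactl --hardware` for your machine's real node-to-core mapping:

```python
import os

# Cores assumed to belong to NUMA node 0 -- a hypothetical example;
# verify the real mapping with `numactl --hardware` or `lscpu`.
NODE0_CORES = {0, 1, 2, 3}

# Intersect with the currently allowed set so this also works on
# machines with fewer cores or an already-restricted cpuset.
allowed = os.sched_getaffinity(0) & NODE0_CORES

if allowed:
    # pid 0 means the calling process: restrict it to node-0 cores.
    os.sched_setaffinity(0, allowed)

print("running on cores:", sorted(os.sched_getaffinity(0)))
```

From the shell, `taskset -c 0-3 ./app` or `numactl --cpunodebind=0 --membind=0 ./app` achieve the same thing, with numactl also keeping memory allocations on the local node.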

A whole lot of job systems that you find in the real world (compilers, build systems, CI/CD operations) have single-threaded operations that will unreasonably slow down your very fast machine by stubbornly using only 1 core and leaving all the rest idle. The fundamental issue is Amdahl’s Law, which says that the speedup of a parallel system is limited by the fraction of work that must run serially. If your makefile, for example, has a hardcoded MAXCORES that’s low, you have a fast system sitting idle.
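Amdahl’s Law can be written as speedup = 1 / ((1 − p) + p/n), where p is the parallelizable fraction of the work and n is the core count. A quick sketch of why piling on cores stops paying off once a serial bottleneck exists:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when part of the work is serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even with 90% of the work parallelizable, the serial 10% caps the
# speedup below 10x no matter how many cores you add.
for n in (1, 4, 16, 64):
    print(f"{n:3d} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

This is why a single hardcoded single-threaded stage in a build pipeline can dominate the wall-clock time on a many-core Altra system.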

I did a writeup a while back with some of these, based in part on my efforts to get acceptable performance out of an earlier generation of systems (before Ampere) where there were tons and tons of very slow cores. Good news for the Ampere side is that single-core perf on these systems is very respectable.


Another reason to consider pinning, aside from NUMA, is retaining L2 cache contents that took cycles to populate. Pinning avoids the latency of the caches refilling as the scheduler moves a thread around the cores.

Tool #11 would be LinuxKI from HPE/Mark Ray. I just heard about it last week but will be putting it to use.

It is designed to identify performance issues beyond what typical performance metrics reveal, leading to faster root-cause analysis for many problems. LinuxKI is a kernel tracing toolkit designed to answer two primary questions about the system:
If it’s running, what’s it doing?
If it’s waiting, what’s it waiting on?
