What can I do to make sure I’m getting the best possible speed and performance when I’m using these processors for heavy workloads?
When you say throughput, are you referring to disk read/write I/O or network I/O - or some combination of both?
Do you have an example of what you mean by “heavy workloads”?
Generally speaking, throughput and latency are in conflict. To improve throughput, you can increase various network and I/O buffer sizes until RAM is your primary constraint, pin cores for network I/O so that context switches don't eat into bandwidth, and put the CPU governor in performance mode so the CPU is always available (this essentially disables the CPU's power saving features). The trade-off is tail latency: bigger buffers mean longer processing queues, so individual requests can wait longer for a response.
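As a rough sketch of doing this by hand (the core range and service name below are placeholders, and cpupower/taskset come from your distribution's kernel-tools / util-linux packages):

# Put all cores into the performance governor (disables frequency scaling).
sudo cpupower frequency-set -g performance

# Pin a hypothetical network-heavy service to cores 0-3 so its threads stay
# on those cores and avoid cross-core migrations and context-switch overhead.
taskset -c 0-3 ./my_network_service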
One way to configure a lot of this in the OS is to use tuned and select the “network-throughput” profile, and to make sure you have enough RAM per core on your cloud instances. For throughput-intensive workloads it is also important to have enough CPUs to handle the compute requirements of the application; otherwise you will never reach the point where network I/O and RAM are the constraints.
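Applying the profile is a one-liner with tuned-adm (assuming the tuned package is installed and the daemon is running):

# List available profiles, switch to network-throughput, then confirm it is active.
tuned-adm list
sudo tuned-adm profile network-throughput
tuned-adm active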
Looking at the network-throughput tuned profile, which includes the throughput-performance profile, the highlights are:
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
energy_performance_preference=performance
This puts the CPU in performance mode and turns off power management features.
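To double-check that the governor change took effect after switching profiles, you can read it back from sysfs (shown here for cpu0; glob over all CPUs if you want to check every core):

# Should print "performance" once the profile is active.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor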
[disk]
# Set disk readahead to 4 MiB (the value is in KiB) - a large disk I/O buffer.
readahead=>4096
[sysctl]
# Avoid swapping processes out of physical memory aggressively (value is 0-100)
vm.swappiness=10
# Increase the max socket listen queue length from the kernel default (128 on older
# kernels, 4096 since kernel 5.4).
net.core.somaxconn=>2048
# Increase kernel network buffer size maximums.
#
# The buffer tuning values below do not account for any potential hugepage allocation.
# Ensure that you do not oversubscribe system memory.
net.ipv4.tcp_rmem="4096 131072 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
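Once the profile is applied, you can confirm the kernel picked the values up with sysctl (exact output formatting varies slightly by distribution):

# Print the current values to confirm they match the profile.
sysctl net.core.somaxconn vm.swappiness net.ipv4.tcp_rmem net.ipv4.tcp_wmem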
The comments are pretty self-explanatory here - and you can use these profiles as a basis for your own, adding extra configuration or overriding options with your preferred values.
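For example, a minimal custom child profile (the profile name and the overridden value here are just illustrative) lives under /etc/tuned and pulls in network-throughput via include:

# Create a custom profile that inherits network-throughput and overrides one setting.
sudo mkdir -p /etc/tuned/my-throughput
sudo tee /etc/tuned/my-throughput/tuned.conf <<'EOF'
[main]
include=network-throughput

[sysctl]
vm.swappiness=1
EOF
sudo tuned-adm profile my-throughput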
Due to the high core density of Ampere processors, workloads should be well parallelized to take full advantage of the many cores, for example by adding more worker threads or running multiple workload instances simultaneously. Tools like htop provide an overview of how workload threads are distributed across the cores.
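One simple sketch of scaling out this way and then eyeballing the distribution (the binary name, instance count, and core ranges are placeholders):

# Start 4 instances of a workload, each pinned to its own 16-core range.
for i in 0 1 2 3; do
  start=$((i * 16))
  end=$((start + 15))
  taskset -c "${start}-${end}" ./my_workload &
done

# Watch how the load spreads across cores (mpstat is part of sysstat).
mpstat -P ALL 2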