Performance analysis, tools and methods from HPC Asia

I came across an interesting writeup from Stony Brook University, presented at the 2025 HPC Asia Workshops: https://dl.acm.org/doi/10.1145/3703001.3724384

The paper describes the benchmarking tools and methods used to compare AmpereOne A192-32X against a number of other machines, including Sapphire Rapids, Milan, and Grace. They tested a variety of primarily HPC workloads (not exactly the design point for AmpereOne, but an interesting comparison anyway), including genomics, AI/ML, computational fluid dynamics, molecular dynamics, linguistics, and statistical analysis.

A couple of interesting ideas I noticed:

  • Strong performance on multithreaded, throughput-oriented applications (e.g. genomics with BWA, AI inference with PyTorch, and linguistics with Bufia), but weaker results on some HPC tasks that need wide SIMD floating-point registers or high memory bandwidth, such as OpenFOAM, where HBM can offer dramatic improvements.

  • Power and energy are important metrics when considering total response. Features like Turbo or Hyper-Threading (HT) may be problematic for deterministic results on systems with high node occupancy.
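
To make the power vs. energy distinction concrete, here is a rough sketch of my own (not taken from the paper) that computes energy-to-solution by integrating sampled power over the run time. The function name `energy_joules` and all sample numbers are made up for illustration, and it assumes you already have timestamped power readings from whatever meter or counter your platform exposes.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the
    trapezoidal rule and return the energy in joules."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Hypothetical runs (illustrative numbers only, not measurements):
# run A finishes in 20 s at a steady 300 W, run B takes 30 s at 150 W.
run_a = energy_joules([0, 10, 20], [300, 300, 300])
run_b = energy_joules([0, 10, 20, 30], [150, 150, 150, 150])
print(f"run A: {run_a:.0f} J, run B: {run_b:.0f} J")
```

In this made-up case the slower run still uses less energy overall, which is why time-to-solution and energy-to-solution are worth reporting side by side.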


I found that an interesting study. Looking just at the genomics applications (Fig 4), there's a lot of variability in relative performance, even for apps that are doing similar things. There's also a lot of variation in power used within the genomics apps, with Ampere using the least power.
john
