Arm64 performance and Arm memory model (barriers)

This is part me sharing some stuff I learned recently, part questioning my understanding.

So there’s a class of performance fixes in the JVM that I’ve come across recently: removing unnecessary barrier instructions. I find the whole thing fascinating, but I also have questions.

What are barrier instructions?

There are three types of barrier instructions:

  1. Instruction Synchronization Barrier (ISB) flushes the pipeline, so that every instruction after the barrier is fetched and decoded under the current system state - for example, after a change to the MMU configuration
  2. Data Memory Barrier (DMB) ensures that memory accesses before the barrier are observed before memory accesses after it, without stalling non-memory instructions
  3. Data Synchronization Barrier (DSB) is stronger than a DMB: it waits for all outstanding memory accesses before the barrier to complete, and no instruction after it executes until then

I might be a little off with those definitions - but basically, these create something like what is called a sequence point in C. The processor is allowed to execute some memory accesses out of program order (say two writes to different memory locations, or several reads with no writes in between), and a barrier forces ordering where that matters. This flexibility comes from Arm’s weakly ordered (“soft”) memory model, whereas x86 guarantees much stricter ordering of memory reads and writes.

The problem, I think, is that compilers are sometimes a little overzealous with barrier instructions. Each barrier can stall the pipeline while pending memory accesses catch up, so redundant ones slow a program down compared to other architectures, especially when the barriers are unnecessary.

But, if I understand correctly (and I guess this is the question): it seems like the “soft” memory model was intended to speed up execution by handing a little more flexibility to the processor - but barrier instructions by design create sync points that can slow execution down.

So - do we have a situation where a processor adopted a memory model which was intended to make it faster, but now compilers are using barriers too much, making things slower overall - is that right?


Dave, is this the same as or different from “speculative execution”, the chip feature that caused so many security issues on other architectures?

I know I’ve seen issues in the Go issue tracker about improving barriers for performance reasons, with big speedups realized in some places. Not surprised that the JVM has the same characteristics.


I think this is exemplary (but I’m not 100% sure)

I believe this is different, as described here: Memory ordering. This allows an instruction that reads from memory to execute before an earlier instruction that writes to a different memory address, even though that instruction came first in program order, if the CPU evaluates that this would be more efficient.

As I understand it (which, to be fair, might not be very well), a single core already keeps its own thread’s results correct even when it reorders memory accesses - so within one core I’m not sure what harm the DMB or DSB barriers are preventing. The danger seems to be in what other cores and devices observe. That article I linked does point to some limitations - and maybe the barrier instructions are ways for a compiler to say “another observer depends on this ordering, so don’t touch” or “we really need this last access to finish before you continue”. Would love to know more.

@bexcran Am I even in the right neighborhood to understand what’s going on here?


I’ve come across a few articles that helped a little, but I’m still scratching the surface.

  1. Investigate best AArch64 instructions for memory fences (OpenJ9)
  2. Memory barriers in ARM64
  3. Difference between DSB, DMB and ISB Instructions

Edit: Ooh - this presentation (so far) is doing a good job of explaining weak ordering, when to use barriers (the C compiler doesn’t insert them for ordinary code, by the way - but they’re all over the JVM source code), and how to reduce your use of them safely: Arm’s Weakly-Ordered Memory Model and Barrier Requirements
