"What Developers Should Know About Hardware Architecture"

Hi all!

In March at SCALE 23x in Pasadena, I’m giving a presentation, “What Developers Should Know About Hardware Architecture”. My general goal is to show how underlying hardware realities can affect the function and performance of applications, and the ways developers in higher-level languages can hurt performance by not aligning with how the hardware works.

I am grouping these “things to know” into roughly two buckets: programming practices that slow down the processor (stalling the CPU pipeline, preventing parallelization across multiple cores, increasing branch mispredictions, or starving the frontend or backend), and programming practices that ignore how data is organized in memory and on disk (choosing data structures and algorithms that use cache lines and memory locality efficiently across L1, L2, and RAM, reducing page faults, organizing data for cache-efficient SIMD, etc.).

I’m looking for other ideas for ways the hardware can get in the way of good code, or common ways that high-level-language programmers write very inefficient code because it’s rowing against the current of the underlying hardware. Are there any obvious examples you can suggest, along with easy-to-understand code snippets that show how this slows things down?

Thanks!

Dave.


I think a section on getting to know your instruction set extensions would be a help. For example, NEON has a reference resource for this.
What are the built-in intrinsics, data types, security and transcode operations?


In my opinion, developers can be grouped into several types, each affected by different hardware aspects in their work. For example:

  • Frontend developers:
    • High clock speed matters more than many cores for UI responsiveness.
    • Laptops and mobile devices may thermally throttle under load, affecting animation smoothness.
  • Backend developers:
    • To reduce request latency, they need a high-frequency CPU.
    • Performance = good algorithm + high-frequency CPU + high instructions per cycle (most people miss the last part).
    • If many CPU cores are running AVX instructions, the CPU can lower the frequency of the other cores, making overall system performance unpredictable.
  • DevOps engineers:
    • A network card can prioritize traffic through dedicated TX/RX queues for latency-sensitive applications.
  • Database developers:

A few resources I have found that are super interesting so far:

  • What every programmer should know about how CPUs work, Matt Godbolt: Excellent presentation, mostly about how CPUs interact with cache, covering some of the impacts of branch misprediction, pointers to top-down analysis, and more

  • Cache locality and branch predictability in C++: Short video explaining how cache behavior can overpower big-O complexity for smaller datasets

  • CPU Caches and why you care: Scott Meyers - an old (2011) presentation about the impact of cache misses and how you can find yourself losing performance in multicore programming if you have false sharing (writing to different parts of the same cache line in different cores). Lots on data structures and planning data layout and algorithms for cache efficiency

  • Moving Faster: Everyday efficiency in modern C++: Alan Talbot - another exposition on the importance of minimizing cache misses and designing data structures and algorithms for cache efficiency

I’m still working on it - I now have more than enough material for multiple hours of content, so I think I will stop at “maximizing memory locality”, “minimizing branch mispredictions”, and “coding for parallelism”.
