In the latest Ampere Developer Impact, host Dave Neary (@dneary) and Shiva Kintali, Director of Engineering - AI/ML, discuss the computational requirements of modern AI workloads. The talk breaks down floating-point operations (FLOPs) across different model types, including why generating a single token in Llama 3 8B takes roughly 16 billion operations. The conversation also covers how defining SLAs for latency and accuracy helps determine when multi-core CPUs are a viable alternative to specialized accelerators for inference tasks.
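The ~16 billion figure lines up with a common rule of thumb for autoregressive inference: generating one token costs roughly 2 FLOPs per model parameter (one multiply and one add per weight in the forward pass). A minimal sketch of that estimate, with the parameter count as an illustrative assumption:

```python
# Back-of-the-envelope FLOPs estimate for generating a single token.
# Rule of thumb (assumption, not from the talk's exact derivation):
# ~2 FLOPs per parameter per token for the forward pass.

def flops_per_token(num_parameters: int) -> int:
    """Approximate FLOPs to generate one token (forward pass only)."""
    return 2 * num_parameters

llama3_8b_params = 8_000_000_000  # ~8 billion parameters

print(f"{flops_per_token(llama3_8b_params):.2e} FLOPs per token")
# roughly 16 billion operations, matching the figure discussed above
```

This ignores KV-cache reads, attention-length effects, and batching, so treat it as an order-of-magnitude estimate rather than a precise cost model.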
The next video, due out by mid-March, is from @lawik (sorry, Lars, it got stuck in editing): Underjord | Booting 5000 Erlangs on Ampere One 192-core.
After that, RJ Nowling from MSoE will discuss his mosquito genomic research on an Ampere workstation.