I have been diving into the world of Gen AI lately, and it’s fascinating how rapidly this field is evolving. I am curious to know how well Ampere’s processors can handle Gen AI workloads, especially in terms of power efficiency and scalability.
Has anyone here used Ampere’s architecture for any AI/ML projects? I have already gone through these resources/articles: Artificial Intelligence Inference Performance and Gen AI Roadmap. They are good, but I want to learn more from community members. I am looking at potential use cases and would love to hear about your experiences, insights, or any best practices you’ve discovered. How does it compare with other platforms in this space?
For things like inference (i.e., the day-to-day use of a model), you get a bit of a performance bump from Ampere vs. GPUs, and you use far fewer cores, get huge energy savings, and can actually get access to Ampere chips. GPUs are better at training, but they use large amounts of energy (more than x86) and are tough to get. Training, though, is only about 15-20% of the use cases.
Also, if you are not doing huge LLMs, the relative performance gap is about the same as for smaller training workloads, and in absolute time it is close. If something takes 5 seconds instead of 4 seconds, does it matter to the human running the training? And if it costs you 30% more in energy to save that second, it probably isn’t worth it.
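To make that tradeoff concrete, here is a minimal back-of-envelope sketch. All the numbers (runtimes, power draws, electricity price) are made-up placeholders, not measured Ampere or GPU figures:

```python
# Back-of-envelope sketch of the time-vs-energy tradeoff described above.
# Every number below is a hypothetical placeholder, not a benchmark.

def energy_cost_per_run(runtime_s: float, power_w: float, price_per_kwh: float) -> float:
    """Electricity cost for one run: watts * seconds converted to kWh, times price."""
    kwh = power_w * runtime_s / 3_600_000  # W·s -> kWh
    return kwh * price_per_kwh

# Hypothetical: the GPU finishes in 4 s at 400 W; the CPU takes 5 s at 150 W.
gpu_cost = energy_cost_per_run(runtime_s=4.0, power_w=400.0, price_per_kwh=0.15)
cpu_cost = energy_cost_per_run(runtime_s=5.0, power_w=150.0, price_per_kwh=0.15)

print(f"GPU cost per run: {gpu_cost:.8f}")
print(f"CPU cost per run: {cpu_cost:.8f}")
print(f"GPU / CPU energy cost ratio: {gpu_cost / cpu_cost:.2f}")
```

With these placeholder numbers, the GPU saves 1 second but burns roughly twice the energy per run, which is exactly the kind of "is a second worth it?" question the post is pointing at; plug in your own measurements to see where your workload lands.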