Fitting devstral-2, olmo-3 on AmpereOne/AltraMax

There are a number of completely or nearly completely open models that are very interesting, but they are only documented to run well on GPUs.

Could somebody with access to current Ampere hardware try them, so there is an additional point of comparison?

Can you clarify which specific models and performance metrics (latency, throughput, correctness, …) you are interested in?

I guess we can start with:

  • which runtimes can run them, e.g. llama.cpp, vLLM, candle-vllm, mistral-rs
    • whether they would fit in 2 GB per core or need 4 GB per core (a rough estimator is sketched below)
  • whether they are fast enough compared to the suggested GPU setup (see the throughput probe at the end)
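
For the per-core memory question, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the parameter counts (~24B for devstral-2, ~32B for olmo-3 are guesses), the effective bits-per-weight of the llama.cpp quantization formats are approximate, and the flat KV-cache/runtime overhead is made up. The authoritative number is the size of the actual GGUF file plus the KV cache for your context length.

```python
# Rough memory-fit estimator for GGUF-quantized models on a many-core
# CPU box. All constants below are assumptions; check real file sizes.

BITS_PER_WEIGHT = {   # approximate effective bits for llama.cpp quants
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}

def weights_gib(params_billions: float, quant: str) -> float:
    """Approximate size of the weight tensors in GiB."""
    return params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 2**30

def fits(params_billions: float, quant: str, cores: int,
         gib_per_core: float, kv_overhead_gib: float = 8.0) -> bool:
    """Do weights plus an assumed KV-cache/runtime overhead fit the budget?"""
    budget = cores * gib_per_core
    return weights_gib(params_billions, quant) + kv_overhead_gib <= budget

# Hypothetical parameter counts; substitute the real ones.
for name, params in [("devstral-2 (assumed ~24B)", 24),
                     ("olmo-3 (assumed ~32B)", 32)]:
    for quant in ("Q4_K_M", "Q8_0", "F16"):
        print(f"{name} @ {quant}: ~{weights_gib(params, quant):.0f} GiB weights, "
              f"fits 128 cores x 2 GiB: {fits(params, quant, 128, 2.0)}")
```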
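And for the speed question, a minimal throughput probe, assuming the llama-cpp-python bindings as the runtime (any of the runtimes above could be measured the same way). The model filename is hypothetical and the thread count is a placeholder to tune per machine:

```python
# Minimal tokens/second probe via llama-cpp-python.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="./devstral-2-q4_k_m.gguf",  # hypothetical filename
    n_ctx=4096,
    n_threads=128,  # e.g. all AltraMax cores; tune per machine
)

prompt = "Write a function that reverses a linked list."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```

Dividing weight size by core count only tells you whether a model fits; decode speed on CPU is typically memory-bandwidth bound, so a measured tokens/s figure like the one above is what should be compared against the GPU numbers.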