There are a number of fully or nearly fully open models that look very interesting, but they are only documented to work well on GPU.
Could somebody with hardware access try them on the current Ampere hardware, so that we have an additional point of comparison?
Can you clarify which specific models and performance metrics (latency, throughput, correctness, …) you are interested in?
I guess we can start with: