I have two Mt. Collins systems that I can install GPUs into. Unfortunately, I am not able to find what GPUs are really compatible with these systems. Anyone here have recommendations? Daily driver on these servers is Ubuntu/Debian.
AMD or NVIDIA cards should work okay, though there is a PCIe hardware bug out there that needs an out-of-tree patch to work around. It’s been mentioned here before.
@dgilmore do you mean this post?
If so, does anyone know the status of this? It doesn’t seem like it was fixed. Maybe @bexcran knows (she knows everything).
I think it is unlikely that GPU vendors are going to fix the unaligned PCIe write problem in their drivers. Their market is x86 GPU users, and arm64 systems with PCIe subsystems that don’t behave identically are not a big enough market to justify putting in the resources required. In fact, Nvidia drivers USED to work without kernel patches, but starting in version 575 they now trigger the alignment bug! The Nouveau driver for Nvidia triggers it as well. The Xe driver for Intel triggers the bug. AMD’s amdgpu does too. So trapping the faults and aligning the writes via a kernel patch that is never going to be upstreamed is really the only way forward. I guess Ampere could use its weight to try to get a workaround patch upstreamed, but I don’t think it matters enough to their revenue stream to expend the resources.
As I understand from your comment, we should use an Nvidia driver version lower than 575, right?
570 was the latest version that worked with an unpatched kernel for me. Unfortunately, the modules fail to build as of linux kernel version 6.19.
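For anyone who wants to check which side of that line their installed driver falls on, here is a rough Python sketch. It assumes the 575 cutoff reported in this thread (one person's experience, not a vendor statement) and that the loaded NVIDIA kernel module exposes its version in `/proc/driver/nvidia/version`. The function names are made up for illustration.

```python
import re

# Assumption taken from this thread: 575 is the first driver branch
# reported to trigger the unaligned-write bug on an unpatched kernel.
FIRST_AFFECTED_MAJOR = 575


def driver_major(version_text):
    """Return the major number of the first X.Y or X.Y.Z version in the text, or None."""
    match = re.search(r"\b(\d+)\.\d+(?:\.\d+)?\b", version_text)
    return int(match.group(1)) if match else None


def needs_patched_kernel(version_text):
    """True if the driver version is at or above the reported cutoff."""
    major = driver_major(version_text)
    if major is None:
        raise ValueError("no driver version found in text")
    return major >= FIRST_AFFECTED_MAJOR


if __name__ == "__main__":
    try:
        # Present only while the NVIDIA kernel module is loaded.
        with open("/proc/driver/nvidia/version") as f:
            text = f.read()
    except OSError:
        print("NVIDIA kernel module does not appear to be loaded")
    else:
        if needs_patched_kernel(text):
            print("Driver is 575 or newer: expect to need the out-of-tree kernel patch")
        else:
            print("Driver is older than 575: reported to work on an unpatched kernel")
```

This only compares version numbers. It does not detect the bug itself, and it says nothing about whether the modules will build against your kernel.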