I have a number of other cards I’d like to test out, but I was wondering if anyone else has tested any and found them to work (either Nvidia or AMD).
I just plugged in my RTX 8000 and tried the latest Nvidia Professional driver for aarch64, version 525.105.17, but when the driver tries to initialize the GPU, dmesg outputs:
[ 477.995897] NVRM: GPU 000d:01:00.0: RmInitAdapter failed! (0x24:0x65:1423)
[ 477.995979] NVRM: GPU 000d:01:00.0: rm_init_adapter failed, device minor number 0
I am able to get video output over DisplayPort, and Ubuntu 22.04 sees the card just fine, but it doesn’t seem to be able to use any of the card’s features, and nvidia-smi returns “No devices were found”.
I know cloud providers are pairing up Altra Max with cards like the A100, so there must be known-good combinations. Do I have to drop back to Ubuntu 20.04 to make things work?
From what I can tell, Ampere has some of the same memory-mapping/caching issues (related to write combining, etc.) again, which you’re already very familiar with.
The good news is, there are some hacks/workarounds:
Those won’t be good for performance at all, but they could (should?) work.
What kernel version are you using? You might even want to compile a linux/master kernel yourself.
There was some work on AMD GPU support on Arm64 in the recent 6.2 kernel release, so you might even give your AMD GPU a shot with a very recent kernel.
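If you do go the linux/master route, a minimal build sketch (assuming an Ubuntu-style /boot layout; adjust the config steps for your distro) looks like:

git clone --depth=1 https://github.com/torvalds/linux.git
cd linux
cp "/boot/config-$(uname -r)" .config   # start from the running kernel's config
make olddefconfig                       # accept defaults for any new options
make -j"$(nproc)" Image modules
sudo make modules_install install       # then update your bootloader entries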
Unfortunately, I don’t have a workstation Ampere yet, only servers in the datacenter, so I can’t join in the fun yet.
Nice! So the ones I now know of being used with Ampere are: NVIDIA T600, RTX A4500, A6000, 4060, 4070, 4080, 4000 Ada SFF, 3070 Ti, 3060, GTX 1050, A100, A100X, A10, A10G, A16, and A30. I think also the L4 and L40.
I just picked up an ASRock Rack ALTRAD8UD-1L2T and plugged an RTX 4090 into it, with no luck yet. It goes into a bootloop prior to loading the OS. ASRock Rack Tech Support thinks it might be some BIOS settings for PCIe. I’ll keep trying when I get some downtime.
The graphics cards on the supported list are all Nvidia.
Does this imply Nvidia’s closed-source drivers are required?
Or would the open-source “nouveau” driver work?
I’m really having a hard time getting any GPU to work properly on my ASRock Rack board using Fedora 40.
@bexcran, @Civiloid, @shadethegrey, you have had more success. Any ideas?
What really stumps me is the “no signal” part. I have tried three different cables: mini-DP to mini-DP, mini-DP to DP, and mini-DP to HDMI.
I was really reluctant to get an NVidia T1000, because I want to use manufacturer-supported open-source drivers, but even that did not work.
I am using AlmaLinux 9.4 with an RTX 3060. I started with Fedora 40, but the newer gcc and kernel 6.9 led to compile errors for the nvidia 550 akmod from rpmfusion. I switched to AlmaLinux, where the nvidia 550 drivers compile successfully (older kernel 5.14, gcc 11). I started with the nvidia driver installer from their website, but I have since switched, not to rpmfusion, but to the nvidia driver repos from https://negativo17.org/nvidia-driver/.
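For anyone following along, enabling that repo looks roughly like this; the repo file name and package names here are my best guess, so double-check them against the negativo17.org page:

sudo dnf config-manager --add-repo=https://negativo17.org/repos/fedora-nvidia.repo
sudo dnf install nvidia-driver nvidia-driver-cuda   # kernel modules are built at install time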
I haven’t circled back to Fedora 40 yet, but I would suggest trying the negativo17 repo for Fedora 40. They use akmod for Fedora 40 just like rpmfusion, but their Fedora 40 repo carries the 555 version of the nvidia driver, which I assume compiles successfully with the newer kernel and gcc 14 in Fedora 40. Their packages also enable two important nvidia_drm module options (via a file in /etc/modprobe.d) that get you a text console on your nvidia card, namely modeset=1 and fbdev=1. Without those two options, X will use your nvidia card but you won’t have a text tty console on it, and wayland compositors likely won’t work either.
I should mention that the negativo17 nvidia repo packages do not add the nvidia modules to your initramfs, so the nvidia display won’t initialize (and start showing console messages) until after the root fs is mounted; i.e., you won’t see console output quite as early as when the modules are loaded from the initramfs. You could add the modules to the initramfs yourself if desired.
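If you do want them in the initramfs, something like this should do it on a dracut-based distro (the exact module list is my assumption of what’s needed):

echo 'force_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "' | sudo tee /etc/dracut.conf.d/nvidia.conf
sudo dracut --force   # rebuild the initramfs for the running kernel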
An alternative option would be to download the nvidia installer for version 555. In that case, you will need to enable the two nvidia_drm module options mentioned above manually if you want to see any text console, as the nvidia installer doesn’t enable them (at least it didn’t when I used it to install version 550).
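For reference, enabling them by hand is just a one-line modprobe.d file (the file name is arbitrary):

echo 'options nvidia-drm modeset=1 fbdev=1' | sudo tee /etc/modprobe.d/nvidia-drm.conf

If the modules ever end up in your initramfs, rebuild it afterwards so the options get picked up there too.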
Then there’s the whole arm64 option ROM issue. My RX 6800 came out of the box with an arm64 option ROM, but I didn’t want to maintain the PCIe erratum kernel patch needed to make the amdgpu driver work, so I switched to an nvidia card. It’s disappointing to hear that even with the patch you weren’t having much success with your AMD GPUs. Of course, nvidia cards have no arm64 option ROM, so there can never be a UEFI GOP to display boot menus, etc. Perhaps they will release one at some point…
I was on Fedora 39, but I used a mixture of the Nvidia driver directly from nvidia and the nvidia-xconfig command from the Ampere script for Ubuntu after I installed the nvidia driver (sudo nvidia-xconfig -a --cool-bits=31 --allow-empty-initial-configuration). If the nvidia driver install does not complete cleanly, it won’t work. I think I also had to set something in grub at boot or I wouldn’t get any display at all, but I cannot remember what; I think nomodeset, but I’m not sure. I haven’t used Fedora in a while; I switched to Ubuntu for the 32-bit libraries for gaming.
The specific problem with Fedora 40 is compile errors with nvidia driver 550, regardless of how it is packaged. And because there is no EFI GOP framebuffer, you need nvidia_drm.modeset=1 and nvidia_drm.fbdev=1 to get a framebuffer for the linux console.
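If you’d rather set them on the kernel command line than in modprobe.d, grubby can do it on Fedora-family distros (a sketch, assuming BLS boot entries):

sudo grubby --update-kernel=ALL --args="nvidia_drm.modeset=1 nvidia_drm.fbdev=1"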
Thanks for pointing out Nvidia driver, CUDA tools and libraries – negativo17.org. Indeed, it installs and builds fine. I might have achieved the same result by switching to rpmfusion rawhide. I can’t try the effect right now: I am accessing my server remotely, and there is no GPU card in it at the moment.
The modeset instructions might help, perhaps for nouveau too. The reason the AMD cards gave me a picture at all may be that there is a GOP for them. For Intel and NVidia there is none, which makes my monitor go into deep sleep immediately, or switch to another active port. That might cause the driver not to detect the monitor.
The initial framebuffer the linux kernel uses for the console with these cards is typically the EFI GOP framebuffer; the linux device driver for it is efifb. With cards that have no arm64 option ROM, there is no UEFI GOP driver to create that framebuffer, so nothing gets passed to the kernel for the efifb driver to use. By setting fbdev=1 on the nvidia_drm module, it will create a framebuffer that the linux console can use. I’m not sure what equivalent, if any, there is for the Intel GPUs.
Of course, the cards will output video if the X server starts and is configured to use the card, even if there is no linux console framebuffer, since the X server creates its own. But when the X server quits, the monitor will go blank again. So in your nice table above, I wonder if some of the “driver loads but monitor is blank” cases are because there is no linux console framebuffer and the X server has not been started on that card.
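An easy way to tell which case you’re in, before any X server starts, is to check whether the kernel has a console framebuffer at all:

cat /proc/fb          # empty output means no console framebuffer
ls -l /dev/fb*        # framebuffer device nodes, if any exist
sudo dmesg | grep -iE 'efifb|fbcon|nvidia'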
Finally got a picture from the T1000 on my 4K monitor, using the negativo17 repo with the nvidia-open kernel modules. There’s an rpmfusion nvidia beta (555) repo in Copr that I tried first, but it does not support aarch64.
But I did not get a picture immediately: I switched from the default Wayland to Xorg, restarted gdm, then switched back to Wayland, restarted gdm again, and finally got a login screen.
I have yet to see whether this is going to be a standard routine after each boot.
It surprises me that nothing in dmesg or journalctl seems to log the detection of displays.
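The connector state is exposed in sysfs even when nothing is logged; something like this dumps what each DRM connector currently reports (connected/disconnected):

for c in /sys/class/drm/card*-*; do
  printf '%s: %s\n' "${c##*/}" "$(cat "$c/status")"
done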
The emulator driver loads from the UEFI shell on the ASRock board with the latest 2.06 firmware, and the device it creates shows up as described in the documentation. However, X64 binaries (e.g. the X64 build of the EDK2 UEFI shell) fail with an “unsupported” error when I attempt to execute them. The same emulator driver and X64 binaries work as expected in a QEMU/KVM aarch64 virtual machine. I haven’t delved into what AMI might have done to prevent this from working… I assume the AMI UEFI firmware doesn’t support the EFI protocol the emulator driver produces, EDKII_PECOFF_IMAGE_EMULATOR_PROTOCOL, but I haven’t done the work to prove that yet.
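For anyone who wants to reproduce this, the test sequence from the UEFI shell is roughly the following; the file names are placeholders for whatever your emulator and X64 builds are actually called:

FS0:
load EmulatorDxe.efi    # load the PE/COFF emulator driver
ShellX64.efi            # attempting to run an X64 binary then fails with "unsupported" here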