Dogfooding Ampere (or Arm)

I got myself one of the adlinktech 64-core Ampere-based boxes, and I am dogfooding it as my primary desktop despite some niggling issues. Some of them are probably self-induced, as ADLINK only recommends using NVIDIA cards. I have been a Radeon user for years because, for my needs, things just work. I installed an XFX Speedster QICK 210 AMD Radeon™ RX 6500 XT Core Gaming Graphics Card (4GB GDDR6, AMD RDNA™ 2) in mine. It is great in that it ships with an AArch64 option ROM, so everything works in the firmware, and all the necessary support in the amdgpu driver landed in the 6.2 merge window. However, on module loading I get the following traceback 32 times.

[   25.945356] WARNING: CPU: 0 PID: 18 at arch/arm64/mm/mmu.c:1156 vmemmap_populate+0x20/0x34
[   25.953609] Modules linked in: amdgpu(+) bridge stp llc raid1 drm_ttm_helper ttm video gpu_sched crct10dif_ce drm_buddy polyval_ce nvme polyval_generic drm_display_helper ghash_ce ixgbe sbsa_gwdt nvme_core cec igb nvme_common onboard_usb_hub ast mdio xgene_hwmon gpio_dwapb scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables dm_multipath i2c_dev fuse
[   25.985151] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G        W         -------  ---  6.2.0-0.rc2.20230103git69b41ac87e4a.19.fc37.aarch64 #1
[   25.997829] Hardware name: ADLINK AVA Developer Platform/AVA Developer Platform, BIOS TianoCore 2.04.100.07 (SYS: 2.06.20220308) 09/08/2022
[   26.010333] Workqueue: events work_for_cpu_fn
[   26.014679] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   26.021628] pc : vmemmap_populate+0x20/0x34
[   26.025800] lr : __populate_section_memmap+0x1a4/0x1d8
[   26.030927] sp : ffff800008423900
[   26.034229] x29: ffff800008423900 x28: ffffc14d44231000 x27: ffff07ffaf9aefe0
[   26.041353] x26: ffff080f315fc000 x25: fffffc0000000000 x24: fffffffffe000000
[   26.048477] x23: ffffc14d60fd7000 x22: 0000000fffff8000 x21: 0000000000000000
[   26.055601] x20: 0000000fffff8000 x19: fffffffffde00000 x18: 0000000000000014
[   26.062725] x17: 00000000f974f0fe x16: ffffc14d605216e4 x15: ffffc14d43ab1b00
[   26.069849] x14: ffffc14d43ab9fe8 x13: ffffc14d5f7d87cc x12: ffffc14d5f7d4784
[   26.076973] x11: ffffc14d5f7272d4 x10: ffffc14d5f7e20fc x9 : ffffc14d607a89dc
[   26.084097] x8 : ffff080f315fc028 x7 : 0000000000000000 x6 : 0000000000000000
[   26.091221] x5 : 0000000000000001 x4 : fffffe0000000000 x3 : 0000000000000000
[   26.098344] x2 : 0000000000000000 x1 : fffffffffe000000 x0 : fffffffffde00000
[   26.105468] Call trace:
[   26.107902]  vmemmap_populate+0x20/0x34
[   26.111726]  __populate_section_memmap+0x1a4/0x1d8
[   26.116506]  sparse_add_section+0x138/0x1f4
[   26.120678]  __add_pages+0xd8/0x180
[   26.124155]  pagemap_range+0x324/0x41c
[   26.127893]  memremap_pages+0x184/0x2b4
[   26.131717]  devm_memremap_pages+0x30/0x7c
[   26.135802]  svm_migrate_init+0xd8/0x18c [amdgpu]
[   26.140993]  kgd2kfd_device_init+0x39c/0x5e0 [amdgpu]
[   26.146525]  amdgpu_amdkfd_device_init+0x13c/0x1d4 [amdgpu]
[   26.152576]  amdgpu_device_ip_init+0x53c/0x588 [amdgpu]
[   26.158276]  amdgpu_device_init+0x828/0xc60 [amdgpu]
[   26.163714]  amdgpu_driver_load_kms+0x28/0x1a0 [amdgpu]
[   26.169412]  amdgpu_pci_probe+0x1b0/0x420 [amdgpu]
[   26.174675]  local_pci_probe+0x48/0xa0
[   26.178416]  work_for_cpu_fn+0x24/0x40
[   26.178418]  process_one_work+0x1ec/0x470
[   26.178420]  worker_thread+0x200/0x410
[   26.178422]  kthread+0xec/0x100
[   26.178424]  ret_from_fork+0x10/0x20
[   26.178427] ---[ end trace 0000000000000000 ]---
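
For anyone who wants to check how often a warning like this repeats on their own box, here is a minimal Python sketch that tallies WARNING splats by source location. The function name `count_warnings` and the regular expression are my own illustration, not part of any kernel tooling, and the pattern assumes the header format shown in the log above.

```python
import re
from collections import Counter

# Matches the header line of a kernel WARNING splat, for example:
# "[   25.945356] WARNING: CPU: 0 PID: 18 at arch/arm64/mm/mmu.c:1156 vmemmap_populate+0x20/0x34"
# Assumption: the header follows the layout seen in the log above.
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<site>\S+:\d+) (?P<func>[^\s+]+)\+"
)


def count_warnings(log_text):
    """Return a Counter keyed by (file:line, function) for each WARNING header."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("site"), match.group("func"))] += 1
    return counts
```

Feeding it the saved output of `dmesg` (for example `count_warnings(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file name) gives one entry per distinct warning site along with its count.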

This seems to be an issue with the system firmware, and I have an issue open with ADLINK. One thing that needs to improve for the ecosystem to grow is getting Arm-based machines onto developers' desks, where they can touch and feel them. This box seems like an okay, if a bit expensive, start, though it is a rough experience. I get more crashes in the video stack than I should.

I am curious about what kinds of dogfooding others are doing.


Hi Dennis, apparently we have some known issues with AMD GPUs on Altra - I’ll share more details when I get them. Sorry you’re having some issues!

Dave.


@dgilmore I checked with our product folks. While we validate cards from lots of vendors, this is an ADLINK platform, so whether or not they support particular GPUs is really up to them. Have you asked them whether, and which, AMD GPUs they support on the platform?

Thanks!
Dave

@dneary ADLINK tell me that no AMD GPUs work and that no drivers exist. They do claim to be working with Ampere on it.


I have a couple of intentions along these lines. I’ll need to get hold of something quiet like the ADLINK AVA. As a daily driver at Ampere, the big choice would be to run Windows 11 and have happy apps, or to run the few Windows apps I need on Linux via the browser (like Office 365) and WINE.
I also do a bit of Folding@Home, so it’ll be interesting to see how this performs with a big silent NVIDIA card and some of the cores given over to GROMACS workloads.