Hi, is anyone using an NVIDIA GPU on Ampere with kernel 6.11 or 6.12, without issues?
Not me: I have been getting kernel oopses ever since Fedora 40 upgraded to kernel 6.11. I recently upgraded to Fedora 41, which now comes with kernel 6.12 and NVIDIA driver 565.77, but the problem is still there.
For now I am still running the 6.10 kernel from Fedora 40, which keeps working fine.
Here’s an example backtrace from kernel 6.12:
Unable to handle kernel paging request at virtual address ffff8000a3616bcc
Mem abort info:
ESR = 0x0000000096000021
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x21: alignment fault
Data abort info:
ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000081a2edd3000
[ffff8000a3616bcc] pgd=100008000033d003, p4d=100008000033d003, pud=100008000033e003, pmd=100008004c092003, pte=00683000028baf13
Internal error: Oops: 0000000096000021 [#1] SMP
Modules linked in: nvidia_uvm(OE) uinput snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core pppoe pppox ppp_generic slhc 8021q garp mrp stp llc cfg80211 rfkill nft_chain_nat xt_MASQUERADE nf_nat xt_helper xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_REJECT ipt_REJECT nf_reject_ipv6 nf_reject_ipv4 xt_set xt_multiport nft_compat nf_tables ip_set_hash_ip ip_set_hash_net ip_set binfmt_misc hid_logitech_hidpp cdc_ether usbnet joydev mii snd_seq_midi snd_seq_midi_event snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc ftdi_sio usblp xfs nvidia_drm(OE) nvidia_modeset(OE) snd_hda_codec_hdmi dm_cache_smq dm_cache dm_persistent_data dm_bio_prison vfat fat snd_hda_intel raid456 snd_intel_dspcfg snd_hda_codec async_raid6_recov async_memcpy async_pq async_xor snd_hda_core nvidia(OE) async_tx snd_hwdep snd_seq snd_seq_device snd_pcm ses acpi_ipmi enclosure video snd_timer drm_ttm_helper ipmi_ssif arm_spe_pmu ttm snd ast igb soundcore ixgbe ipmi_devintf i2c_algo_bit mdio ipmi_msghandler
acpiphp_ampere_altra arm_cmn arm_dmc620_pmu arm_dsu_pmu cppc_cpufreq acpi_tad loop dm_multipath nfsd auth_rpcgss nfs_acl lockd grace nfs_localio sunrpc nfnetlink zram hid_logitech_dj onboard_usb_dev crct10dif_ce polyval_ce mpt3sas nvme polyval_generic ghash_ce sbsa_gwdt nvme_core raid_class scsi_transport_sas nvme_auth xgene_hwmon scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse
CPU: 28 UID: 1000 PID: 18520 Comm: chrome_crashpad Tainted: G OE 6.12.4-200.fc41.aarch64 #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: ALTRAD8UD-1L2T/ALTRAD8UD-1L2T, BIOS 2.06 04/17/2024
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __memcpy+0x168/0x240
lr : nvidia_vma_access+0x17c/0x200 [nvidia]
sp : ffff8000ea5d38e0
x29: ffff8000ea5d38e0 x28: 0000ffff51dfa980 x27: 0000020040000000
x26: 0000000000000980 x25: 0000000000000000 x24: 0000000000000980
x23: ffff0801af61d000 x22: 0000000000000000 x21: ffff8000a3616980
x20: 000000000000028c x19: 000000000000028c x18: 0000000000000000
x17: 0000000000000000 x16: ffffc776353c52c0 x15: ffff800080000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff0801af61d28c x4 : ffff8000a3616c0c x3 : ffff0801af61d200
x2 : fffffffffffffffc x1 : ffff8000a3616bc0 x0 : ffff0801af61d000
Call trace:
__memcpy+0x168/0x240
__access_remote_vm+0x2e0/0x420
access_remote_vm+0x18/0x30
mem_rw+0x248/0x320
mem_read+0x1c/0x30
vfs_read+0xcc/0x330
__arm64_sys_pread64+0xb8/0xf0
invoke_syscall+0x6c/0x100
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x38/0x148
el0t_64_sync_handler+0x120/0x138
el0t_64_sync+0x194/0x198
Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
After such a kernel oops, the system becomes unstable: some USB devices hang, lsusb hangs, and the system will not reboot without a hard reset.
The GPU is a Quadro T1000, using the open-source drivers.
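In case it helps anyone reading the trace, here is a small Python sketch that decodes the ESR value from the oops by hand and checks the alignment of the faulting address. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, WnR in ISS bit 6, fault status code in ISS bits 5:0) is my reading of the Arm architecture manual, and the variable names are just for illustration. It only re-derives what the kernel already printed; it does not prove where the bug is.

```python
# Values copied from the oops above.
ESR = 0x96000021
FAR = 0xffff8000a3616bcc  # faulting virtual address

# Assumed ESR_EL1 layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
ec = (ESR >> 26) & 0x3F
il = (ESR >> 25) & 0x1
iss = ESR & 0x1FFFFFF

# For data aborts: WnR = ISS bit 6 (0 = read), FSC = ISS bits [5:0].
wnr = (iss >> 6) & 0x1
fsc = iss & 0x3F

print(f"EC={ec:#x} IL={il} ISS={iss:#x} WnR={wnr} FSC={fsc:#x}")
print(f"address mod 8 = {FAR % 8}, mod 16 = {FAR % 16}")
```

This agrees with the kernel's own decode (EC = 0x25, FSC = 0x21, WnR = 0): a read that faulted on alignment, at an address that is only 4-byte aligned.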