New issue with ASRock Rack ALTRAD8UD-1L2T with Q64-22

Has anyone else experienced the board randomly deciding it doesn’t want to turn on? Or am I the only lucky one?

BIOS = 2.06
BMC = 2.07

What I did so far…

The power button was not responsive. I did the typical stuff and pulled the power from the PSU, assuming it was locked up, but I also noticed that none of the fans were spinning. I waited a while and plugged it back in, and it still would not power up, nor would the BMC respond (it seems flaky anyway on a typical day, and it sometimes resets when you reboot the host).

So I removed it from the rack and put it on my workbench, assuming something must have failed. I removed all the PCIe cards: two HighPoint x4 NVMes (bottom two slots, as they are x16) and a dual 25Gb SFP28 (top slot), plus the two Samsung 990 Pro NVMes installed on the motherboard. I even disconnected all the fans.

I swapped the PSU, thinking it could be the PSU that decided to quit. Same thing, nothing happens, but I do see the LEDs. I looked at the manual PDF on ASRock’s site, and it seems that it is waiting for a power event. I then tried connecting a benchtop ATX front panel switch, on the off chance that I had grabbed the only screwdriver that couldn’t create a circuit for the ATX power switch. Still nothing.

I checked and inspected all of the power cables, ensuring the pins were seated correctly. I then tested the PSUs, and all power was well within its operating range; both PSUs checked out fine. I looked to ensure that the 24-pin to 4-pin ATX signal cable included in the box was seated correctly and that a pin hadn’t been pushed out or somehow damaged from sitting in the 4U case and not moving.

I started removing the memory and testing the DIMMs in different configurations. I also tested them in a Supermicro board with a Xeon; nothing was wrong with any DIMM. Following the manual, I installed just one, two, and four in the slots indicated. Nothing, just some green LEDs.

What has worked…
So, I decided to disable the BMC using the jumper pin, and as soon as I did, it fired up. So it seems that the BMC just decided to perform self-immolation; I’m very disappointed with this right now. I figured I’d reach out to ASRock Support. This will be the second time I’ve contacted them since I got this board, and they never responded to the first one. That first issue involves the OpenBMC randomly disconnecting when rebooting the host. Since support is non-existent, I wanted to see if anyone here has experienced anything similar. Given that I run it headless, having out-of-band management is essential for me. With that possibly not working anymore, and with none of the GPUs I’ve tried (A380, A750, T400, 3090 non-Ti, 6950XT) displaying POST, I don’t even want to use it as a workstation.

Does anyone know how to flash the BMC without the BMC being enabled? I never thought I’d have to ask something like this, but I can’t find any information, and ASRock Rack doesn’t reply to support requests. I did look at Newegg to see if I can return it, but since I’m past that window, I’m stuck trying to get this working properly again.

I tried flashing the BIOS with the same code, assuming some corruption, but that didn’t work. I also tried going to the beta 3 code, which still didn’t work, and it introduced a new issue: it won’t boot an OS. It won’t boot RHEL 9 installed on the local NVMe, and it won’t boot a USB key with Debian or Ubuntu; it just sits on a black screen after POST. I then flashed it back to 2.07, and it’s currently just sitting frozen at the BIOS screen. At this point I’d say it has to be a faulty board.

Well, after lots of flashing, resetting, and pulling the battery and letting it sit overnight, it seems to have come back to life. I flashed it back to 2.07 since I’d rather not run beta firmware.

Back to the normal reboot… nothing special, just a good old-fashioned /sbin/init 6, and IPMI decides to stop functioning until power is removed.

Does anyone have this issue? That was why I reached out to Asrock a month ago and never got a response.

FWIW, I had a lot of trouble getting these boards to boot consistently. I discovered that the IO shield on the motherboard was shorting against both of my cases. I decided to remove the IO shield and have had consistent boots ever since. It’s not the most sophisticated solution in the world, but it fixed my issues.

Have you checked the board status LEDs described in the board manual?

They indicate the board’s current status. There is also another LED, “BMC_LED1 Green BMC heartbeat LED”, which shows whether the BMC is currently alive.

For the issue where the host cannot boot with the BMC enabled, have you checked the BMC power policy?

Run the following commands on the BMC:

$ ipmitool chassis policy list
Supported chassis power policy: always-off always-on previous
$ ipmitool chassis policy always-on

always-on makes the host power on as soon as the BMC is ready; you can also choose one of the other options.

The jumper is named “disable BMC”, but its real function is to bypass the BMC-ready check.
With it set, the host can power on without the BMC being ready.

So I suppose changing the power policy could solve your problem.
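As a rough sketch of the check above, the current policy can be read back before changing it. This assumes `ipmitool chassis status` prints a `Power Restore Policy` line (as stock ipmitool does) and that `awk` is available where you run it; `current_policy` and `ensure_policy` are hypothetical helper names, not ipmitool features.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Reads `ipmitool chassis status` output on stdin and prints the value
# of the "Power Restore Policy" line (e.g. always-off).
current_policy() {
    awk -F: '/^Power Restore Policy/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding whitespace
        print $2
    }'
}

# Sets the power policy only when it differs from the requested one.
ensure_policy() {
    want="$1"
    have="$(ipmitool chassis status | current_policy)"
    if [ "$have" != "$want" ]; then
        ipmitool chassis policy "$want"
    fi
}

# Usage (on the BMC, or on the host with the IPMI driver loaded):
#   ensure_policy always-on
```

Reading the policy first avoids rewriting a setting that is already correct, and it shows what the board was doing before the change.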

I have a similar issue, but in my case it only happens after I remove the power plug. It then takes 5-6 minutes until the power button reacts.
That is the case with BIOS 2.x and with beta 3.x.
I think the motherboard needs to do some internal check-up first.

I tried to power on from the BMC and it did not work. After 5-6 minutes it allowed me to boot without any issues.
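If the delay is the BMC booting, one way to avoid guessing is to poll it and power on once it answers. This is a minimal sketch: `wait_until` is a hypothetical helper, `BMC_HOST` and `BMC_USER` are placeholders, and the 600-second limit is only an assumption based on the 5-6 minutes mentioned above. The `-E` flag makes ipmitool read the password from the `IPMI_PASSWORD` environment variable.

```shell
#!/bin/sh
# Sketch only: wait_until is a made-up helper, not part of ipmitool.

# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or roughly TIMEOUT_SECONDS have passed.
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + interval))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: command never succeeded
        fi
        sleep "$interval"
    done
    return 0
}

# Usage (placeholders, adjust for your network):
#   export IPMI_PASSWORD=...
#   wait_until 600 10 ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -E mc info \
#     && ipmitool -I lanplus -H "$BMC_HOST" -U "$BMC_USER" -E chassis power on
```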

By default the BMC_DIS jumper is set so that the host is not allowed to power on until the BMC has booted.

Move the jumper over to the “BMC Disabled” position and it’ll allow the power button to work while the BMC is still booting.

By the way, I ran into a problem when I was installing the board because the I/O shield was pressing against the ID Button. It turns out that if you hold the ID button it’ll prevent the BMC from booting (it doubles as a ‘BMC reset’ button).


Does anyone know how to flash the BMC without the BMC being enabled? I never thought I’d have to ask something like this, but I can’t find any information, and ASRock Rack doesn’t reply to support requests.

Unfortunately I believe the only way to flash the BMC without it already running is by removing the SPI-NOR chip from the EEPROM socket and putting it into an external programmer. The good thing about these boards is that both the BIOS and BMC chips are socketed so at least it’s an option.
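For anyone going the external programmer route, a hedged sketch using flashrom follows. The programmer name (`ch341a_spi`), the file names, and the helper functions are all assumptions for illustration. Whether the vendor's BMC update file is a raw full-flash image that can be written this way is not confirmed here, and the chip's voltage and size need to be checked against what the programmer supports.

```shell
#!/bin/sh
# Sketch only: programmer type and file names are assumptions.
PROGRAMMER="${PROGRAMMER:-ch341a_spi}"

# Succeeds only if both dumps are non-empty and byte-for-byte identical.
dumps_match() {
    [ -s "$1" ] && [ -s "$2" ] && cmp -s "$1" "$2"
}

# Reads the chip twice, compares the reads, and writes only if they agree.
backup_and_flash() {
    image="$1"
    flashrom -p "$PROGRAMMER" -r bmc_backup_1.bin || return 1
    flashrom -p "$PROGRAMMER" -r bmc_backup_2.bin || return 1
    if ! dumps_match bmc_backup_1.bin bmc_backup_2.bin; then
        echo "Reads differ; reseat the chip before writing." >&2
        return 1
    fi
    flashrom -p "$PROGRAMMER" -w "$image"
}

# Usage (hypothetical image name):
#   backup_and_flash bmc_image.bin
```

Reading twice and comparing is a cheap way to catch a badly seated chip before anything gets overwritten, and it leaves a backup of the original contents.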

From a different board I have, showing what the EEPROM socket looks like:


By the way, I’m currently working on an improved OpenBMC firmware for the board, and an EDK2 firmware which supports video output via an NVIDIA card during boot.

@dohertyctl @bexcran @governetes Per our friend William Lee of ASRock Rack Technical Support: If you found the I/O shield getting too close to the rear UID LED button, is there any slack at the motherboard mounting points inside the chassis that would let you slightly adjust the motherboard’s position? The screw holes on the motherboard are designed to be a little wider than the mounting standoffs, to provide about 2mm-3mm of margin for positioning the MB against the I/O shield. But yes, if the UID button is somehow held in the depressed position during the initial AC power on, it could put the BMC controller in a “reset” condition or even corrupt the BMC firmware.

Now that you have temporarily removed the I/O shield to free the UID button, did the BMC boot up, and can you access the IPMI web UI? Also, as of last night, we have released production BMC firmware 3.06.00 on the product page https://www.asrockrack.com/general/productdetail.asp?Model=ALTRAD8UD-1L2T#Download

Is this guide I wrote helpful? Rebecca flagged things for me that you should do to make it boot faster: Arm developer build guide for Ampere Altra

I want to thank everyone for jumping in to assist. I was getting very frustrated, but all seems good now.

The response has been phenomenal; I was working in an air-gapped environment, so I couldn’t get much time to look at this until last night. Big thanks to @JoeSpeed @bexcran @governetes.

I installed the new 3.06 OpenBMC firmware and 3.06 BIOS updates this morning. I will let it run for a bit to see how it shakes out. Currently, I have a few VMs for GitLab and the associated shell and container runners, and I’m going to stand up a couple of OpenStack instances in virtd.

This thing is great; I’ve shut down the old HP DL380 that I used before, and the new system consumes nearly one-fifth of the power.

I like the idea of removing the I/O shield, but I went with a more hillbilly approach; I layered electrical tape on the back of it and then put it back in the case to provide some insulation.

No I/O shield brings me back to the Baby AT days when the case would come with the single DIN port cut out, and many times it wouldn’t line up, so I’d toss it. I guess I like the way it looks in the case, which is in a 4U rackmount case, in a rack that I’ll never see the back of… maybe tossing it would have been the better option.

Wow, those are brand new. Can anyone tell what improvements there are other than “Improve system compatibility”?

I haven’t been able to find any release notes unfortunately, so I don’t know what’s changed.

Quick update: it’s been rock solid with 3.06. There are no noticeable issues, and the logs are clean. It’s just humming along quietly and efficiently, running mostly a mix of rpm-based distro VMs and a few containers.

Brilliant! Thank you, it’s always good to get news of a positive resolution after someone has an issue.