Gaming on Ampere in Linux - Box64, Box86, FEX-Emu, Rosetta?

So… I just posted a new video where I did some basic testing of Windows (plus Steam and a few games) and an Nvidia 3080 Ti on Ubuntu (with Dhewm 3 and SuperTuxKart), just to get a feel for Windows on Arm on the Ampere, and also for game performance with a GPU on Linux.

The video: The CPU She Told You Not to Worry About (Gaming + Windows on Arm!) - YouTube

Many commenters mentioned potentially using Box86, Box64, FEX-Emu, or even Rosetta 2 (I thought this only worked on Mac/M-series chips) for emulation on Linux.

I know the Ampere chips are lacking some of the older 32-bit stuff, and my quick test of Box64 seemed to run into problems because of it.

But otherwise, would any of these solutions have a shot at working for running x86 games on Linux on Ampere systems? I am planning on trying myself at some point… I just have to put aside the workstation for a few weeks to finish up a couple of other projects I’ve had in the wings, and get a construction project started!


That’s the same as with Apple’s M-series chips, so we’re in good company there :wink:
They also can’t run 32-bit code anymore.

I’ve personally had pretty good success with using QEMU (static), Box64 and running a modded version of Rosetta 2 (all of those on Debian, because their multilib concept works pretty well for this task).
I’ve also played around with Rosetta and qemu-user-static on ArchLinux/Manjaro, but this required me to set up a whole x86 userland with x86_64 libraries, which was tedious to say the least.
It does work, though: I’ve played a round of “Stardew Valley” on Ampere.
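For anyone wanting to check which of these tools are already installed before trying a game, here is a minimal sketch. The binary names (`box64`, `FEXInterpreter`, `FEXLoader`, `qemu-x86_64-static`, `qemu-x86_64`) are assumptions based on common packaging and may differ on your distro; Rosetta is left out because it has no standard location on Linux.

```python
import platform
import shutil

# Assumed binary names; they can differ by distro or packaging.
CANDIDATES = {
    "Box64": ["box64"],
    "FEX-Emu": ["FEXInterpreter", "FEXLoader"],
    "QEMU user mode": ["qemu-x86_64-static", "qemu-x86_64"],
}


def find_emulators(which=shutil.which):
    """Return {tool name: path} for each candidate found on PATH."""
    found = {}
    for tool, names in CANDIDATES.items():
        for name in names:
            path = which(name)
            if path:
                found[tool] = path
                break  # first matching binary name is enough
    return found


if __name__ == "__main__":
    print("Host architecture:", platform.machine())
    for tool, path in find_emulators().items():
        print(f"{tool}: {path}")
```

The `which` parameter only exists so the lookup can be swapped out for testing; in normal use you would call `find_emulators()` with no arguments.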

Asahi Lina seemed to have pretty decent success with FEX over at https://www.youtube.com/watch?v=CJSfFzsU75g , but I haven’t personally tried that yet.

32-bit ARM, 32-bit x86 and x86 emulation are separate issues for the most part. Because x86 is being emulated anyway, the lack of native armv7/32-bit support doesn’t really matter.
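To see which case a particular binary falls into, you can read its ELF header. This is a rough sketch, not a full ELF parser: it only looks at the class byte and the `e_machine` field, and the function names are made up for illustration.

```python
import struct

# e_machine values from the ELF specification.
MACHINES = {3: "i386", 40: "arm", 62: "x86_64", 183: "aarch64", 243: "riscv"}


def classify_elf(header: bytes) -> str:
    """Describe an ELF binary from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    # EI_CLASS (byte 4): 1 = 32-bit, 2 = 64-bit
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    name = MACHINES.get(machine, "machine " + str(machine))
    return bits + " ELF for " + name


def classify_file(path: str) -> str:
    """Classify the file at `path` by reading only its header."""
    with open(path, "rb") as f:
        return classify_elf(f.read(20))
```

Calling `classify_file` on a game's main executable tells you whether it is a 32-bit x86 or an x86_64 binary, which decides which emulator is even a candidate.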

I’d also point out qemu-user-static. That can let you run almost any container on x86; I’ve tried 32-bit Arm, 64-bit Arm, and RISC-V.
Running programs outside containers is a bit hard because of dynamic libraries. I think it should be possible, but it didn’t look easy.


I can’t offer much guidance, just to say I’ll be in the same boat. There’s an antique 32-bit Fortran-based application I want to port onto Ampere. RASP Creation Topics

It’s used for very detailed weather forecasting and should scale well across many cores. I’ll try qemu static and maybe box64 first before resorting to recompiling from source with the Arm Fortran Compiler.


I’m very interested in seeing the following:

  • using recent Ampere chips, ideally around 150-200 cores, but even 40+ cores for starters, of course (pun intended)

  • run some OS that will allow running a VM with its own CentOS/RHEL/ALMA/Rocky Linux. The OS running on the Ampere system could be Windows or some Linux, don’t care.

  • the OS running inside the VM will have to be an x86-64 based 64-bit version of RHEL 7 or 8, or CentOS 7, or ALMA/Rocky 8 (around 7.9 or 8.6-8.7 ideally, but at least 8.4 if 8.x)

  • Inside of that VM I need to run various modules of x86_64 code, such as mongo, redis, rabbit, gtk2… the application also runs in various docker containers, one per language pack, in most configs. There are some options for dockerless though if that helps.

  • Either way, it’s a somewhat complex application (neural machine translation server) that is made (at the moment) to run on x86_64, and I’m hopeful to have an easy way, by way of Rosetta or similar tools, to emulate it on arm64, especially Ampere.

  • One reason for this is that we run the LP engines in docker with up to 8 cores per LP and so if you’re on a 196-core system, it is just beautifully fast. We can also run additional servers for extra backend nodes and distribute the load but the mere thought of having a single desktop or even high-end ruggedized laptop with ~200 cores and running 20 instances of a language translation that delivers 2-4k char/sec each instance, totalling 40k-80k char even before Redis caching for repeated material (8x-10x faster yet), that’s just a dream machine. On prem :wink: - I am so wanting to see this come to reality.

  • we can get similar with GPU, but the bottleneck of moving data to/from the GPU is still preventing the real potential speed. We may have a few thousand cores there but only see single-digit speed gains. It’s good, but could be better, much better. PCIexpress16 or whatever is the latest, it’s a challenge. Integrated graphics with a private bus should alleviate this bottleneck eventually. (fingers crossed)
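The core-count and throughput figures in the list above can be sanity-checked with some back-of-the-envelope arithmetic. This sketch assumes throughput scales linearly with the number of instances and ignores any emulation overhead, caching, or memory limits; the function names are just for illustration.

```python
def instances_that_fit(total_cores: int, cores_per_instance: int) -> int:
    """How many fixed-size instances fit on a machine (whole instances only)."""
    return total_cores // cores_per_instance


def aggregate_chars_per_sec(instances: int, low: int = 2_000, high: int = 4_000):
    """Total throughput range, assuming perfectly linear scaling (an assumption)."""
    return instances * low, instances * high


# 196 cores at up to 8 cores per language pack
max_instances = instances_that_fit(196, 8)

# 20 instances at 2-4k char/sec each, before any Redis caching
low_total, high_total = aggregate_chars_per_sec(20)

print(max_instances, low_total, high_total)
```

With 8 cores per language pack, 196 cores leave room for 24 instances, so the 20 instances mentioned fit with some cores left over for Redis, Mongo and the rest of the stack.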

Of course, a native arm64 implementation of our apps is the better ultimate goal, but on short notice it would be good to know how well these emulators can do the job on existing code meant for x86_64.

I’ll keep watching this.


This is a problem. Emulating a full x86 machine/VM will be dog slow. Like Pentium 2 233 MHz slow. Yes, really.

Application-level emulation, with proper JIT/caching, etc. can be pretty surprisingly fast for some applications.
But there’s no way to run a full enterprise x86 VM workload like that properly.

Neural anything also sounds like this is a GPU workload. There is basically no way to get this running anywhere close to properly (if at all).
Getting PCIe devices to work in such a setup will be … adventurous at best.

Now with native arm64 support on the other hand, with software like redis, etc. I’d expect some pretty outstanding performance… I’ve seen 3x real-world performance increase over comparable x86 boxes in a redis workload :slight_smile:

Just wanted to post a link to a thread over on GitHub more specifically related to Steam games (I’ve been testing a number): Games that run via Steam / Proton on Ampere using this guide? · Issue #11 · AmpereComputing/Steam-on-Ampere · GitHub
