It began with Doom 2016 – a Switch port so ambitious, it simply didn’t seem possible. However, since then, a procession of technologically ambitious current-gen console titles have migrated onto the Nintendo console hybrid, culminating in the arrival of the wonderful Metro Redux from 4A Games – highly impressive conversions and perhaps the closest, most authentic first-person shooter ports we’ve seen. So what’s the secret? How do developers manage to achieve such impressive results from five-year-old Nvidia mobile hardware?
“At first, I did have really big concerns performance-wise,” admits 4A’s chief technical officer, Oles Shishkovstov. “You know, going from base PS4/Xbox One with approximately six and a half or seven CPU cores running at 1.6 GHz to 1.75GHz down to only three cores at 1.0GHz sounds scary. The GPU was fine, as graphics can be scaled up and down much easier than, for example, game simulation code.”
The results of the conversion work are certainly impressive bearing in mind the yawning gap in CPU specs. 4A started out by translating over the existing Metro Redux games from PS4 and Xbox One (and to stress the point, Switch get last-gen ports here), a process the 4A team carried out very quickly, but this early version of the game could only manage frame-rates of around seven to 15 frames per second. The games were entirely CPU-bound.
Halving the target frame-rate from the PS4 and Xbox One’s 60fps down to 30fps was required before the task of optimising systems began. “First, we backported some optimisations from Exodus to the Redux codebase,” Shishkovstov explains. “Then we focused on animation processing on the high level and on extracting ILP (instruction-level parallelism) out of the A57 on the low level – down to assembly. The low level optimizations alone got us to an unstable 30Hz when we were not GPU bound. Then the bone LODding arrived – the CPU [issue] was ‘solved’ even with some headroom necessary for stable framerate.”
Explained like that, 4A’s solution to the Switch’s CPU limitation seems fairly straightforward but the process of coding at the assembly level – literally the native language of the Switch ARM Cortex-A57 CPU cluster – can’t have been a walk in the park. Animation sucks up a lot of processor cycles, so the idea of adding level of detail (LOD) transitions to the system makes a lot of sense.