Unreal Engine has defined the technological standards of high-definition console shooters but, Gears of War apart, it seems to be down to proprietary engines to exceed them: Infinity Ward, Bungie and Guerrilla Games have produced the most critically well-received FPS titles on console, and all of them use their own in-house technology.
Coming from the technological architects of GSC's S.T.A.L.K.E.R., the new 4A engine powering Metro 2033 from THQ is another proprietary codebase that looks capable of producing pretty astonishing visuals. Thus far, most of THQ's marketing efforts have concentrated on the visually superb PC build, though Eurogamer got hands on with both versions last month. Digital Foundry has had extensive access to a preview Xbox 360 build, and what we've seen has been impressive.
To give you some idea of what has caught our eye, here's a video of the game running on the Microsoft console, captured and edited by us with a view to showcasing the unique visual look of this new technology, and how it translates into the gameplay.
We wanted to know more, so we arranged an interview with 4A Games' chief technical officer Oles Shishkovtsov. Because he previously worked at GSC as an instrumental guiding force behind the technologically impressive S.T.A.L.K.E.R., there have been claims that the 4A engine is an offshoot of proprietary GSC IP. Shishkovtsov disagrees, saying that the new tech started as a pet project born out of the frustrations of dealing with his older engine.
"The major obstacles to the future of S.T.A.L.K.E.R. engine was its inherent inability to be multi-threaded, the weak and error-prone networking model, and simply awful resource and memory management which prohibited any kind of streaming or simply keeping the working set small enough for 'next-gen' consoles," explains Shishkovtsov.
"Another thing which really worried me was the text-based scripting. S.T.A.L.K.E.R. was purely LUA-scripted," he continues. "Working on S.T.A.L.K.E.R. it became clear that designers/scriptwriters want more and more control and when they got it, they were lost and needed to think like programmers, but they weren't programmers! That contributed a lot to the original delays with S.T.A.L.K.E.R."
It was these problems and issues that left Shishkovtsov looking for an entirely new direction for the next engine.
"I started a personal project to establish the future architecture and to explore the possibilities of the design," he says. "The project evolved quite well and although it wasn't functional as a game (not even as a demo: for example it didn't have any rendering engine back then) it provided me with clear vision on what to do next."
Shishkovtsov and his colleague Aleksandr Maksimchuk left GSC a full year before S.T.A.L.K.E.R. eventually shipped, and the 4A engine, with its emphasis on a hugely efficient implementation of multi-threading suited to both PC and console, took shape. Shishkovtsov claims that the 4A engine has no relationship with the S.T.A.L.K.E.R. X-Ray tech because a port would be "extremely difficult".
"A straight port will not fit into memory even without all the textures, all the sounds and all the geometry," he reckons. "And then it will work at around 1-3 frames per second. But that doesn't matter because without textures and geometry, you cannot see those frames! That's my personal opinion, but it would probably be wise for GSC to wait for another generation of consoles."
According to Shishkovtsov, the engine's approach to parallelising code differs from that of many games, but is akin to the techniques employed by Criterion Games for Burnout Paradise: processing tasks are allocated to whatever processors are available at the time.
"We don't have dedicated threads for processing specific tasks in-game with the exception of a PhysX thread," explains Shishkovtsov. "All our threads are basic workers. We use task-model but without any pre-conditioning or pre/post-synchronising. Basically all tasks can execute in parallel without any locks from the point when they are spawned. There are no inter-dependencies for tasks.
"It looks like a tree of tasks, which start from more heavyweight ones at the beginning of the frame (to make the system self-balanced). The last time I measured the statistics, we were running approximately 3,000 tasks per 30ms frame on Xbox 360 on CPU-intensive scenes with all hardware threads at 100 per cent load."
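To make the idea concrete, here is a minimal Python sketch of that kind of task model. It is our own illustration, not 4A's code: a pool of identical workers pulls tasks from one shared queue, and any task may spawn further tasks, so the work forms a tree that starts from one heavyweight task. The names `TaskPool`, `spawn`, `wait_frame` and `process_range` are assumptions, and each task writes only to its own slice of the output so that tasks need no locks between themselves (the queue itself synchronises internally). Python threads will not deliver a real parallel speed-up, so treat this as a picture of the structure only.

```python
import queue
import threading


class TaskPool:
    """Generic workers pulling independent tasks from one shared queue."""

    def __init__(self, num_workers=6):
        self._tasks = queue.Queue()
        for _ in range(num_workers):
            # Every worker is identical: no thread is dedicated to one job type.
            threading.Thread(target=self._run, daemon=True).start()

    def spawn(self, fn, *args):
        """Queue a task; it may start on any free worker immediately."""
        self._tasks.put((fn, args))

    def _run(self):
        while True:
            fn, args = self._tasks.get()
            try:
                fn(self, *args)  # the task receives the pool so it can spawn children
            finally:
                self._tasks.task_done()

    def wait_frame(self):
        """Block until every task spawned so far (and its children) has finished."""
        self._tasks.join()


def process_range(pool, data, out, lo, hi, leaf_size=4):
    """Hypothetical workload: double each value, splitting big ranges in two."""
    if hi - lo <= leaf_size:
        for i in range(lo, hi):
            out[i] = data[i] * 2  # each leaf owns its slice, so no lock is needed
        return
    mid = (lo + hi) // 2
    # The heavyweight task at the root fans out into progressively lighter ones.
    pool.spawn(process_range, data, out, lo, mid, leaf_size)
    pool.spawn(process_range, data, out, mid, hi, leaf_size)
```

A frame would then amount to one `pool.spawn(process_range, data, out, 0, len(data))` followed by `pool.wait_frame()`. Because children are queued before their parent is marked done, the wait cannot return while any part of the tree is still outstanding.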
Again as with Criterion's multi-threading work, 4A Games has found that the same approach works on the Sony console too.
"The PS3 is not that different... We use 'fibres' to 'emulate' a six-thread CPU, and then each task can spawn a SPURS (SPU) job and switch to another fibre. This is a kind of PPU off-loading, which is transparent to the system. The end result of this beautiful (apart from somewhat restricting) model is that we have perfectly linear scaling up to the hardware deficiency limits."
While the engine is described as a complete cross-platform development environment, there is to be no PlayStation 3 SKU of Metro 2033. The game will launch on PC and Xbox 360 only. However, the Sony console played a big part in the development work for the core tech.
"From the start we selected the most 'difficult' platform to run on. A lot of decisions were made explicitly knowing the limits and quirks we'll face in the future," explains Shishkovtsov.
"For me personally, the PS3 GPU (they like to call it RSX for some reason) was the safe choice because I was involved in the early design stages of NV40 and it's like a homeland: RSX is a direct derivative of that architecture. Reading Sony's docs it was like, 'Ha! They don't understand where those cycles are lost! They coded sub-optimal code-path in GCM for that thing!' All of that kind of stuff..."
The decision not to bring the title to PS3 came from THQ and the developer reckons it has made a positive difference to the game-making process, as limited resources are deployed on two platforms rather than three.
"THQ was reluctant to take a risk with a new engine from a new studio on what was still perceived to be a very difficult platform to program for - especially when there was no business need to do it," says Shishkovtsov.
"I think it was a wise decision to develop a PC and console version. It has allowed us to really focus on quality across the two platforms. One thing to note is that we never ran Metro 2033 on PS3, we only architected for it. The studio has a lot of console gamers but not that many console developers and Microsoft has put in a great effort to lower the entry barrier via its clearly superior tools, compilers and analysers.
"Overall, personally I think we both win. Our decision to architect for the 'more difficult' platform paid off almost immediately. The whole game was ported to 360 in 19 working days, although they weren't eight-hour days..."
"We went on the route to stream these resources from DVD, up to the extreme that we don't preload anything, not even the basic sounds like footsteps or weapon sounds. We've done a lot of work to compensate for DVD-seek latency, so the player should never notice it. That was the hard part."
All of these optimisations mean that the PC version of Metro 2033 benefits too.
"We don't need as much system memory as other PC-only games. Anything above 512MB RAM with DX10/DX11 code-path on Win7 would be enough," Shishkovtsov says. "DirectX 9 uses system memory backing store for almost all GPU resources, so you should add around 256 MB to avoid page-file swapping.
"The CPU side is slightly more problematic. Because the system is heavily multi-threaded, we need at least two hardware threads for 'smooth' gameplay. The CPU performance doesn't matter that much, except on a few selected scenes during the whole game as long as it is relatively modern architecture (not Intel Atom!) and has more than one core."
Graphics-wise, the PC version of the 4A engine is far removed from the console versions. All too often we've seen PC games that are identical to the 360 equivalents, simply offering you the ability to run at higher resolutions with higher frame-rates.
Metro 2033 features superior volumetric fog, double the precision in the PhysX, 2048x2048 textures (up against 1024x1024 on console), better shadow-map definition and filtering, object blur in DX10, sub-surface scattering for superior skin shaders, parallax mapping on all surfaces and better geometric detail with less aggressive LODs.
There's also going to be support for tessellation in DirectX 11. In basic terms, tessellation interpolates new polygons, so the closer you get to a tessellated object, the more polygons are generated.
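As a rough illustration of the principle, the Python sketch below subdivides a triangle more finely the closer the viewer gets. It is a CPU-side toy under our own assumptions: real DirectX 11 tessellation runs on the GPU in dedicated shader stages, and the function names, the distance thresholds and the midpoint subdivision scheme here are all hypothetical.

```python
def midpoint(a, b):
    """Point halfway between two vertices given as coordinate tuples."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def subdivide(tri, levels):
    """Split a triangle into 4**levels smaller triangles by edge midpoints."""
    if levels == 0:
        return [tri]
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    out = []
    for child in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        out.extend(subdivide(child, levels - 1))
    return out


def tessellation_level(distance, near=2.0, far=32.0, max_level=4):
    """Pick a subdivision level: maximum up close, none beyond `far`."""
    if distance <= near:
        return max_level
    if distance >= far:
        return 0
    t = (far - distance) / (far - near)   # 1.0 at `near`, 0.0 at `far`
    return round(t * max_level)


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
close_up = subdivide(triangle, tessellation_level(1.0))    # 256 triangles
far_away = subdivide(triangle, tessellation_level(100.0))  # 1 triangle
```

On its own the subdivision adds no detail, since the new vertices lie on the original flat surface. The visible gain comes from moving those vertices afterwards, for instance with a displacement map.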
Sample shots of the PC version of Metro 2033 running under DirectX 11, showcasing the tessellation and some of the additional effects gamers running cutting-edge graphics hardware enjoy.
DirectX 11 is a huge leap over its predecessor and its many capabilities look set to be embraced by games developers, including 4A.
"I really enjoy three things: compute shaders, tessellation shaders and draw/create contexts separation," Shishkovtsov says. "The major thing that can up the performance is the compute shaders. Today, games spend the majority of the frame doing the various kinds of post-processing. The easy route to extract some performance is to rewrite that post-processing via compute.
"Even the simple blurs can be almost twice as fast. For example we've rewritten our depth-of-field code, to greatly enhance quality while still maintaining playable frame-rate. [In Metro 2033] all the 'organic' things like humans are tessellated, and monsters use real displacement mapping, to greatly enhance visuals."
Looking at the engine spec published on Digital Foundry yesterday, there are many similarities in terms of technologies with Guerrilla Games' epic Killzone 2: pretty much the standard other developers have to aspire to in the console realm when it comes to first-person shooters. Guerrilla's engine is quite remarkable, geared completely to the specific hardware strengths of the Sony platform, but Shishkovtsov evaluates its performance from a different perspective.
"Their implementation seems to be badly optimised," he observes. "Otherwise why do they have pre-calculated light-mapping? Why do they light dynamic stuff differently to the rest of the world with light-probe similar stuff? From our experience you need at least 150 full-fledged light-sources per frame to have indoor environments look good and natural, and many more to highlight such things like eyes, etc. It seems they just missed that performance target."
When playing Metro 2033, the huge number of light sources rendered by the deferred shading pipeline definitely feels reminiscent of the Sony shooter, which uses its own deferred rendering solution.
"Speaking from the Metro 2033 perspective, it was an easy choice," Shishkovtsov replies. "The player spends more than half of the game under the ground. That means deep dark tunnels and poorly-lit rooms. There are no electricity sources apart from the generators. From the engine perspective - to make it visually interesting, convincing and thrilling - we needed a huge amount of rather small local light sources. Deferred lighting is the perfect choice."
An impressive lighting model is one thing, but light needs to be accompanied by shadow in order to carry off a realistic look. Both HD consoles on the market appear to struggle with truly convincing shadows.
"I don't think we do anything unusual here," Shishkovtsov says. "On 360 we first render the traditional depth from light point of view, then convert it into a ESM (exponential shadow map) representation while gauss-blurring it at the same time. Later during the lighting we do one bilinear lookup to get percentage in shadow.
"The end result: we avoid any jittering, noise, stipple-patterns or many (costly) look-ups to filter shadow to get something what at least remotely looks like a shadow. Of course the 10MB eDRAM on 360 slightly limits the resolution of shadow maps, which can be noticed sometimes when the light source moves... We use that space for shadow mapping only twice during a frame."
The 4A engine also includes custom anti-aliasing solutions. Developers are finding that the MSAA hardware within the 360 GPU can be repurposed for other tasks, but reducing edge-aliasing and shimmer remains an important aspect of overall image quality.
"The 360 was running deferred rotated grid super-sampling for the last two years, but later we switched it to use analytical anti-aliasing (AAA)," reveals Shishkovtsov. "That gave us back around 11MB of memory and dropped AA GPU load from a variable 2.5-3.0 ms to constant 1.4ms. The quality is quite comparable. The AAA works slightly different from how you assume. It doesn't have explicit edge detection.
"The closest explanation of the technique I can imagine would be that the shader internally doubles the resolution of the picture using pattern/shape detection (similar to morphological AA) and then scales it back to original resolution producing the anti-aliased version. Because the window of pattern detection is fixed and rather small in GPU implementation, the quality is slightly worse for near-vertical or near-horizontal edges than for example MSAA."
Custom anti-aliasing techniques are used in the 4A engine in order to lower memory usage and increase overall performance.
Another key element of the 4A tech is the artificial intelligence of the NPCs. Impressive graphics don't count for much if your gameplay opponents exhibit poor intelligence.
"Each AI character in the game has feelings: vision, hearing and hit reaction. The vision model is pretty much close to reality: NPCs have a 120 degrees visibility cone and see those in the centre of the cone more clearly, also illumination and speed of the target is taken into account. For instance, a moving object is seen more clearly in the darkness than standing one. Also a 'look closely' effect is implemented. There are different levels of alertness: light disturbance, light alert, alert, uber-alert, danger."
The sound model for the AI is intriguing. The 4A engine attempts to emulate a real perception of hearing by drawing out variables from elsewhere in the game design.
"Each sound in the game has its own 'AI mark'... shooting sounds are marked 'combat.shot'," Shishkovtsov explains. "For this mark, hearing distance is, for example 50 metres, which is quite a lot. But using the renderer's portals/sectors the system hearing handler determines 'virtual distance', taking into account walls and corridors.
"So an NPC on the other side of the wall will never hear what's going on here, because while the 'straight line' distance is only five metres, the 'virtual distance' using a sound path along the wall results in a 60-metre distance."
Hit reactions and perception of objects in the view of the NPC are also processed. If the AI recognises a grenade, it'll try to make its escape.
"The next layer is used to sort out this basic information and decide, what is the most important for NPC right now," continues Shishkovtsov. "Different levels of feeling are connected to different types of behaviour. For instance typical behavior for a 'light disturbance' is saying something like 'who's there?' and looking closer, whereas for the 'uber-alert' it’s going out for a full search.
"And of course, designers have full control over everything, so they can still make NPCs stand still or play funny animations even when a nuclear bomb is dropped nearby if it suits the scene."
As an example of a fledgling game engine, 4A does an impressive job of utilising the Xbox 360 hardware: pumping out visuals quite unlike anything else seen on the system. While the console perhaps has too many first-person shooters, the tech combined with the distinctly East European art direction has resulted in a title that looks and feels different from the Unreal Engine norm. It's interesting to see how the team's "coding to the metal" mentality has been applied to the consoles.
"The 360 GPU is a different beast. Compared to today's high-end PC hardware it is 5-10 times slower depending on what you do," says Shishkovtsov. "But performance of hardware is only one side of equation. Because we as programmers can optimise for the specific GPU we can reach nearly 100 per cent utilisation of all the sub-units.
"That's just not possible on a PC. In addition to this we can do dirty MSAA tricks, like treating some surfaces as multi-sampled (for example hi-stencil masking the light-influence does that), or rendering multi-sampled shadowmaps, and then sampling correct sub-pixel values because we know exactly what pattern and what positions sub-samples have, etc."
It's this approach that will see the Xbox 360 and PlayStation 3 far out-live the shelf lives of their individual processing components.
"The majority of our Metro 2033 game runs at 40 to 50 frames per second, if we disable v-sync on 360," says Shishkovtsov. "The majority of the levels have more than 100MB heap space left unused. That means we under-utilised the hardware a bit."
The complete transcript of our interview with 4A's Oles Shishkovtsov will be published next week. There's a wealth of cool stuff in there, including a direct comparison between the 360's Xenon CPU and the latest Intel i7 architecture. Plus: more information on 4A's HDR lighting solution, the in-game AI, the utilisation of PhysX and much more.