AMD Mantle - Battlefield 4 Performance
As we reported earlier this week, the past weekend saw the release of both AMD's Mantle-enabled Catalyst drivers and the Mantle patch for Battlefield 4, so users with AMD Graphics Core Next (GCN) hardware at their disposal are finally free to experience the overdue API. As well as being the first HSA-enabled graphics drivers from AMD, the new drivers bring with them phase 2 of the firm's frame pacing improvements for older GCN hardware. A Mantle-compatible technical demo called StarSwarm has also been released on Steam, but here we're focussing solely on Mantle's performance impact in Battlefield 4 across a number of set-ups, after briefly outlining what Mantle is and what it sets out to achieve.
The Catalyst 14.1 drivers are still very much a beta product – AMD has been keen to emphasise this and is also aware of a variety of current performance and stability issues. Both AMD and DICE are working on further optimisations for the drivers and Battlefield 4 respectively. It's also worth noting that currently Mantle is only supported on 64-bit editions of Windows, though we can't imagine this being a hindrance to many.
Mantle is described as a low-level graphics API (Application Programming Interface). Fundamentally, an API is a series of abstraction layers that allows software like game engines to access and make use of the hardware components on hand. The low-level aspect refers to the fact that Mantle is designed to give developers better access to the hardware and its capabilities by lessening the levels of abstraction and the processing overhead that they bring.
The biggest bottleneck Mantle seeks to overcome is the CPU cost of draw calls. These are the commands the CPU issues to tell the GPU what to render, and a single frame can need thousands of them. Draw calls are costly because each one has to pass through the numerous abstraction layers before the GPU can process it. As a low-level API, Mantle can make this process more efficient, especially as it also makes better use of multiple CPU cores and threads.
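To put rough numbers on that overhead, the sketch below models CPU submission time as a fixed cost per draw call, split evenly across threads. The function name `cpu_submit_ms` and every figure in it are our own illustrative assumptions rather than measurements from AMD or DICE, and real drivers will not scale this neatly.

```python
def cpu_submit_ms(draw_calls: int, overhead_us_per_call: float, threads: int = 1) -> float:
    """Estimate the CPU time (in ms) needed to submit one frame's draw calls.

    Assumes a fixed overhead per call and a perfectly even split of the
    work across threads. Both are idealisations for illustration only.
    """
    if draw_calls < 0 or overhead_us_per_call < 0 or threads < 1:
        raise ValueError("draw_calls and overhead must be >= 0, threads >= 1")
    total_us = draw_calls * overhead_us_per_call / threads
    return total_us / 1000.0


# Hypothetical figures: 5,000 draw calls per frame in both cases.
# "Thick" API: 4 microseconds per call, submitted from a single thread.
thick_api_ms = cpu_submit_ms(5000, overhead_us_per_call=4.0, threads=1)
# "Thin" API: 1 microsecond per call, spread across four threads.
thin_api_ms = cpu_submit_ms(5000, overhead_us_per_call=1.0, threads=4)

print(f"thick API: {thick_api_ms:.2f} ms per frame")  # 20.00 ms
print(f"thin API:  {thin_api_ms:.2f} ms per frame")   # 1.25 ms
```

Under these made-up numbers, the thicker API spends 20 ms per frame on submission alone, which is more than the 16.7 ms budget for 60 fps, while the thinner one leaves most of the frame free for other CPU work.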
The tradeoff is that, for now at least, Mantle is only compatible with AMD's own GCN-based hardware. Traditional APIs like Direct3D (the graphics portion of DirectX) and OpenGL need higher (and less efficient) levels of abstraction to ensure functionality with a much broader and more varied set of hardware. That said, Mantle is designed to be potentially applicable to other graphics architectures. AMD is planning the release of a public SDK and hopes to see it, or something closely related to it, become something of an industry standard.
This may seem like wishful thinking, and perhaps it is, but AMD has a couple of strategies that may pay off. While Battlefield 4 makes sense as a launch platform given the game's massive popularity with PC gamers, the association with DICE isn't only important for that reason. DICE is also responsible for the Frostbite 3 engine, which isn't limited to Battlefield 4. Unlike StarSwarm's Nitrous engine, Frostbite 3 isn't built specifically to be Mantle-optimised, but by investing in engine-level optimisations now, AMD is likely to have a clearer path towards Mantle support in games where developers choose to use such middleware packages – building your own engine is, after all, extremely difficult, time-consuming and expensive.
Secondly, AMD's GCN hardware is in both next-gen consoles. Mantle is a PC-only product, but it seeks to emulate the low-level programming environment that developers are used to in the console space. While it doesn't make porting from console to PC as simple as copy and paste, a blog post from AMD suggests that Mantle could well make the transition easier and thus less expensive, since it is 'similar to, and often compatible with, the code [developers] are already writing for those platforms'.
Coming back to the here and now, however, Mantle is predominantly designed to benefit CPU-limited scenarios – AMD itself has admitted that boosting performance in GPU-limited sequences is difficult at the API level, though some minor optimisations have been made. This is important to bear in mind, as hardware enthusiasts with the latest high-end cards and a relatively beefy CPU are likely to see only limited improvements. Also, the optimisations are focussed on GCN 1.1 hardware (the R9 290 series, R7 260X and Kaveri APUs), and it's these products we'll be focussing our testing on. As we said, it's early days, and AMD is still investigating and working on performance boosts for GCN 1.0 hardware (e.g. the HD 7000 series).
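The difference between CPU-limited and GPU-limited scenarios can be shown with a very simple model, in which the frame rate is set by whichever of the two processors takes longer per frame. This is a sketch under our own assumptions: it treats CPU and GPU work as overlapping perfectly, and the `frame_rate` helper and all timings are hypothetical rather than taken from any benchmark.

```python
def frame_rate(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when the slower of CPU and GPU sets the pace.

    Assumes CPU and GPU work overlap perfectly, so the frame time is
    simply the larger of the two. Real pipelines are messier than this.
    """
    if cpu_ms <= 0 or gpu_ms <= 0:
        raise ValueError("cpu_ms and gpu_ms must be positive")
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical CPU-limited set-up: halving CPU time doubles the frame rate.
cpu_bound_before = frame_rate(cpu_ms=20.0, gpu_ms=10.0)  # 50 fps
cpu_bound_after = frame_rate(cpu_ms=10.0, gpu_ms=10.0)   # 100 fps

# Hypothetical GPU-limited set-up: the same CPU saving changes nothing.
gpu_bound_before = frame_rate(cpu_ms=8.0, gpu_ms=16.0)   # 62.5 fps
gpu_bound_after = frame_rate(cpu_ms=4.0, gpu_ms=16.0)    # 62.5 fps

print(f"CPU-limited: {cpu_bound_before:.1f} -> {cpu_bound_after:.1f} fps")
print(f"GPU-limited: {gpu_bound_before:.1f} -> {gpu_bound_after:.1f} fps")
```

In this toy model, cutting CPU time only helps while the CPU is the slower component, which is why a system pairing a fast CPU with a GPU that is already the bottleneck has little to gain from a thinner API.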