The DirectX Performance Overhead
So what sort of performance overhead are we talking about here? Is DirectX really that big a barrier to high-speed PC gaming? This, of course, depends on the nature of the game you're developing.
'It can vary from almost nothing at all to a huge overhead,' says Huddy. 'If you're just rendering a screen full of pixels which are not terribly complicated, then typically a PC will do just as good a job as a console. These days we have so much horsepower on PCs that at high resolutions you see some pretty extraordinary-looking PC games, but one of the things that you don't see in PC gaming inside the software architecture is the kind of stuff that we see on consoles all the time.
On consoles, you can draw maybe 10,000 or 20,000 chunks of geometry in a frame, and you can do that at 30-60fps. On a PC, you can't typically draw more than 2,000-3,000 without getting into trouble with performance, and that's quite surprising - the PC can actually show you only a tenth of the performance if you need a separate batch for each draw call.
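To put those figures side by side, here is a back-of-envelope calculation using only the numbers Huddy quotes. The helper names are our own, and it assumes (our simplification, not Huddy's) that the whole frame is spent submitting draw calls, so the per-call budgets are upper bounds rather than measurements.

```python
# Back-of-envelope draw-call arithmetic using the figures quoted above.
# Assumption (ours, not Huddy's): the entire frame is spent submitting draw
# calls, so these per-call budgets are upper bounds.

def frame_time_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def per_call_budget_us(fps: float, draw_calls: int) -> float:
    """Microseconds per draw call if the frame did nothing else."""
    return frame_time_ms(fps) * 1000.0 / draw_calls


console_range = (10_000, 20_000)  # console draw calls per frame, as quoted
pc_range = (2_000, 3_000)         # typical PC draw calls per frame, as quoted

widest_gap = console_range[1] / pc_range[0]     # 20,000 vs 2,000
narrowest_gap = console_range[0] / pc_range[1]  # 10,000 vs 3,000

print(f"Console-to-PC ratio: {narrowest_gap:.1f}x to {widest_gap:.0f}x")
print(f"30fps, 20,000 calls: {per_call_budget_us(30, 20_000):.2f} us each")
print(f"30fps, 2,000 calls: {per_call_budget_us(30, 2_000):.2f} us each")
```

This prints a ratio of 3.3x to 10x, and per-call budgets of 1.67 and 16.67 microseconds, so the 'tenth of the performance' figure corresponds to the wide end of the quoted ranges.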
DirectX supports instancing, meaning that several trees can be drawn as easily as a single tree. However, Huddy says this still isn't enough to compete with the number of draw calls possible on consoles.
Now the PC software architecture – DirectX – has been kind of bent into shape to try to accommodate more and more of the batch calls in a sneaky kind of way. There are the multi-threaded display lists, which come with DirectX 11 – that helps, but unsurprisingly it only gives you a factor of two at the very best, from what we've seen. And we also support instancing, which means that if you're going to draw a crate, you can actually draw ten crates just as fast as far as DirectX is concerned.
But it's still very hard to throw tremendous variety into a PC game. If you want each of your draw calls to be a bit different, then you typically can't get beyond about 2,000-3,000 draw calls, and certainly no more than about 5,000. Games developers definitely have a need for that. Console games often use 10,000-20,000 draw calls per frame, and that's an easier way to let the artist's vision shine through.'
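Why instancing helps with crates but not with variety can be sketched with a toy count of draw calls. This is a simplification of our own, not how DirectX itself batches work: it assumes every copy of the same mesh can share one instanced call (in Direct3D 11, through calls such as ID3D11DeviceContext::DrawIndexedInstanced), while each distinct mesh needs a call of its own. Real engines also split batches on materials, textures and render state. The function and mesh names are hypothetical.

```python
# Toy model (a simplification): identical meshes share one instanced draw
# call, while each distinct mesh needs its own call. Names are hypothetical.

def draw_calls_without_instancing(objects: list[str]) -> int:
    """One draw call per object on screen."""
    return len(objects)


def draw_calls_with_instancing(objects: list[str]) -> int:
    """One instanced draw call per distinct mesh."""
    return len(set(objects))


crates = ["crate"] * 10
print(draw_calls_without_instancing(crates))  # 10
print(draw_calls_with_instancing(crates))     # 1: ten crates, one call

varied = [f"prop_{i}" for i in range(10)]     # ten different objects
print(draw_calls_with_instancing(varied))     # 10: instancing can't merge them
```

The second case is Huddy's point: once every object is a bit different, instancing stops reducing the call count, and the PC's per-call overhead applies to each one.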
Of course, the ability to program direct-to-metal (directly to the hardware, rather than going through a standardised software API) is a no-brainer when it comes to consoles, particularly when they're nearing the end of their lifespan. When a console is first launched, you'll want an API so that you can develop good-looking and stable games quickly, but it makes sense to go direct-to-metal towards the end of the console's life, when you're looking to squeeze out as much performance as possible.
Programming direct-to-metal would require more in the way of optimising for different GPU architectures, such as Nvidia's GF100 scalar architecture.
Consoles also have a major bonus over PCs here, which is their fixed architecture. If you program direct-to-metal on the PlayStation 3's GPU, then you know your code will work on every PS3. The same can't be said of the PC, where we have numerous different GPU architectures from different manufacturers that work in different ways.
For example, developers may need to vectorise their code for it to run optimally on an AMD GPU's stream processor clusters, or tell those clusters to split their units into combinations of vector and scalar units. Conversely, on Nvidia GPUs developers would ideally program for a scalar architecture. Once you remove the API software layer, you suddenly have to start thinking hard about the differences between GPU architectures.
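The vector-versus-scalar difference can be illustrated with a toy model of instruction issue slots. Everything here is an assumption for illustration: a VLIW-style unit five slots wide (loosely modelled on AMD's VLIW5 designs of this era) that can only pack independent operations into one bundle, against a scalar unit that issues one operation per cycle. It counts slots for a single thread only; a scalar GPU keeps its hardware busy by running more threads side by side, so this is not a throughput comparison.

```python
import math

# Toy model of issue slots per thread, not a throughput comparison.
# Assumption: the VLIW unit is 5 slots wide, loosely modelled on AMD's VLIW5
# designs of this era; a scalar unit issues one op per cycle.

VLIW_WIDTH = 5


def vliw_cycles(independent_ops: int, width: int = VLIW_WIDTH) -> int:
    """Cycles when all ops are independent and can be packed into bundles."""
    return math.ceil(independent_ops / width)


def dependent_chain_cycles(ops: int) -> int:
    """Cycles when each op needs the previous result: one op per bundle."""
    return ops


def scalar_cycles(ops: int) -> int:
    """A scalar unit issues one op per cycle, dependent or not."""
    return ops


def vliw_utilisation(ops: int, cycles: int, width: int = VLIW_WIDTH) -> float:
    """Fraction of VLIW slots doing useful work."""
    return ops / (cycles * width)


# A 4-component (vec4) multiply: four independent scalar multiplies.
print(f"Vectorised: {vliw_utilisation(4, vliw_cycles(4)):.0%}")            # 80%
# Four scalar ops where each depends on the one before.
print(f"Dependent:  {vliw_utilisation(4, dependent_chain_cycles(4)):.0%}")  # 20%
# On the scalar unit both shapes of code take the same number of cycles.
print(scalar_cycles(4))  # 4
```

In this model the shape of the code decides how much of the VLIW unit is used, while the scalar unit is indifferent to it, which is the kind of difference developers would have to handle themselves without an API in the way.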