Although multi-threading is not a new part of the graphics pipeline per se, it is an incredibly important feature for DirectX 11. It becomes even more important when you consider that these improvements could also be brought to DX10-class hardware through driver updates.
Dual-core CPUs are mainstream today and quad-core is now a relatively inexpensive option for gamers and enthusiasts alike – in the future, quad-core will replace dual-core as the de facto mainstream processor. When you think about it, it's actually quite surprising that DirectX still isn't multi-threaded. To make up for the omission, both AMD and Nvidia have worked on multi-threaded drivers, but with limited success, because the API itself is ultimately single-threaded.
We've spoken to a number of developers about multi-threading – some have come up with ways to utilise the extra cores, while others have struggled to extract more performance and have often left those additional cores sitting idle. That's becoming less of a problem as developers start to design with threading in mind, but there are still scenarios where an application is heavily CPU-limited.
Thankfully, this will change with DirectX 11, and Microsoft will make these benefits available to DirectX 10-class hardware as well. AMD's and Nvidia's driver teams will need to do some work to implement this in their drivers, but since they'll already be doing that work for DX11 hardware, adding support for DX10 GPUs doesn't seem like much of a stretch.
Microsoft achieves this by splitting a Direct3D device into three separate interfaces: the Device, the Immediate Context and the Deferred Context. The Immediate Context lives on the single render thread that actually submits work, but the Device and Deferred Context interfaces can be used from more than one thread, so several threads can prepare work that is then queued up for the Immediate Context.
Switching between threads is said to be fine-grained, so the developer should be able to decide exactly how, and in what order, operations are queued up for the Immediate Context. The Device interface can load resources from any thread as and when they are needed, while each Deferred Context acts as a per-thread device context for future rendering operations – it records draw calls into display lists (command lists, in Direct3D 11 terminology) and passes them on to the Immediate Context when it is ready for them.
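To make the division of labour more concrete, here is a minimal Python model of the record-then-playback pattern described above. It does not touch Direct3D at all: `Device`, `DeferredContext`, `ImmediateContext` and `render_frame` are hypothetical stand-ins chosen for illustration. In the shipping Direct3D 11 API the closest equivalents are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList` and `ID3D11DeviceContext::ExecuteCommandList`, but treat the mapping as approximate.

```python
import threading

# Hypothetical model of the Direct3D 11 threading split; not the real API.


class Device:
    """Stand-in for the Device interface: any thread may create resources."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 0
        self.resources = {}

    def create_resource(self, name):
        # The lock lets several worker threads load resources at the same time safely.
        with self._lock:
            self._next_id += 1
            self.resources[self._next_id] = name
            return self._next_id


class DeferredContext:
    """Records draw calls for later; each worker thread owns its own instance."""

    def __init__(self):
        self._commands = []

    def draw(self, resource_id, vertex_count):
        # Nothing is submitted here; the call is only recorded.
        self._commands.append(("draw", resource_id, vertex_count))

    def finish_command_list(self):
        # Return the recorded display list and start a fresh one.
        commands, self._commands = self._commands, []
        return tuple(commands)


class ImmediateContext:
    """The only object that submits work; used from the render thread alone."""

    def __init__(self):
        self.submitted = []

    def execute_command_list(self, command_list):
        self.submitted.extend(command_list)


def render_frame(device, immediate, scene_parts):
    """Workers record one scene part each; the render thread replays them in list order."""
    command_lists = [None] * len(scene_parts)

    def worker(index, meshes):
        context = DeferredContext()
        for mesh_name, vertex_count in meshes:
            resource_id = device.create_resource(mesh_name)
            context.draw(resource_id, vertex_count)
        command_lists[index] = context.finish_command_list()

    threads = [threading.Thread(target=worker, args=(i, part))
               for i, part in enumerate(scene_parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Playback order is chosen by the application, not by which worker finished first.
    for command_list in command_lists:
        immediate.execute_command_list(command_list)
    return immediate.submitted
```

For example, `render_frame(Device(), ImmediateContext(), [[("terrain", 300), ("water", 120)], [("player", 90)]])` submits the terrain, water and player draws in that order regardless of which worker thread finishes recording first. That fixed playback order is the "developer decides the order" property the paragraph above describes.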
On DirectX 10-class hardware, Deferred Contexts will have to be implemented in software rather than hardware, because some of DirectX 11's new multi-threading optimisations are hardware-based and older GPUs lack them. As a result, Deferred Contexts will not be free-threaded on DX10 hardware, and a dedicated thread will have to be allocated to them at the API level.
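The paragraph above leaves the mechanics open, so here is one way to picture the restriction, as a Python sketch under our own assumptions: a software-emulated deferred context whose command buffer is owned by a single dedicated thread, with other threads posting requests to it through a queue. This is an illustration only, not a description of how Microsoft's runtime or the vendors' drivers actually emulate Deferred Contexts; `SoftwareDeferredContext`, `post_draw` and `replay` are hypothetical names.

```python
import queue
import threading


class SoftwareDeferredContext:
    """Hypothetical software-emulated deferred context (an illustration, not driver code).

    The command buffer belongs to one dedicated thread. Other threads never
    touch it directly; they post requests to a thread-safe queue instead.
    """

    def __init__(self):
        self._requests = queue.Queue()
        # Daemon thread: in this sketch it simply runs until the program exits.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        buffer = []  # only the dedicated thread ever reads or writes this list
        while True:
            request = self._requests.get()
            if request[0] == "finish":
                reply = request[1]
                reply.put(tuple(buffer))  # hand the recorded list back to the caller
                buffer = []
            else:
                buffer.append(request)

    def post_draw(self, mesh_name, vertex_count):
        # Safe to call from any thread: queue.Queue handles the locking.
        self._requests.put(("draw", mesh_name, vertex_count))

    def finish_command_list(self):
        # The finish request is queued behind every draw already in the queue,
        # so the returned list contains all of them.
        reply = queue.Queue(maxsize=1)
        self._requests.put(("finish", reply))
        return reply.get()


def replay(command_list):
    """Stand-in for the Immediate Context executing an emulated command list."""
    return [f"{op} {mesh} ({count} vertices)" for op, mesh, count in command_list]
```

Because one thread owns the buffer, draws posted from a single producer thread keep their order, but draws from different producers are interleaved in whatever order they reach the queue. Funnelling all recording through one thread in this way is roughly the trade-off the paragraph above attributes to DX10-class hardware.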