The DirectX 9 Problems to Be Solved:
As with previous versions of the API, DirectX 10 was not developed by Microsoft alone. Microsoft worked with game and application developers, hardware vendors (such as ATI/AMD and NVIDIA) and its own team of DirectX runtime and API architects. During DirectX 10's development, developers highlighted many of the areas that had limited their own games.
The following sections describe the key problems that Microsoft set out to address by developing the new API from a clean slate.
High API Overhead (The Small Batch Problem):
Every time the application issues a command to the API, DirectX has to talk to the display driver and translate the command into a form the hardware can execute. This processing takes CPU time, which in turn limits the number of objects that can be rendered and the number of unique effects that can be applied to a scene.
The number of objects that can be rendered is limited by the number of draw calls the API can handle, because each draw call carries a fixed CPU overhead. In DirectX 9.0, you could render around 500 different objects in a single frame before becoming completely CPU limited. Objects can be things like trees, characters (including non-player characters), guns and buildings.
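A back-of-the-envelope model makes this relationship concrete. The sketch below assumes each frame's CPU time is split between game logic and a fixed cost per draw call. The function name and every figure in it are hypothetical. The per-call cost was picked so that the result lands on the roughly 500-object figure above; it was not measured on any real driver.

```python
"""Toy model: per-draw-call CPU overhead caps how many objects fit in a frame.

All numbers are hypothetical placeholders chosen for illustration; they are
not measurements of DirectX 9 or of any real driver.
"""


def max_draw_calls(frame_budget_us: int, other_cpu_work_us: int, cost_per_draw_us: int) -> int:
    """Return how many draw calls fit in the CPU time left over in one frame."""
    if cost_per_draw_us <= 0:
        raise ValueError("cost_per_draw_us must be positive")
    remaining_us = frame_budget_us - other_cpu_work_us
    if remaining_us <= 0:
        return 0  # game logic alone already exceeds the frame budget
    return remaining_us // cost_per_draw_us


if __name__ == "__main__":
    # Hypothetical frame: ~16 ms budget (about 60 fps), 6 ms of game logic,
    # and 20 microseconds of API/driver work per draw call.
    print(max_draw_calls(16_000, 6_000, 20))  # 500
    # A quarter of the per-call cost gives four times the object budget.
    print(max_draw_calls(16_000, 6_000, 5))   # 2000
```

Integer microseconds are used so that the floor division is exact. Lowering the per-call cost is the lever that matters, because the object budget scales inversely with it.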
If you look closely at a game developed on DirectX 9.0, you'll notice that similar objects are either identical or have very few variants. Trees are probably the most noticeable example. This is because the developer has to balance the game so that it is not completely CPU limited. Instead of making each tree look completely different, the developer would change the colour slightly or vary the number of leaves on each tree.
The number of unique effects is limited by the number of state changes the API can handle, so again it is a matter of balancing the game so that it is not CPU limited. To create unique effects, the developer changes vertex formats, textures, shaders, shader parameters or blending modes, and every one of these state changes carries a high API overhead.
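The sketch below shows why variety costs CPU time. It counts how many individual pieces of state must change as a list of draws is submitted. The `DrawItem` fields are a hypothetical stand-in for the kinds of state listed above, not a real Direct3D type. Each changed field is assumed to cost one state change.

```python
from typing import NamedTuple


class DrawItem(NamedTuple):
    """Hypothetical render state for one object (not a real Direct3D type)."""
    shader: str
    texture: str
    blend_mode: str


def count_state_changes(draws: list[DrawItem]) -> int:
    """Count the individual state changes needed to submit draws in this order."""
    changes = 0
    previous = None
    for item in draws:
        if previous is None:
            changes += len(item)  # first draw: every piece of state must be set
        else:
            # Only the fields that differ from the previous draw need changing.
            changes += sum(1 for old, new in zip(previous, item) if old != new)
        previous = item
    return changes


if __name__ == "__main__":
    identical = [DrawItem("tree", "bark", "opaque")] * 3
    varied = [DrawItem("tree", f"bark_{i}", "opaque") for i in range(3)]
    print(count_state_changes(identical))  # 3
    print(count_state_changes(varied))     # 5
```

Three identical trees need only the initial setup. Giving each tree its own texture adds a state change per tree, and that overhead is what pushes developers towards near-identical objects.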
Large Variations in Hardware Capabilities:
Along with the high API overhead, developers have cursed hardware manufacturers because of the optional hardware capabilities (known as caps bits) that required GPU detection at the application level. Differing hardware capabilities meant that developers had to write several GPU-specific code paths in addition to a minimum common feature set path covering older hardware, which takes a great deal of time to write.
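To see why caps bits multiply the work, consider this sketch of the path-selection logic a DirectX 9 application might carry. `HypotheticalCaps` and the path names are invented for illustration. They do not reproduce the real `D3DCAPS9` structure or any shipping engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypotheticalCaps:
    """Illustrative capability flags; NOT the real D3DCAPS9 structure."""
    pixel_shader_major: int
    supports_float_blending: bool
    max_simultaneous_textures: int


def select_render_path(caps: HypotheticalCaps) -> str:
    """Pick one of several hand-written code paths, DirectX 9 style.

    Every string returned here stands for a separate path that has to be
    written, tested and maintained.
    """
    if caps.pixel_shader_major >= 3 and caps.supports_float_blending:
        return "sm3_hdr"
    if caps.pixel_shader_major >= 3:
        return "sm3_ldr"
    if caps.pixel_shader_major == 2 and caps.max_simultaneous_textures >= 8:
        return "sm2_multitexture"
    return "minimum_common"  # the path that covers older hardware


if __name__ == "__main__":
    print(select_render_path(HypotheticalCaps(3, True, 16)))  # sm3_hdr
    print(select_render_path(HypotheticalCaps(2, False, 4)))  # minimum_common
```

Under a guaranteed feature set, every DirectX 10 class GPU would take the same branch, so the vendor- and GPU-specific branches collapse into one.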
Fixed Function Hardware Limitations:
Caps bits are only part of the problem with DirectX 9.0 when it comes to hardware implementations. The second problem is that each stage of the traditional graphics pipeline is separate and designed to handle specific tasks. This rigidity can lead to performance bottlenecks and an inefficient design when measured in performance per watt or performance per mm².
More importantly, these limitations prevent developers from expressing themselves fully when designing 3D environments. In many situations, 3D environments in games are heavily vertex bound or pixel bound. In a vertex-bound scene the pixel shaders sit idle, and in a pixel-bound scene the vertex shaders do. Either way, idle units take up die space and consume power that someone has to pay for.
The final key drawback that Microsoft wanted to address was the resource limits that have hampered DirectX 9.0 game development to some extent. Resources for constructing 3D environments include the number of dependent texture reads, bound textures and program instructions. In the past, DirectX has been fairly restrictive on that front, and this has occasionally hampered game development: algorithms had to be scaled back or split into multiple passes through the graphics pipeline, and each extra pass adds overhead.
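The arithmetic behind multipass rendering is a ceiling division per resource, with the most constrained resource setting the pass count. The sketch below uses hypothetical limits rather than the actual DirectX 9 or DirectX 10 figures. It also assumes the work divides evenly across passes, which real effects often do not.

```python
"""Toy model: an effect that exceeds a per-pass resource limit must be split.

The limits and requirements below are hypothetical, and the model assumes the
work divides evenly across passes, which real effects often do not.
"""


def passes_needed(required: dict[str, int], limits: dict[str, int]) -> int:
    """Minimum number of passes so that no pass exceeds any resource limit."""
    passes = 1
    for resource, amount in required.items():
        limit = limits[resource]
        if limit <= 0:
            raise ValueError(f"limit for {resource!r} must be positive")
        # Ceiling division: the number of limit-sized chunks the demand needs.
        passes = max(passes, -(-amount // limit))
    return passes


if __name__ == "__main__":
    effect = {"bound_textures": 20, "instructions": 900}
    tight = {"bound_textures": 8, "instructions": 512}
    roomy = {"bound_textures": 32, "instructions": 1024}
    print(passes_needed(effect, tight))  # 3
    print(passes_needed(effect, roomy))  # 1
```

Every extra pass resubmits geometry and adds draw calls. Tight resource limits therefore feed straight back into the CPU overhead problem described earlier.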
In DirectX 9.0, resources are validated on a per-use basis before they are processed and drawn. Validation happens before a draw call executes, to ensure that the commands and data sent by the application are formatted correctly and to avoid data integrity problems. This is partly related to the caps bits and differing hardware capabilities that have been a problematic part of the API ever since hardware manufacturers started doing things very differently from one another. It is another unnecessary overhead, because every validation runs on the CPU, adding yet more CPU limitation headaches for the developer to deal with.
Microsoft has worked hard to improve performance and scalability in the DirectX 10 runtime by focusing on one main goal: making programmers' lives easier. This has a number of benefits, the main one being that developers can spend more of their time making great games instead of ensuring that those games support a wide range of hardware. To do this, Microsoft has attacked the issues above with a three-pronged strategy.
Lower CPU Overheads:
First, the architects have worked to alleviate overhead issues by redesigning the performance-critical portions of the API. As a result, CPU overhead should no longer be a major problem on DirectX 10 class hardware running a DirectX 10 code path. In part, this is down to the new Windows Display Driver Model (WDDM), which moves as much of the driver as possible into user mode, but that is not the only thing Microsoft has changed.
Thanks to the API redesign, the cost of draw calls and state changes has been massively reduced. The architects achieved this by introducing new features that require less CPU intervention, including texture arrays, predicated draw and stream out, which we will come to shortly.
The final reduction in CPU overhead came from the resource validation stage. In DirectX 9, resources have to be validated every time an object is used in a frame (potentially millions of times), whereas in DirectX 10 they are validated when the objects are created. Objects are only created once, so validation only needs to happen once too, resulting in a huge reduction in overhead.
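Texture arrays are one way fewer draw calls become possible, and a toy model shows the idea. It assumes the objects share a mesh and shader, can be submitted together in one call (for example via instancing), and have textures of the same size and format, as the slices of a texture array must. All function names are hypothetical and are not Direct3D calls.

```python
"""Toy model: how a texture array can fold several textured draws into one.

Assumptions (illustrative only): the objects share a mesh and shader, can be
submitted together in one call, and their textures share one size and format,
as the slices of a texture array must. Names here are not Direct3D calls.
"""


def draw_calls_without_array(textures_per_object: list[str]) -> int:
    # Objects grouped by texture: one texture bind plus one draw per group.
    return len(set(textures_per_object))


def assign_array_slices(textures_per_object: list[str]) -> tuple[dict[str, int], list[int]]:
    # Give each distinct texture a slice in first-seen order; each object then
    # carries a slice index instead of needing its own texture binding.
    slices: dict[str, int] = {}
    indices = []
    for texture in textures_per_object:
        if texture not in slices:
            slices[texture] = len(slices)
        indices.append(slices[texture])
    return slices, indices


def draw_calls_with_array(textures_per_object: list[str]) -> int:
    # With every texture in one array resource, a single draw suffices.
    return 1 if textures_per_object else 0


if __name__ == "__main__":
    trees = ["oak", "birch", "pine", "oak", "pine"]
    print(draw_calls_without_array(trees))  # 3
    print(draw_calls_with_array(trees))     # 1
    print(assign_array_slices(trees))
```

The shader uses the per-object slice index to pick its texture. Visual variety such as different bark on each tree no longer costs a separate texture bind and draw call.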
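The difference between the two validation strategies comes down to counting. In the sketch below, `ToyDevice` is a hypothetical stand-in, not a Direct3D interface, and "validation" is just a counter representing the CPU-side checks. The workload of 60 frames with 500 draws each, every draw binding three resources, is likewise illustrative.

```python
"""Toy comparison of per-use versus create-time resource validation.

ToyDevice is a hypothetical stand-in, not a Direct3D interface; 'validation'
here is just a counter standing in for CPU-side checks.
"""


class ToyDevice:
    def __init__(self, validate_at_creation: bool) -> None:
        self.validate_at_creation = validate_at_creation
        self.validations = 0
        self.resources: set[str] = set()

    def _validate(self, name: str) -> None:
        self.validations += 1  # stands in for format/integrity checks on the CPU

    def create(self, name: str) -> None:
        if self.validate_at_creation:
            self._validate(name)  # DirectX 10 style: check once, up front
        self.resources.add(name)

    def draw(self, bound: list[str]) -> None:
        if not self.validate_at_creation:
            for name in bound:
                self._validate(name)  # DirectX 9 style: check on every use


def run(validate_at_creation: bool, frames: int, draws_per_frame: int) -> int:
    """Create three resources, draw with all of them, return the validation count."""
    device = ToyDevice(validate_at_creation)
    for name in ("mesh", "texture", "shader"):
        device.create(name)
    for _ in range(frames):
        for _ in range(draws_per_frame):
            device.draw(["mesh", "texture", "shader"])
    return device.validations


if __name__ == "__main__":
    print(run(False, frames=60, draws_per_frame=500))  # 90000
    print(run(True, frames=60, draws_per_frame=500))   # 3
```

Per-use validation grows with every frame and every draw. Create-time validation is a fixed cost paid once per object, however long the game runs.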
Fixed Hardware Capabilities & Unified Programming Model:
Microsoft was ready to make a significant leap forward in programmability with DirectX 10, but to do so it had to lay down a strict set of specifications for hardware vendors to follow. Without caps bits or the need for vendor-specific code paths, programmers' lives become much easier, leaving them free to focus on making more enjoyable games.
Along with that, Microsoft felt it was time to move away from the traditional fixed function graphics pipeline we have become accustomed to. Game developers see this as a good thing, because it lets them be more expressive with very few GPU-related limitations imposed on them. Thinking about why developers love making games for consoles shows why these changes needed to be made.
A games console has a consistent hardware specification that doesn't change, so developers know exactly what the hardware can do and can adapt their games to the platform's capabilities. In the past, the PC (and more specifically its graphics hardware) hasn't had a consistent set of capabilities, because hardware vendors have been able to pick and choose which portions of the specification to implement in their hardware.