bit-tech.net

Legacy content from www.custompc.co.uk

Dissecting DirectX 10

Stuart Andrews takes a journey through a DirectX 10 3D graphics pipeline, and explains how GPU architecture has changed since the DirectX 9 days.

Once all the operations on a particular set of vertices are complete, the setup engine goes to work, calculating which vertices belong to which triangles, and how those triangles need to be drawn from the given perspective. It will also work out which triangles need to be rasterized, and which need to be discarded.

This part of the process is similar to the method used in a DirectX 9 GPU, but DirectX 10 GPUs are more efficient, particularly when it comes to culling or clipping unwanted triangles. For instance, in any given scene, roughly half of the polygons that could be drawn will be back-facing (they face away from the camera, so they wouldn't be seen even if they were rendered). Rasterizing these isn't only a waste of processing power and, therefore, a problem for performance; leaving them in the pipeline can also cause unpleasant image artefacts later on. The setup engine in the R600 can cull an impressive 1 million triangles with every clock cycle.
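To make the idea of back-face culling concrete, here is a minimal sketch in Python. It is not how the R600 or G80 setup engines are implemented; it only illustrates the usual test, which checks the winding order of a triangle once it has been projected to the screen. The function names, and the convention that front-facing triangles are wound counter-clockwise with the y axis pointing up, are assumptions made for this example.

```python
def signed_area(a, b, c):
    """Return twice the signed area of the 2D triangle a, b, c.

    Positive for counter-clockwise winding, negative for clockwise
    (assuming the y axis points up).
    """
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_back_facing(triangle):
    """Assumed convention: counter-clockwise triangles face the camera.

    Zero-area (degenerate) triangles are treated as culled too, since
    they cover no pixels.
    """
    a, b, c = triangle
    return signed_area(a, b, c) <= 0


def cull_back_faces(triangles):
    """Keep only the triangles worth sending on to the rasterizer."""
    return [t for t in triangles if not is_back_facing(t)]


front = ((0, 0), (1, 0), (0, 1))   # counter-clockwise
back = ((0, 0), (0, 1), (1, 0))    # same triangle, clockwise
visible = cull_back_faces([front, back])
```

In this example only the first triangle survives. Real hardware performs the same kind of sign test in fixed-function logic, which is why it can be done so cheaply.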

Texturing

The output from the setup engine is then rasterized, a step that transforms the primitives into pixel 'fragments', which have to be coloured and shaded. In the linear DirectX 9 process, as used in the R580, this meant simply caching the fragment data and sending instructions to the dispatch processor that controls the 48 pixel units. In the R600, however, it means caching the data so that it's ready for the dispatch processor to use in the pixel queue. In the G80, the instructions go to the global scheduler, and then to the local scheduler belonging to the ALU cluster.
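As a rough illustration of what turning a primitive into fragments involves, the sketch below uses the common edge-function approach: a pixel produces a fragment if its centre lies on the inner side of all three triangle edges. This is a simplified software model written for this example, not a description of the R600 or G80 rasterizers, and it assumes counter-clockwise triangles and ignores the tie-breaking rules real hardware applies to pixels on shared edges.

```python
def edge(a, b, p):
    """Edge function: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, width, height):
    """Return the (x, y) pixels whose centres fall inside the triangle.

    Assumes a counter-clockwise triangle in pixel coordinates. A real
    rasterizer would only visit the triangle's bounding box and would
    also interpolate depth, colour and texture coordinates per fragment.
    """
    a, b, c = triangle
    fragments = []
    for y in range(height):
        for x in range(width):
            centre = (x + 0.5, y + 0.5)  # sample at the pixel centre
            inside = (
                edge(a, b, centre) >= 0
                and edge(b, c, centre) >= 0
                and edge(c, a, centre) >= 0
            )
            if inside:
                fragments.append((x, y))
    return fragments


fragments = rasterize(((0, 0), (4, 0), (0, 4)), 4, 4)
```

Each entry in the resulting list stands in for one fragment that would then be queued for shading.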

While unified shaders are a key part of DirectX 10 architecture, they aren't the only important part when it comes to texturing and pixel shading operations - we also have to consider the texture units. Nearly all 3D graphics operations involve sampling textures, which are held in main system or video RAM, and then cached in one or more texture caches in the GPU. Moving sample data from the cached textures into the pixel shader that requires it is the texture unit's first task. Filtering these textures - ensuring that they appear at the correct angle and blur level for the distance and perspective - is its second task. In both respects, if you compare the R600 with the old R580, two things stand out.
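The simplest form of the filtering described above is bilinear filtering, in which the four texels nearest the sample point are blended according to how close the point is to each. The Python sketch below shows the arithmetic only; the function name, the single-channel texture stored as a list of rows, and the clamp-to-edge behaviour are assumptions for this example, and real texture units add mipmapping and anisotropic filtering on top.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a single-channel texture at texel coordinates (u, v).

    texture is a list of rows of floats, with texel centres assumed to
    sit at integer coordinates. Reads beyond the border are clamped to
    the nearest edge texel.
    """
    height = len(texture)
    width = len(texture[0])

    x0 = math.floor(u)
    y0 = math.floor(v)
    fx = u - x0  # horizontal blend weight, 0.0 to 1.0
    fy = v - y0  # vertical blend weight, 0.0 to 1.0

    def texel(x, y):
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return texture[y][x]

    # Blend horizontally along the two rows, then vertically between them.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


checker = [[0.0, 1.0],
           [2.0, 3.0]]
midpoint = bilinear_sample(checker, 0.5, 0.5)
```

Sampling exactly halfway between all four texels returns their average, which is the smoothing effect that stops textures looking blocky when they are magnified.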

Firstly, while old-school DirectX 9 GPUs such as the R580 had already taken the step of decoupling texture units from specific pixel units, sharing them three to one across the 48 processors, the R600 takes this process one stage further. Texture units are now, in Richard Huddy's words, 'servants of the shader cores in a much more abstract sense'. Any shader core, whether it's performing pixel, vertex or geometry work, can request the resources of a texture unit, and the dispatch processor handles the transaction. Interestingly, the texture units in Nvidia's G80 are tied to specific SPU clusters, although they can be addressed by any form of shader program running on that cluster.
