bit-tech.net

Legacy content from www.custompc.co.uk

Dissecting DirectX 10

Stuart Andrews takes a journey through a DirectX 10 3D graphics pipeline, and explains how GPU architecture has changed since the DirectX 9 days.

Before the advent of DirectX 9, ROPs were tied directly to the pixel processors and texture units as part of a unified pixel pipeline. With DirectX 9 hardware, however, there was a move to decouple the ROPs from the pixel processors. On the R580, for example, just one ROP was needed to handle the output of every three pixel units (hence 16 ROPs for 48 pixel units).

As with the diminishing numbers of texture units, R600 takes this trend one stage further, with only four ROPs, although each can output four pixels per clock. The units can also perform twice as many stencil/depth tests per clock cycle as the ROPs in the R580, and Z-buffer, stencil-buffer and AA sampling efficiency have all been improved dramatically. However, the AA calculation isn't carried out entirely within the ROP, as it is in R580 or Nvidia's G80, but is handled by the shader cores. ATi is taking a gamble on DirectX 10 game developers moving from current 'box-filter' AA, where the GPU takes four samples and averages them, to more sophisticated custom AA shader routines that look at pixel values outside of the box and factor those into the calculations.
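To make the difference concrete, here is a minimal Python sketch of the two resolve styles on a single row of pixels. It is not ATi's CFAA implementation: the function names, the evenly spaced sample positions and the tent-shaped weighting are all assumptions chosen for illustration.

```python
def box_resolve(samples_per_pixel):
    """Box filter: average only the samples that belong to each pixel."""
    return [sum(samples) / len(samples) for samples in samples_per_pixel]


def tent_resolve(samples_per_pixel, radius=1.0):
    """Wider filter: also weigh in samples from the neighbouring pixels.

    Hypothetical weighting: a sample's weight falls off linearly with its
    distance from the pixel centre and reaches zero at `radius` pixels.
    """
    if radius < 0.5:
        raise ValueError("radius must be at least 0.5 pixels")
    pixel_count = len(samples_per_pixel)
    resolved = []
    for p in range(pixel_count):
        centre = p + 0.5
        total = 0.0
        weight_sum = 0.0
        # Visit this pixel and its immediate left and right neighbours.
        for q in range(max(0, p - 1), min(pixel_count, p + 2)):
            count = len(samples_per_pixel[q])
            for i, value in enumerate(samples_per_pixel[q]):
                # Assumption: samples are evenly spaced across the pixel.
                position = q + (i + 0.5) / count
                weight = max(0.0, 1.0 - abs(position - centre) / radius)
                total += weight * value
                weight_sum += weight
        resolved.append(total / weight_sum)
    return resolved


# Three pixels, four samples each, with an edge crossing the middle pixel.
row = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
box_row = box_resolve(row)
tent_row = tent_resolve(row)
```

The box version sees only the four samples inside each pixel, while the tent version lets nearby samples from the neighbours nudge the result, which is the kind of 'outside the box' information a custom shader routine could use.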

This approach makes sense for future applications. Richard Huddy explains, 'This sometimes gives you a blurrier image but, in the vast majority of cases, it looks crisper. You can get rid of some of what we call the "roping" artefacts.' Not only that, but it also opens up the way for programmers to create custom AA routines that work specifically for the environments they're creating in their games. 'You really will have situations where anti-aliasing is game specific,' Huddy adds. What's more, this Custom Filter Anti-Aliasing (CFAA) becomes more important in games that use HDR: when pixel values have to be carefully mapped to create the right impression of ultra-bright whites and rich darker shades, simply averaging four samples can create horrible effects. Nvidia has its own custom shader-based filtering routines, CSAA (Coverage Sampling Anti-Aliasing), to avoid these.
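The HDR problem can be shown with a few lines of arithmetic. The sketch below assumes a simple Reinhard-style tone-mapping curve, x / (1 + x), purely for illustration; the article doesn't say which curve any particular game uses, and neither function represents ATi's or Nvidia's actual routines.

```python
def tone_map(value):
    """Assumed Reinhard-style curve: squeezes HDR values in [0, inf) into [0, 1)."""
    return value / (1.0 + value)


def average_then_tone_map(samples):
    """Plain box resolve on the raw HDR values, with tone mapping applied afterwards."""
    return tone_map(sum(samples) / len(samples))


def tone_map_then_average(samples):
    """Custom resolve: tone map each sample first, then average the results."""
    return sum(tone_map(s) for s in samples) / len(samples)


# An edge pixel: one sample lands on an ultra-bright light, three on a dark wall.
edge_pixel = [100.0, 0.1, 0.1, 0.1]

naive = average_then_tone_map(edge_pixel)    # roughly 0.96, almost pure white
custom = tone_map_then_average(edge_pixel)   # roughly 0.32, a plausible blend
```

With the plain average, the single bright sample swamps the other three and the edge pixel comes out nearly white, so the edge still looks jagged. Tone mapping each sample before averaging gives a value close to one quarter of the way between the dark and bright tones, which is what one-in-four coverage should look like on screen.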

The problem is that R600's shader-based approach doesn't suit current applications so well. The standard multisampling 4x AA routines built into the six ROPs in Nvidia's G80 (each of which can also output four pixels per clock) handle DirectX 9 AA routines more effectively, which is another reason R600 hasn't performed as well as expected in recent benchmarks. 'We believe that our architecture is more balanced, and we can more effectively process a wider range of current and future shader code,' says Nvidia's Nick Stam. 'We wouldn't want the back-end processing done by ROPs to cause a bottleneck, so we provide adequate ROP resources to carry out the final pixel processing and frame buffer blending, so we have a very balanced design.'
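For reference, the ROP figures quoted in this article work out as follows. This is a back-of-the-envelope sketch with hypothetical helper names; it covers per-clock output only and ignores clock speeds, memory bandwidth and everything else that affects real-world performance.

```python
# Figures quoted in the article: (number of ROPs, pixels each can output per clock).
ROP_CONFIGS = {"R600": (4, 4), "G80": (6, 4)}


def pixels_per_clock(rops, pixels_per_rop):
    """Peak pixels written per clock cycle across all ROPs."""
    return rops * pixels_per_rop


def rops_needed(pixel_units, pixel_units_per_rop):
    """Number of ROPs when one ROP serves a fixed group of pixel units."""
    return pixel_units // pixel_units_per_rop


r600_output = pixels_per_clock(*ROP_CONFIGS["R600"])  # 16 pixels per clock
g80_output = pixels_per_clock(*ROP_CONFIGS["G80"])    # 24 pixels per clock
r580_rops = rops_needed(48, 3)                        # 16 ROPs for 48 pixel units
```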
