Upgraded Stream Processors
The four stream processors are the red units, with the Branch Unit to the left and the General Purpose register below.
The doubled-up capability of Cayman’s front-end has been twinned with a new stream processor layout. Even the HD 6800-series Barts GPU used the familiar VLIW5 (Very Long Instruction Word, 5-way) design, which arranges the stream processors in five-wide groups.
Each of these groups contained one ‘T-Unit’ (transcendental unit), a super-stream processor that handled complex operations such as sines, cosines and square roots. ATI arranged 16 of these groups into each SIMD (Single Instruction, Multiple Data) Engine, of which the Radeon HD 5870 1GB had 20, giving it its 1,600 stream processor count.
The Cayman GPU uses a new VLIW4 design, meaning that it groups its stream processors in fours for 4-way co-issue. ATI has also ditched the T-Unit, as all four of these VLIW4 stream processors have the same abilities. This means that the double-precision rate of the HD 6900 is higher than that of previous ATI GPUs; ATI claims it’s one quarter of the single-precision rate.
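To show what 4-way co-issue means in practice, here is a toy Python sketch (not ATI's shader compiler) that greedily packs operations into bundles of a given width. The operation names and the dependency graph are hypothetical; an operation may only join a bundle once everything it depends on has issued in an earlier bundle, so the operations sharing a bundle are always independent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical shader operation and the names of the operations it depends on."""
    name: str
    deps: tuple[str, ...] = ()


def pack_bundles(ops: list[Op], width: int) -> list[list[str]]:
    """Greedily pack ops (in program order) into VLIW bundles of at most `width` slots.

    An op may only issue once all of its dependencies issued in an earlier bundle,
    so operations that share a bundle never depend on each other.
    """
    issued: set[str] = set()
    pending = list(ops)
    bundles: list[list[str]] = []
    while pending:
        # Take the first `width` ops whose inputs are already available
        bundle = [op for op in pending if all(d in issued for d in op.deps)][:width]
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        names = [op.name for op in bundle]
        bundles.append(names)
        issued.update(names)
        pending = [op for op in pending if op.name not in names]
    return bundles


def utilisation(bundles: list[list[str]], width: int) -> float:
    """Fraction of available issue slots that hold a real operation."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# e = (x*y + z*w) * (u*v): three independent multiplies, then an add, then a multiply
program = [
    Op("a"),              # a = x * y
    Op("b"),              # b = z * w
    Op("c", ("a", "b")),  # c = a + b
    Op("d"),              # d = u * v
    Op("e", ("c", "d")),  # e = c * d
]

for width in (4, 5):
    bundles = pack_bundles(program, width)
    print(f"VLIW{width}: {bundles}, {utilisation(bundles, width):.0%} of slots used")
```

On this toy dependency chain both widths produce the same three bundles, so the five-wide version simply leaves more slots idle (5 of 15 filled, versus 5 of 12).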
Bizarrely, this new arrangement means that the new top-end HD 6970 2GB, with its dual Front-End Engines, actually has fewer stream processors than the Radeon HD 5870 1GB: only 1,536 rather than 1,600. However, ATI says that the VLIW4 layout delivers 10 per cent more performance per mm² of die area, and that the simpler ‘all smart’ stream processor capabilities make for simpler register and scheduling management. The top-end HD 6970 has 24 SIMD Engines for its total of 1,536 stream processors, while the lesser HD 6950 has 22 SIMD Engines for 1,408 stream processors.
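All of the stream processor totals quoted above follow from SIMD Engines × 16 groups per SIMD Engine × stream processors per group. A minimal Python sketch checking the arithmetic; it assumes Cayman keeps 16 VLIW4 groups per SIMD Engine, which is consistent with the totals given for the HD 6970 and HD 6950.

```python
from dataclasses import dataclass

GROUPS_PER_SIMD = 16                        # VLIW groups per SIMD Engine (assumed the same on both chips)
LANES_PER_GROUP = {"VLIW5": 5, "VLIW4": 4}  # stream processors per VLIW group


@dataclass(frozen=True)
class Gpu:
    name: str
    layout: str
    simd_engines: int

    @property
    def stream_processors(self) -> int:
        """Total stream processors = SIMD Engines x groups per SIMD x lanes per group."""
        return self.simd_engines * GROUPS_PER_SIMD * LANES_PER_GROUP[self.layout]


cards = [
    Gpu("Radeon HD 5870", "VLIW5", 20),
    Gpu("Radeon HD 6970", "VLIW4", 24),
    Gpu("Radeon HD 6950", "VLIW4", 22),
]

for card in cards:
    print(f"{card.name}: {card.simd_engines} x {GROUPS_PER_SIMD} x "
          f"{LANES_PER_GROUP[card.layout]} = {card.stream_processors:,} stream processors")
```

The HD 6970 has four more SIMD Engines than the HD 5870 yet ends up 64 stream processors short, because each group is one stream processor narrower.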
New ROPs and Anti-Aliasing
As if doubling the front-end capabilities of the Cayman GPU while radically overhauling the stream processor layout wasn’t enough, ATI has also upgraded the ROPs at the back-end of the GPU. It’s not entirely clear whether the upgrades are just to enable yet another new form of AA or also to boost the performance of current, standard AA techniques. However, ATI claims that write operations can be coalesced, that 16-bit integer operations are twice as fast on an HD 6900 as on previous GPUs, and that 32-bit floating point operations are two to four times faster.
The memory interface is still 256-bit wide, but the GDDR5 memory on both HD 6900-series cards runs fast enough to deliver plenty of memory bandwidth. The 5.5GHz (effective) memory of the HD 6970 2GB gives it 176GB/sec of memory bandwidth, while the HD 6950 2GB has 160GB/sec thanks to its 5GHz (effective) memory. Both cards have 2GB of memory, which is great news for memory-intensive games, high-resolution gaming and using shed-loads of AA.
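The bandwidth figures can be checked with simple arithmetic. A minimal Python sketch, assuming the "effective" memory speed is the data rate per pin (5.5GHz effective meaning 5.5 gigabits per second on each of the 256 data pins), divided by eight bits per byte:

```python
def peak_bandwidth_gb_s(effective_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/sec: per-pin data rate x bus width / 8 bits per byte."""
    return effective_gbps_per_pin * bus_width_bits / 8


print(peak_bandwidth_gb_s(5.5, 256))  # HD 6970 2GB
print(peak_bandwidth_gb_s(5.0, 256))  # HD 6950 2GB
```

These give 176.0 and 160.0 respectively, matching the figures above; with the bus width fixed at 256 bits, memory speed is the only lever left for extra bandwidth.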
The new AA technique is grandly titled Enhanced Quality Anti-Aliasing (EQAA), and it promises improved AA quality with very little performance penalty. EQAA is said to have such a small impact on performance because, although it takes twice as many coverage samples as normal, it discards most of these values once it has worked out which is the most accurate, and saves only that.
This cuts down on the amount of data being processed and on the stress placed on the memory interface, while still delivering better data to work with.
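A simplified Python model of why extra coverage samples give better data: it estimates how much of a pixel a triangle covers by counting evenly spaced sample points on the covered side of a vertical edge. The even spacing and the single straight edge are assumptions for illustration; this is not AMD's sample pattern or its EQAA resolve.

```python
def estimated_coverage(edge_x: float, samples: int) -> float:
    """Estimate the fraction of a unit pixel lying left of a vertical edge at x = edge_x.

    Uses `samples` evenly spaced points at x = (i + 0.5) / samples; each point
    left of the edge counts as covered by the triangle.
    """
    covered = sum(1 for i in range(samples) if (i + 0.5) / samples < edge_x)
    return covered / samples


true_coverage = 0.4  # the edge really covers 40 per cent of the pixel
for n in (4, 8):
    estimate = estimated_coverage(true_coverage, n)
    print(f"{n} samples: estimate {estimate}, error {abs(estimate - true_coverage):.3f}")
```

With four samples the estimate can only move in steps of 0.25, so it lands on 0.5; doubling to eight samples halves the step to 0.125 and the estimate lands on 0.375, much closer to the true 0.4.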
While EQAA is only available on the HD 6900-series, these cards also support the Morphological AA technique that was introduced with the HD 6800-series. ATI calls Morphological AA ‘MLAA’ and, oddly, it’s recommending using MLAA in conjunction with other AA techniques.
For example, 4x MSAA is very good at picking up fine lines that aren’t quite one pixel wide, as it samples four points within each pixel. However, ATI says that even 4x MSAA leaves some jagged edges, which MLAA can smooth out.
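A toy Python illustration of why four samples per pixel catch fine lines: a horizontal line thinner than a pixel can slip between sample points. It compares a single sample at the pixel centre with a four-sample rotated grid; the offsets used (in sixteenths of a pixel from the centre) are an assumed, commonly used 4x pattern, not necessarily the one the HD 6900 uses.

```python
# Sample positions (x, y) inside a unit pixel, measured from its top-left corner
CENTRE_ONLY = [(0.5, 0.5)]
# Assumed rotated-grid 4x pattern: offsets in sixteenths of a pixel from the centre
ROTATED_GRID_4X = [
    (0.5 + dx / 16, 0.5 + dy / 16) for dx, dy in [(-2, -6), (6, -2), (-6, 2), (2, 6)]
]


def line_coverage(samples: list[tuple[float, float]], y_top: float, y_bottom: float) -> float:
    """Fraction of sample points that fall on a horizontal line spanning y_top..y_bottom."""
    hits = sum(1 for _, y in samples if y_top <= y <= y_bottom)
    return hits / len(samples)


# A horizontal line 0.15 pixels thick, between y = 0.30 and y = 0.45
print(line_coverage(CENTRE_ONLY, 0.30, 0.45))
print(line_coverage(ROTATED_GRID_4X, 0.30, 0.45))
```

The single centre sample misses the line entirely (coverage 0.0), while one of the four rotated-grid samples lands on it, so the line is recorded at 25 per cent coverage instead of vanishing.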