bit-tech.net

Radeon HD 2600 XT vs. GeForce 8600 GT

Radeon HD 2600 XT architecture:

Before we move on to the actual hardware, we’re going to briefly run over the technology behind the cards. The Radeon HD 2600 XT GDDR4 is based on the RV630 graphics processing unit, which is the middle GPU in AMD’s trio of DirectX 10-class parts.

RV630 is manufactured on a new and power-efficient 65nm process at TSMC, which means we’ll see quite a few passively cooled Radeon HD 2600-series cards. In fact, we’ve recently received a couple of passively cooled Radeon HD 2600 XTs, which gives an idea of how energy efficient the new process is.

Obviously, being a more mainstream-orientated GPU, the chip is a cut-down version of AMD’s flagship R600 graphics processor. As a result of the diet, RV630 weighs in at 390 million transistors, which is around 55 percent of the 700 million transistor behemoth that is R600.

The diet doesn’t prevent RV630 from sharing R600’s design philosophy, and it uses the same unified superscalar architecture. Since that’s the case, we’re not going to go into tremendous depth on RV630’s architecture. If you want that, please head on over to our original Radeon HD 2900 XT review, which covers R600’s architecture in more depth.

The Radeon HD 2600 XT (RV630) block diagram

R600’s superscalar shader processors were arranged into four clusters of 16 shaders (or 80 stream processors) for a total of 64 five-way superscalar units or 320 stream processors. As a result of RV630 being more mainstream orientated, the layout is changed slightly – there are now three clusters of eight shaders (or 40 stream processors), making a total of 24 shader units or 120 stream processors.
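The shader counts above are simple multiplication, and a short sketch makes the relationship between clusters, shader units and stream processors explicit. The `ShaderLayout` class and its field names are our own illustrative choices, not anything from AMD’s documentation; the figures themselves are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderLayout:
    """Hypothetical helper describing a shader arrangement."""
    clusters: int
    shaders_per_cluster: int
    lanes_per_shader: int = 5  # each shader unit is five-way superscalar

    @property
    def shader_units(self) -> int:
        # Total superscalar shader units across all clusters.
        return self.clusters * self.shaders_per_cluster

    @property
    def stream_processors(self) -> int:
        # Each shader unit counts as five stream processors.
        return self.shader_units * self.lanes_per_shader


# Figures taken from the paragraph above.
R600 = ShaderLayout(clusters=4, shaders_per_cluster=16)
RV630 = ShaderLayout(clusters=3, shaders_per_cluster=8)

for name, gpu in (("R600", R600), ("RV630", RV630)):
    print(f"{name}: {gpu.shader_units} shader units, "
          f"{gpu.stream_processors} stream processors")
```

On these numbers, RV630 carries 120 of R600’s 320 stream processors, or 37.5 percent of the flagship’s shader resources.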

While AMD has played jiggery-pokery with the arrangement of the shader units, the chip’s engineers haven’t changed the functionality of RV630’s superscalar shader processors; there are just fewer of them. Each one can therefore co-issue five FP MAD instructions per clock with 32-bit floating point precision. There’s also a branch execution unit in each superscalar shader processor, which handles flow control and conditional operations.

Despite the reduced number of shader clusters, AMD has not changed the Ultra-Threaded Dispatch Processor’s capabilities, in that it still has two sequencers and two arbiters per shader cluster. Because of the reduced number of shader units per cluster and thus the shallower graphics pipeline (in relative terms), RV630’s efficiency should be much higher than R600’s in certain situations, especially when executing shaders that use dynamic branching.

Compared to R600, the RV630’s texturing capabilities have been cut in half, meaning that there are now only two texture units. Like the shader processors though, there are no changes to the texture units’ capabilities. Each unit can do eight texture addresses per clock: four are used for unfiltered lookups and the other four are used for bilinear lookups.

Additionally, each texture unit can fetch 20 FP32 textures for bilinear filtering and point sampling, and is also able to apply bilinear filtering to four FP16 textures every clock cycle, while FP32 textures are bilinear filtered at half speed. AMD hasn’t touched either the vertex or the L1 texture cache, which are still 32KB in size, but the L2 shared texture cache has shrunk from 256KB to 128KB.
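To put the per-unit texture figures into chip-wide terms, here is a minimal sketch that totals them up. The function and constant names are hypothetical, and the four-unit figure for R600 is inferred from the statement that RV630’s texturing has been cut in half rather than quoted directly.

```python
ADDRESSES_PER_UNIT = 8       # 4 unfiltered + 4 bilinear lookups per clock
FP16_BILINEAR_PER_UNIT = 4   # FP16 texels bilinear filtered per clock


def texture_throughput(texture_units: int) -> dict:
    """Per-clock texture throughput for a chip with the given unit count."""
    fp16 = texture_units * FP16_BILINEAR_PER_UNIT
    return {
        "addresses_per_clock": texture_units * ADDRESSES_PER_UNIT,
        "fp16_bilinear_per_clock": fp16,
        # FP32 textures are bilinear filtered at half the FP16 rate.
        "fp32_bilinear_per_clock": fp16 // 2,
    }


rv630 = texture_throughput(texture_units=2)
r600 = texture_throughput(texture_units=4)  # assumed: double RV630's count

print("RV630:", rv630)
print("R600: ", r600)
```

Since the units themselves are unchanged, every per-clock figure for RV630 comes out at exactly half of R600’s.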

The render backends (or ROPs) have been significantly culled as well – there’s just a single ROP partition remaining, which can output four pixels per clock with colour and Z processing. The ROPs operate at double speed for Z-only tests, regardless of whether anti-aliasing is enabled or not, meaning RV630 can output eight pixels per clock in Z-only scenarios.
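The ROP arithmetic can be sketched in a couple of lines. The `pixels_per_clock` function below is purely illustrative and its parameter names are our own; it only encodes the two rules stated above, namely four pixels per clock per partition and double rate for Z-only work.

```python
def pixels_per_clock(rop_partitions: int, z_only: bool = False,
                     pixels_per_partition: int = 4) -> int:
    """Peak pixel output per clock for the given number of ROP partitions."""
    rate = rop_partitions * pixels_per_partition
    # Z-only tests run at double speed, with or without anti-aliasing.
    return rate * 2 if z_only else rate


# RV630 has a single ROP partition.
print("Colour + Z:", pixels_per_clock(1))
print("Z-only:    ", pixels_per_clock(1, z_only=True))
```

Multiplying either figure by the core clock would give a peak fill rate, but these are theoretical maximums rather than measured throughput.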

The Hier-Z buffer is also still present in RV630 and remains the same as the one in R600, as do the eight multiple render targets, which support all of the common anti-aliasing formats. RV630 supports the same modes as R600, meaning you’ll get 2x, 4x and 8x MSAA and custom filter anti-aliasing, along with adaptive anti-aliasing. All of these modes are supported with both FP16 and FP32 render targets, meaning there is support for up to 128-bit HDR with AA.