Threading, Texturing, ROPs & Memory Bandwidth:
Most of the rest of the shader core remains the same – the Ultra Threaded Dispatch processor still assigns instructions in the same way, on a per-SIMD basis. This means that there are still two arbiters and two sequencers per SIMD, along with separate dedicated arbiters and sequencers for Texture Fetching and Vertex Fetching.
There are also still four texture unit clusters which can each filter four textures per clock, making a total of 16 texture filters per clock cycle. These still operate completely independently from the four SIMDs and, depending on the usage scenario the GPU finds itself in, each texture sampler can be assigned to any one of the four SIMDs.
The render back-ends (or ROPs) are similar in functionality and still don’t feature a hardware-based MSAA resolve unit, which means that, like the Radeon HD 2900 XT, anti-aliasing performance isn’t going to be as good as it should be in more general scenarios.
RV670's render back-ends are virtually the same as R600's
AMD has made some enhancements to this portion of the GPU to make better use of the memory bandwidth available. We were also told that cache efficiency has been improved and that the sizes of on-chip buffers and caches have been tweaked to further hide latency.
AMD had to do this because some compromises were needed to keep the cost (and thus die size) of RV670 down. AMD’s Richard Huddy, head of worldwide developer relations, told bit-tech that R600’s 512-bit external/1024-bit internal bi-directional ring bus memory controller made up around 30 percent of the chip’s total transistor count, which equates to 210 million transistors.
In comparison, RV670 uses a 256-bit external/512-bit internal ring bus memory controller, which has just half the number of wires used in R600’s memory controller. This saved AMD around 100 million transistors, which is not an insignificant amount in the grand scheme of things. The RV670 memory controller shares the same characteristics as the one in R600, in that there are still five ring stops (four for on-board memory and one for the PCI-Express channel) and eight independent memory channels. This time, however, the channels are only 32 bits wide instead of the 64-bit wide channels in R600.
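The bus widths and transistor figures above fit together with some simple arithmetic. The sketch below only reworks the numbers quoted in the text; the function and variable names are illustrative, and the whole-chip and RV670 controller totals are rough values implied by those quotes rather than figures AMD stated.

```python
# Back-of-the-envelope check of the memory controller figures quoted above.
# All names are illustrative; the inputs are the article's round numbers.

def external_bus_width(channels: int, bits_per_channel: int) -> int:
    """Total external memory bus width in bits."""
    return channels * bits_per_channel


# Both chips have eight independent memory channels.
r600_bus = external_bus_width(channels=8, bits_per_channel=64)
rv670_bus = external_bus_width(channels=8, bits_per_channel=32)

# R600's controller: quoted as about 30 percent of the chip, 210 million transistors.
r600_mc_transistors = 210_000_000
implied_r600_total = r600_mc_transistors * 100 // 30

# RV670 is quoted as saving about 100 million transistors on the controller,
# so this is only a rough implied figure, not an AMD number.
rv670_saving = 100_000_000
rv670_mc_estimate = r600_mc_transistors - rv670_saving

print(f"R600 external bus:  {r600_bus}-bit")
print(f"RV670 external bus: {rv670_bus}-bit")
print(f"Implied R600 transistor total: {implied_r600_total:,}")
print(f"Rough RV670 controller budget: {rv670_mc_estimate:,}")
```

Halving the channel width while keeping eight channels is what takes the external bus from 512-bit to 256-bit without changing the ring stop layout.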
One of the features that AMD has made a lot of noise about in the run-up to the launch of the Radeon HD 3800 series is the introduction of ATI PowerPlay – a technology that has previously only been available on ATI’s notebook graphics solutions. For those not familiar with the technology, allow me to elaborate a little: it introduces a range of power states that depend on the GPU’s activity levels and works in much the same way as AMD Cool’n’Quiet or Intel SpeedStep technology.
GPU utilisation is tracked by a microcontroller which monitors the command buffer. When GPU utilisation is low, most of the 3D portions of the GPU are turned off, the core is clocked down and the operating voltage is lowered. When there’s a lot of GPU activity, like when you’re in a game, the chip runs at full speed.
AMD talked about an intermediate mode, known as “Light Gaming”, whereby the GPU isn’t the limitation in the gaming experience. For example, if you’re playing an old game like Quake 3 that’s now heavily CPU-limited, the GPU won’t be firing on all cylinders – instead, it’ll turn the required portions of the GPU on, and leave the rest in an idle state.
This results in some potentially massive power savings when you’re not using all of your GPU’s capabilities – AMD claims that when Radeon HD 3870 is running at 100 percent load, it’ll consume around 105-106W of power, while when it’s in “Light Gaming” mode, it’ll consume just 51W of power. Meanwhile, when you’re running your desktop applications, the GPU will idle along consuming just 34W of power.
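To put those claims in perspective, here is a minimal sketch of the three power states and the savings they imply. The wattages are AMD's claimed figures as quoted above, with the midpoint of 105-106W used for full load. The `pick_state` function and its utilisation thresholds are purely hypothetical, since no thresholds were given and this is not how the on-chip microcontroller is known to work.

```python
from enum import Enum


class PowerState(Enum):
    DESKTOP = "desktop"
    LIGHT_GAMING = "light gaming"
    FULL_LOAD = "full load"


# AMD's claimed figures for Radeon HD 3870, as quoted in the article (watts).
# Full load is quoted as 105-106W; the midpoint is used here.
CLAIMED_POWER_W = {
    PowerState.DESKTOP: 34.0,
    PowerState.LIGHT_GAMING: 51.0,
    PowerState.FULL_LOAD: 105.5,
}


def pick_state(utilisation, low=0.10, high=0.60):
    """Map a 0.0-1.0 utilisation estimate to a power state.

    The thresholds are hypothetical; the article gives no real values.
    """
    if utilisation < low:
        return PowerState.DESKTOP
    if utilisation < high:
        return PowerState.LIGHT_GAMING
    return PowerState.FULL_LOAD


def saving_vs_full_load(state):
    """Fraction of the claimed full-load power saved in the given state."""
    full = CLAIMED_POWER_W[PowerState.FULL_LOAD]
    return 1.0 - CLAIMED_POWER_W[state] / full


for state in PowerState:
    print(f"{state.value}: {CLAIMED_POWER_W[state]:.1f}W, "
          f"{saving_vs_full_load(state):.0%} below full load")
```

On these claimed figures, the “Light Gaming” state draws roughly half of the full-load power and the desktop state roughly a third.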