GeForce 8800 GT architecture:

Nvidia’s GeForce 8800 GT uses the same unified architecture that Nvidia unveiled when it first talked about its G80 graphics processing unit.

As we briefly mentioned on the previous page, G92 uses around 754 million transistors in its design and has been manufactured on TSMC’s 65nm process. The die size is around 330mm² and, although it’s appreciably smaller than a G80 die, it’s still a long way from being classed as a small piece of silicon.

There are a total of 112 1D scalar stream processors which, with the card in its standard configuration, run at 1,500MHz. These are arranged into seven clusters, each with 16 stream processors, eight texture address units, eight texture filtering units and its own independent cache.

This is the same configuration that Nvidia used on its G84 and G86 graphics chips on a shader cluster level, but obviously G92 is a much more complex GPU than either of these.

Each of the shader processors can dual-issue a MADD and a MUL instruction in the same clock cycle and, being a unified shader architecture, all of the units can process all shader operations and calculations sent to the GPU in both floating point and integer form. Curiously though, despite the stream processors’ capabilities being the same as G80’s (aside from the quantity and clock speed), Nvidia says that the chip can deliver around 336 GigaFLOPS of compute power at peak.

However, if you were to calculate the dual-issue MADD and MUL, you’d get a throughput of 504 GigaFLOPS. We asked Nvidia to confirm whether this was a mistake in the documentation, or if the MUL was no longer there. We were told that the CUDA group does not like to count the MUL capability as part of the chip’s general throughput, as it’s fairly hard to utilise it. Instead, Nvidia has taken a conservative approach with G92’s computational power.
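The gap between the two figures comes down to how many floating-point operations per clock are credited to each stream processor. The short Python sketch below reproduces both numbers, assuming a MADD counts as two operations and the co-issued MUL as one; the function and constant names are ours, purely for illustration.

```python
def peak_gflops(stream_processors, shader_clock_mhz, flops_per_clock):
    """Theoretical peak throughput in GigaFLOPS."""
    # processors x MHz x operations gives MegaFLOPS;
    # dividing by 1,000 converts that to GigaFLOPS.
    return stream_processors * shader_clock_mhz * flops_per_clock / 1000.0

MADD_FLOPS = 2  # assumption: a multiply-add is counted as two operations
MUL_FLOPS = 1   # assumption: the co-issued multiply is counted as one

madd_only = peak_gflops(112, 1500, MADD_FLOPS)
madd_plus_mul = peak_gflops(112, 1500, MADD_FLOPS + MUL_FLOPS)

print(f"MADD only:  {madd_only:.0f} GigaFLOPS")
print(f"MADD + MUL: {madd_plus_mul:.0f} GigaFLOPS")
```

The first result corresponds to Nvidia's quoted 336 GigaFLOPS, while the second is the 504 GigaFLOPS you arrive at once the MUL is included.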

During our briefings and round-table discussions with Nvidia, the company’s representatives told us that some of the architectural enhancements made should allow this chip to get closer to its theoretical maximum throughput. There have been improvements made to the instruction issuer, which schedules and load-balances workloads moving through the pipeline.

Nvidia's G92 graphics chip

Nvidia announced a while ago that it would be supporting double precision with future graphics processing units, but when questioned, Nvidia’s Tony Tamasi said that the chip only supports double precision through emulation. He later added that this feature won’t be supported in hardware until it can be done with IEEE compliance, because it would be pretty pointless to support something like this without being compliant with industry standards.

G92’s ROP layout is similar to that of every other graphics chip in the GeForce 8 family, whereby each ROP partition has an L2 cache and is assigned to a 64-bit memory channel. There are a total of four ROP partitions in G92, which back out onto a 256-bit memory interface. Each ROP partition can process four pixels per clock if four samples per pixel (RGB colour and Z) are taken; if the pixels are sampled with only a Z component, each ROP partition can process 32 pixels per clock.
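To put the per-partition figures into chip-wide terms, the sketch below simply multiplies them out across the four ROP partitions. The constant names are ours and the numbers are those quoted above; nothing here comes from Nvidia's own tools.

```python
ROP_PARTITIONS = 4
CHANNEL_WIDTH_BITS = 64          # one memory channel per ROP partition
COLOUR_Z_PIXELS_PER_CLOCK = 4    # per partition, colour and Z sampled
Z_ONLY_PIXELS_PER_CLOCK = 32     # per partition, Z component only

# Total memory interface width is the sum of the per-partition channels.
bus_width_bits = ROP_PARTITIONS * CHANNEL_WIDTH_BITS

# Chip-wide pixel throughput per core clock in each mode.
colour_z_total = ROP_PARTITIONS * COLOUR_Z_PIXELS_PER_CLOCK
z_only_total = ROP_PARTITIONS * Z_ONLY_PIXELS_PER_CLOCK

print(f"Memory interface: {bus_width_bits}-bit")
print(f"Colour + Z: {colour_z_total} pixels per clock")
print(f"Z only: {z_only_total} pixels per clock")
```

This gives the 256-bit interface mentioned above, along with 16 pixels per clock with colour and Z, or 128 pixels per clock for Z-only work.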

The ROPs still support all of the common anti-aliasing formats found in previous GeForce 8-series GPUs – these include multi-sampling, super-sampling, transparency adaptive AA and coverage sampling AA (CSAA). As the chip features a 256-bit memory interface, Nvidia felt the need to make some improvements to the ROPs’ compression efficiency to help reduce the reliance on bandwidth and memory footprint when anti-aliasing is enabled at resolutions like 1600x1200 and 1920x1200.

As in the original G80 architecture from which G92 is derived, the texture address units, texture filtering units and ROP partitions operate at a different clock speed to the stream processors; Nvidia calls this the core speed. In the case of the GeForce 8800 GT reference design, this is set at 600MHz. Theoretically, this results in a pixel fillrate of 9.6 Gigapixels per second (9,600 Megapixels per second) and a bilinear texture fillrate of 33.6 Gigatexels per second.
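For anyone wanting to check the fillrate arithmetic, here is a minimal Python sketch using the unit counts given earlier in this article: four ROP partitions at four pixels per clock each, and seven clusters with eight texture filtering units each. It assumes one bilinear texel per filtering unit per clock, and the function name is ours for illustration.

```python
def fillrate_giga(units_per_clock, core_clock_mhz):
    """Theoretical fillrate in Giga-units (pixels or texels) per second."""
    # units per clock x MHz gives Mega-units per second;
    # dividing by 1,000 converts that to Giga-units per second.
    return units_per_clock * core_clock_mhz / 1000.0

CORE_CLOCK_MHZ = 600

pixels_per_clock = 4 * 4  # four ROP partitions, four pixels each per clock
texels_per_clock = 7 * 8  # seven clusters, eight filtering units each
                          # (assumption: one bilinear texel per unit per clock)

pixel_fillrate = fillrate_giga(pixels_per_clock, CORE_CLOCK_MHZ)
texel_fillrate = fillrate_giga(texels_per_clock, CORE_CLOCK_MHZ)

print(f"Pixel fillrate: {pixel_fillrate:.1f} Gigapixels per second")
print(f"Bilinear texture fillrate: {texel_fillrate:.1f} Gigatexels per second")
```

At a 600MHz core clock, that works out to 9,600 Megapixels per second (9.6 Gigapixels per second) and 33.6 Gigatexels per second.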

I have to wonder whether this is all that's inside the G92 graphics chip. First of all, the clock speeds are very conservative, and the features added and removed don't seem to account for the increase in transistor count. If we go back to when Nvidia moved from 110nm to 90nm at the high end, it managed to remove around 10 percent of the total transistor count in the shrink, thanks to some optimisations.

Therefore I wouldn't be surprised if there were at least another 16 stream processors disabled on this particular product. When we asked whether the chip had clusters of stream processors disabled, the response we got back from Nvidia's Tony Tamasi was "the GeForce 8800 GT is a 112 stream processor product". There are potentially more ROPs and support for up to 384-bit memory bus widths too, but this takes us into the realms of speculation.

However, at Nvidia Editor's Day October 2006 during my talk with Jen-Hsun Huang, Nvidia's CEO and President, he told me that Nvidia would move the monstrous G80 graphics chip to a smaller process as soon as it was possible. That makes me think that there's more to G92 than meets the eye initially - I guess you could say watch this space...
