The headline specification is undoubtedly the enabling of the 16th and final SM (Streaming Multiprocessor, or ‘stream processor cluster’ in neutral terminology) of the GF100 Fermi design.
However, the new GF110 design still uses the 32 stream processors per SM layout of the original GF100 rather than the 48 per SM of the GeForce GTX 460 GPU.
Even Nvidia sees the 32-per-SM layout as the less efficient design, but we suspect that four GPCs (Graphics Processing Clusters) of four SMs each wouldn’t fit onto a die small enough to be manufactured if each SM contained 48 stream processors rather than 32. In the end, brute force wins out.
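To put numbers on the layouts discussed above, here is a minimal Python sketch of the stream processor arithmetic. The function and variable names are our own, and the 48-per-SM GF110 is purely hypothetical: it’s the chip we suspect would be too large to build, not a real product.

```python
# Stream processor counts for the SM layouts discussed in the text.
# Names are illustrative; the 48-per-SM case is a hypothetical chip.

GPCS = 4          # Graphics Processing Clusters in the GF100/GF110 design
SMS_PER_GPC = 4   # Streaming Multiprocessors per GPC


def stream_processors(active_sms: int, sps_per_sm: int) -> int:
    """Total stream processors for a given number of enabled SMs."""
    return active_sms * sps_per_sm


total_sms = GPCS * SMS_PER_GPC                          # 16 SMs on the full die

gtx_480_sps = stream_processors(total_sms - 1, 32)      # one SM disabled
gtx_580_sps = stream_processors(total_sms, 32)          # all 16 SMs enabled
hypothetical_sps = stream_processors(total_sms, 48)     # GTX 460-style SMs

print(gtx_480_sps, gtx_580_sps, hypothetical_sps)       # 480 512 768
```

A full 16-SM chip built from GTX 460-style SMs would carry half as many stream processors again as the GF110, which gives a sense of how much extra die area such a design would need.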
Perhaps this layout will change with TSMC’s 28nm process, but that’s not due until halfway through 2011, with GPUs based on this process (from ATI and Nvidia) pencilled in for the autumn of that year.
As well as the extra resources and the improved high-precision FP16 capabilities (which Nvidia claims are worth a 4-12 per cent performance increase), the GeForce GTX 580 1.5GB operates at higher frequencies than the GeForce GTX 480 1.5GB. While the GPU core of the latter runs at 700MHz (meaning that its 480 stream processors operate at 1.4GHz), the GPU core of the GeForce GTX 580 1.5GB runs at 772MHz, with its 512 stream processors clocked at 1.544GHz.
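The clock figures above follow a simple pattern: in both cards the stream processors run at twice the core frequency. The sketch below works that out and multiplies stream processor count by shader clock to give a crude ceiling on the raw shader uplift. This is our own back-of-the-envelope proxy rather than a benchmark or an Nvidia figure, and it ignores the architectural tweaks entirely.

```python
# Shader clocks and a rough shader-throughput proxy for the two cards.
# The proxy (stream processors x shader clock) is an assumption of ours,
# not a measured result.

SHADER_CLOCK_MULTIPLIER = 2  # stream processors run at twice the core clock


def shader_clock_mhz(core_mhz: int) -> int:
    """Shader (stream processor) clock derived from the core clock."""
    return core_mhz * SHADER_CLOCK_MULTIPLIER


def shader_proxy(stream_processors: int, core_mhz: int) -> int:
    """Stream processors multiplied by shader clock, in arbitrary units."""
    return stream_processors * shader_clock_mhz(core_mhz)


gtx_480_shader = shader_clock_mhz(700)   # 1400MHz, i.e. 1.4GHz
gtx_580_shader = shader_clock_mhz(772)   # 1544MHz, i.e. 1.544GHz

uplift = shader_proxy(512, 772) / shader_proxy(480, 700)
print(f"Theoretical shader uplift: {(uplift - 1) * 100:.1f} per cent")
```

On these numbers the GTX 580 has just under 18 per cent more raw shader throughput on paper, before any of Nvidia’s claimed efficiency gains are counted.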
The 1.5GB of GDDR5 memory also runs faster, with an effective frequency of 4.008GHz rather than 3.7GHz. While this gives the GTX 580 1.5GB more memory bandwidth, the rest of the back end is unchanged, with the same 384-bit memory interface and 48 ROPs. The number of texture units rises from 60 to 64 because each SM of the GF100 design contains four texture units, so unlocking the 16th SM unlocks four more.
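The memory bandwidth and texture unit figures can be checked with the same kind of arithmetic. This sketch assumes the usual formula of effective memory frequency multiplied by bus width, divided by eight to convert bits to bytes. It uses the 3.7GHz figure quoted for the GTX 480 and Nvidia’s reference 4,008MHz effective memory clock for the GTX 580; the function names are our own.

```python
# Peak memory bandwidth and texture unit counts for the two cards.
# Function names are illustrative; frequencies are in MHz (effective).

BUS_WIDTH_BITS = 384       # memory interface shared by both cards
TEXTURE_UNITS_PER_SM = 4   # per SM in the GF100 design


def memory_bandwidth_gb_s(effective_mhz: int, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/sec: transfers per second times bytes per transfer."""
    return effective_mhz * bus_width_bits / 8 / 1000


def texture_units(active_sms: int) -> int:
    """Texture units available for a given number of enabled SMs."""
    return active_sms * TEXTURE_UNITS_PER_SM


gtx_480_bw = memory_bandwidth_gb_s(3700, BUS_WIDTH_BITS)  # 3.7GHz effective
gtx_580_bw = memory_bandwidth_gb_s(4008, BUS_WIDTH_BITS)  # Nvidia reference clock

print(f"GTX 480: {gtx_480_bw:.1f}GB/sec, {texture_units(15)} texture units")
print(f"GTX 580: {gtx_580_bw:.1f}GB/sec, {texture_units(16)} texture units")
```

That works out to roughly 8 per cent more peak memory bandwidth for the GTX 580, a smaller step than the increase in shader resources.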