Ivy Bridge’s New DX11 Graphics Processor
Intel is currently alternating between new designs and process shrinks when it comes to CPUs, with a new design being a tock and a shrink a tick. However, Ivy Bridge is a bit of both. Intel calls it a tick+ because, while the main characteristic of the design is the move to 22nm 3D transistors, the Processor Graphics unit has also seen a major update.
For example, it’s now compatible with DX11, meaning that Intel will finally catch up with Nvidia and AMD’s support of this standard, nearly three years after AMD launched its first DX11 GPUs.
Intel was very keen to talk about the vast improvements it’s made to its next generation of Processor Graphics though, and we’d be surprised if it slows its rapid pace of development in this area any time soon.
Intel Ivy Bridge Graphics Overview
Intel's new Processor Graphics unit
While we’re sure you know your VLIW4 from your CUDA Cores, we’ll forgive you a lack of knowledge of Intel’s graphics architecture and terminology. Intel doesn’t even draw its block diagrams like AMD and Nvidia, preferring a left-to-right workflow to a top-to-bottom one.
Also, Intel refers to stream processor clusters as ‘Slices’ rather than SMs or SIMD Engines, and these are grouped around a special ‘Slice Common’ unit, which houses the rasteriser, the (new) Level 3 cache and the Pixel Back-End units. Oh, and a stream processor is called an Execution Unit (EU) in Intelian.
The aims of the update were to improve gaming performance, add DirectX 11 support, improve media performance (especially that of the Quick Sync Video encoding unit) and add triple-output capabilities.
Don’t get too excited about the last of these: it’s not IntelFinity, just triple-screen desktop support. It’s still quite nifty, though, as you can connect and use two external screens even if you close the lid of the host laptop. This could be a useful feature on powerful docking stations with additional cooling and fast Thunderbolt connections to external expansion cards.
Ivy Bridge Processor Graphics Architecture
Intel describes the Processor Graphics of Ivy Bridge as a scalable architecture that will ‘set the stage for further scale-up opportunities’, so Intel will probably make large jumps forward in graphics performance with each generation of CPU. There are five domains in the new Processor Graphics, but one is the screen-out manager, which we’ve already talked about, and another is the fixed-function media unit.
The Global Assets unit is the front end.
The Global Assets unit is what we’d normally think of as a front-end unit, as it includes geometry front-end and setup capabilities. It’s also attached to the Ring Bus and the processor-wide Level 3 cache, so it’s the first stop in the Graphics Processor. It’s not a conventional front-end, though (just as with Nvidia’s Fermi design), as some units that would typically sit in the front-end have been moved elsewhere; the rasteriser, for example, is in the Slice Common.
However, it’s where DirectCompute tasks are processed (which necessarily brings UAVs, atomics, barriers, shared local memory and so on into the Ivy Bridge Processor Graphics mix), and it’s where media and 3D tasks are decomposed and broken into threads for the shader units. The other units needed for DirectX 11 support are found here too: the Hull and Domain shaders, and support for the BC6H and BC7 texture compression formats.
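Why does DirectCompute ‘necessarily’ bring barriers and shared local memory with it? A very common compute-shader pattern is the work-group reduction, where every thread writes partial results to fast memory shared by the group and then waits at a barrier so that no thread reads a value its neighbour hasn’t written yet. The sketch below simulates one such work-group using Python threads. It illustrates the general pattern only; it isn’t Intel’s hardware or driver behaviour, and the function name `workgroup_sum` is hypothetical.

```python
import threading


def workgroup_sum(values):
    """Tree-reduce `values` the way a compute work-group might.

    One Python thread stands in for each GPU thread (lane), a plain list
    stands in for shared local memory, and threading.Barrier stands in
    for the group-wide barrier. len(values) must be a power of two.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "work-group size must be a power of two"
    shared = list(values)            # simulated shared local memory
    barrier = threading.Barrier(n)   # every lane waits here after each step

    def lane(tid):
        stride = n // 2
        while stride > 0:
            if tid < stride:
                # Lanes in the lower half add in a value from the upper half.
                shared[tid] += shared[tid + stride]
            # Without this wait, a lane could read a slot before it is updated.
            barrier.wait()
            stride //= 2

    threads = [threading.Thread(target=lane, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared[0]


print(workgroup_sum(list(range(1, 9))))  # 36
```

Each step halves the number of active lanes, so eight values need three steps and three barrier waits per lane; on a GPU, the barrier is what makes this safe without locks.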
For GPGPU tasks, Intel senior fellow Tom Piazza told us that ‘the amount of scatter-gathers per clock is 32x what was in Sandy Bridge, so if you’re running GPGPU workloads, don’t be surprised if the performance is extremely high – at 32 per clock, we’ve seen 27 sustained and some workloads jump 20x from a performance standpoint.’
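A gather reads from a list of arbitrary addresses and a scatter writes to one, which is exactly the irregular memory access GPGPU code tends to need. The sketch below shows both operations on a plain Python list and then does the arithmetic on Piazza’s figures. The helper names are hypothetical, and the implied Sandy Bridge rate assumes that the ‘32x’ and ‘32 per clock’ figures measure the same thing, which the quote doesn’t state outright.

```python
def gather(memory, indices):
    """Gather: read memory[i] for every address i in `indices`."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Scatter: write values[k] to memory[indices[k]] for every k."""
    for i, v in zip(indices, values):
        memory[i] = v


# Figures quoted by Tom Piazza for Ivy Bridge.
PEAK_PER_CLOCK = 32        # scatter-gathers per clock
SUSTAINED_PER_CLOCK = 27   # sustained rate Intel says it has observed
SPEEDUP_VS_SNB = 32        # '32x what was in Sandy Bridge'

efficiency = SUSTAINED_PER_CLOCK / PEAK_PER_CLOCK    # fraction of peak sustained
implied_snb_peak = PEAK_PER_CLOCK / SPEEDUP_VS_SNB   # assumes both figures share a unit

print(efficiency)        # 0.84375
print(implied_snb_peak)  # 1.0
```

On those assumptions, Ivy Bridge sustains roughly 84 per cent of its peak scatter-gather rate, while Sandy Bridge would have managed about one per clock.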
Piazza also told us that the Global Assets unit is ‘much faster on all the geometry areas to keep the machine fed.’ He claimed that the Ivy Bridge Processor Graphics unit is better able to sustain performance close to its peak, thanks to optimisations in how buffers are cleared (by scoreboarding), faster clearing of render targets (for faster Z performance), the increased thread (and register) count from this unit’s thread management engine, and the new ability to co-issue FMAs (Floating-point Multiply-Adds).
The latter is said to offer a true doubling of performance in this area, as the Sandy Bridge Processor Graphics relied on a secondary, single-issue pipeline alongside the EU to handle this work. Piazza also said that in anisotropic filtering tests ‘we now draw circles rather than petals’; in other words, filtering quality no longer varies with the angle of the surface, which is a big improvement.
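To see why co-issuing FMAs can double peak throughput, it helps to count operations: a multiply-add counts as two floating-point operations per SIMD lane, so peak throughput scales directly with how many multiply-adds each EU can issue per clock. The simplified model below assumes one multiply-add issue per EU per clock for Sandy Bridge and two for Ivy Bridge. The EU count and SIMD width are hypothetical example values, not published Intel specifications, and the function names are illustrative.

```python
# Simplified peak-throughput model. The EU count and SIMD width below are
# hypothetical example values, not published Intel specifications.

FLOPS_PER_FMA_LANE = 2  # a multiply-add counts as two floating-point operations


def peak_flops_per_clock(eus: int, fma_issues_per_clock: int, simd_width: int) -> int:
    """Peak FP operations per clock if every issue slot runs a multiply-add."""
    return eus * fma_issues_per_clock * simd_width * FLOPS_PER_FMA_LANE


def peak_gflops(eus: int, fma_issues_per_clock: int, simd_width: int, clock_ghz: float) -> float:
    """Peak GFLOPS at a given clock speed (GHz = cycles per nanosecond)."""
    return peak_flops_per_clock(eus, fma_issues_per_clock, simd_width) * clock_ghz


EUS = 16        # hypothetical
SIMD_WIDTH = 4  # hypothetical

single_issue = peak_flops_per_clock(EUS, 1, SIMD_WIDTH)  # one multiply-add per EU per clock
co_issue = peak_flops_per_clock(EUS, 2, SIMD_WIDTH)      # two multiply-adds per EU per clock
print(co_issue / single_issue)  # 2.0
```

Whatever values you plug in for EU count, SIMD width and clock speed, doubling the issue rate doubles the peak figure. Whether real workloads see that gain depends on how well the hardware keeps both issue slots busy, which is why Intel paired it with the thread and scoreboarding improvements described above.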