New GPGPU approach promises 20 per cent performance boost
February 8, 2012 // 12:37 p.m.
Researchers at North Carolina State University have provided some serious vindication for AMD's plan to unite GPU and CPU silicon using the Heterogeneous System Architecture: a 20 per cent performance boost without overclocking.
Before we get into the paper, entitled CPU-Assisted GPGPU on Fused CPU-GPU Architecture, there are a couple of things to get out of the way: while the researchers are independent, the research itself was part-funded by AMD, and the company's senior fellow architect Mike Mantor is named as a co-author. The team also didn't have real silicon to work with: instead, their results are based on a simulated future AMD Accelerated Processing Unit (APU) featuring a shared L3 cache.
With that out of the way, the team's results are still worthy of note. Using the aforementioned simulated silicon, the team were able to convince their code to run 20 per cent faster on average without overclocking the 'chip.'
'Our approach is to allow the GPU cores to execute computational functions, and have CPU cores pre-fetch the data the GPUs will need from off-chip main memory,' paper co-author and associate professor of electrical and computer engineering Huiyang Zhou explains. 'This is more efficient because it allows CPUs and GPUs to do what they are good at. GPUs are good at performing computations. CPUs are good at making decisions and flexible data retrieval.'
This approach, in which the CPU and GPU combine their efforts to boost overall performance, has previously been nigh-on impossible thanks to the separation between GPU and CPU in silicon. With AMD forging ahead with the architecture formerly known as Fusion, which bonds the two into a single cohesive whole, however, it becomes far simpler.
With synthetic benchmarks, Zhou's team were able to show significant performance gains from the CPU-assisted GPU model. On average, benchmarks ran 21.4 per cent faster, while some tasks were boosted by 113 per cent.
'Chip manufacturers are now creating processors that have a "fused architecture," meaning that they include CPUs and GPUs on a single chip. This approach decreases manufacturing costs and makes computers more energy efficient. However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren't as efficient as they could be,' explains Zhou. 'That's the issue we're trying to resolve.'
While the research may have been helped along by AMD's input, it applies equally to Intel's latest-generation Sandy Bridge architecture. Where Intel seems happy to keep its current level of integration, however, AMD is forging ahead with a full fusion of GPU and CPU. Should the paper's experiments prove themselves in the real world, that could give AMD the boost it needs to finally compete at the high end with Intel.
To take advantage of the model, compilers will need extensions that automatically generate a pre-execution program containing the GPU kernel's memory access instructions. As a result, only future software will be able to benefit.
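For readers who want a feel for the idea, here is a rough toy sketch in Python. It is not the researchers' code or simulator: the `SharedCache` class, the vector-add 'kernel' and every function name are made up for illustration, and the model ignores timing entirely. It only shows the shape of the trick: a CPU-side pass that replays the kernel's memory accesses, with no arithmetic, so the data is already in a shared cache when the GPU asks for it.

```python
from collections import OrderedDict


class SharedCache:
    """Tiny LRU cache standing in for a shared L3 (hypothetical toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.gpu_misses = 0  # misses seen by the GPU's own accesses

    def touch(self, addr, from_gpu):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        if from_gpu:
            self.gpu_misses += 1
        self.lines[addr] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


def kernel_addresses(thread_id):
    """Addresses read by one thread of c[i] = a[i] + b[i]."""
    return [("a", thread_id), ("b", thread_id)]


def pre_execution(cache, thread_ids):
    """CPU-side pass: the kernel's memory accesses only, no computation."""
    for t in thread_ids:
        for addr in kernel_addresses(t):
            cache.touch(addr, from_gpu=False)


def gpu_kernel(cache, a, b, thread_ids):
    """GPU-side pass: the same accesses, plus the actual arithmetic."""
    out = {}
    for t in thread_ids:
        for addr in kernel_addresses(t):
            cache.touch(addr, from_gpu=True)
        out[t] = a[t] + b[t]
    return out


def run(prefetch, n=64, batch=8, capacity=32):
    a = list(range(n))
    b = [2 * x for x in a]
    cache = SharedCache(capacity)
    result = {}
    for start in range(0, n, batch):
        ids = range(start, start + batch)
        if prefetch:
            pre_execution(cache, ids)  # CPU runs ahead of this batch
        result.update(gpu_kernel(cache, a, b, ids))
    return result, cache.gpu_misses


if __name__ == "__main__":
    print("GPU misses without CPU help:", run(prefetch=False)[1])
    print("GPU misses with CPU help:   ", run(prefetch=True)[1])
```

In this toy model the GPU misses on every access without the pre-execution pass and on none with it, while the computed results are identical. Real hardware will be far less tidy: the prefetch itself still costs memory traffic, and the CPU has to stay far enough ahead of the GPU for the data to arrive in time.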
Zhou's paper is due to be presented at the International Symposium on High Performance Computer Architecture in New Orleans later this month.