AMD has released additional details regarding its upcoming Steamroller processor architecture, explaining the improvements it has made to the Piledriver design to boost performance-per-watt characteristics.
Unveiled as part of chief technology officer Mark Papermaster's presentation to Hot Chips attendees, changes made for the Steamroller design include a clever dynamic L2 caching system which can shrink to save power when running from battery and grow to boost performance when powered by the mains.
While that's apparently the biggest overall difference between Piledriver and Steamroller, there are plenty of other incremental improvements to be found. Many of these, including a claimed 30 per cent reduction in the layout area of the cores and a corresponding drop in power draw, come from a shift in design methodology at AMD: where previous Bulldozer cores were laid out by hand to maximise performance and density on the 32nm process, the company is now using a high-density cell library for layout - resulting in the same level of improvement normally associated with a drop in process size.
The next biggest change from Piledriver is Steamroller's ability to feed data to the cores more rapidly. AMD claims changes to the design have reduced branch prediction errors by 20 per cent and cache misses by 30 per cent, helping to minimise some of the inefficiencies of the Bulldozer architecture.
Not all changes result in improved performance, however: AMD has confirmed that, while the two 128-bit fused multiply-accumulate (FMAC) modules, which can combine into a single 256-bit module when required, remain present, the number of MMX units has been halved to one per core pair from Piledriver's two. The reason, AMD claims, is simply that the MMX instruction set extension is no longer as popular or efficient as it once was, and by ditching the second MMX unit major savings in layout space are possible without harming performance too badly.
For use in power-sensitive devices, Steamroller is to bring an extended power management system which takes full advantage of AMD's heterogeneous systems architecture (HSA) concept: as well as dynamically adjusting the clock speed of the processor cores, the integral graphics processor can be controlled and even given the lion's share of power should the GPU be heavily loaded while the CPU is not. Combined with the size reductions, the loss of the second MMX unit and the dynamic L2 cache, this spells good things for Steamroller-era APUs.
For true competition to Intel and ARM in the tablet marketplace, however, the highlight of AMD's presence at Hot Chips is Jaguar. A quad-core low-power design, Jaguar features a large L2 cache shared between all four cores - rather than per two-core unit, as with most of the company's designs. The result, AMD claims, is a chip which can reach clock speeds ten per cent higher and execute 15 per cent more instructions per cycle than the current-generation Bobcat design.
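To put those two figures together: throughput is, to a first approximation, instructions per cycle multiplied by clock speed, so the gains compound rather than add. The short Python sketch below works through that arithmetic. It assumes both of AMD's claimed improvements apply in full to the same workload, which is a best case rather than anything AMD has stated, and the function name is purely illustrative.

```python
def combined_speedup(clock_gain: float, ipc_gain: float) -> float:
    """Relative throughput when a clock gain and an IPC gain compound.

    Both arguments are fractional improvements (0.10 means ten per cent).
    Assumes throughput scales as IPC x clock, with both gains fully realised.
    """
    return (1.0 + clock_gain) * (1.0 + ipc_gain)


# AMD's claimed figures for Jaguar relative to Bobcat.
jaguar_vs_bobcat = combined_speedup(clock_gain=0.10, ipc_gain=0.15)
print(f"Best-case throughput relative to Bobcat: {jaguar_vs_bobcat:.3f}x")
```

On those assumptions, the claimed figures work out to roughly a 26 per cent best-case improvement over Bobcat, not the 25 per cent a simple sum would suggest.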
Jaguar is due to arrive next year as part of AMD's Kabini system-on-chip (SoC) design for notebooks and the sub-5W Temash SoC design for tablets. AMD has confirmed that it will be possible to disable selected cores to run the chip as a dual- or even single-core part for even lower-power systems. As an answer to ARM, Jaguar could prove convincing indeed.
One thing not mentioned during Papermaster's speech but worthy of note is AMD's most recent hire: John Gustafson, now the chief product architect of the graphics division formerly known as ATI. Previously a senior architect of Intel's eXtreme Technologies Lab, Gustafson has made a name for himself in the field of parallel processing following the publication of the paper Reevaluating Amdahl's Law - something AMD is keen to exploit.
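For readers unfamiliar with the paper, its argument is usually summarised as Gustafson's law: where Amdahl's law fixes the problem size and so caps the speed-up at the reciprocal of the serial fraction, Gustafson lets the problem grow with the number of processors, giving a scaled speed-up that keeps rising. The Python sketch below compares the two textbook formulas. The 95 per cent parallel fraction is an illustrative assumption and does not come from AMD or from the paper, and the fraction is measured slightly differently in each law (against the serial run for Amdahl, against the parallel run for Gustafson).

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speed-up for a fixed-size problem."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction: float, processors: int) -> float:
    """Gustafson's law: scaled speed-up when the problem grows with the machine."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


# Illustrative assumption: 95 per cent of the work can run in parallel.
PARALLEL_FRACTION = 0.95

for n in (4, 64, 1024):
    fixed = amdahl_speedup(PARALLEL_FRACTION, n)
    scaled = gustafson_speedup(PARALLEL_FRACTION, n)
    print(f"{n:>5} processors: Amdahl {fixed:7.2f}x, Gustafson {scaled:8.2f}x")
```

With a five per cent serial portion, the fixed-size speed-up can never exceed 20x however many processors are added, while the scaled speed-up grows almost linearly - the sort of behaviour that suits highly parallel graphics hardware.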
'With the growing importance of parallel compute in defining the computing experience, John brings the full package of industry experience and knowledge needed to help us expand and execute our AMD Radeon and AMD FirePro graphics technology programs,' claimed AMD's Matt Skynner of the hire, 'and will help forge an aggressive long-term roadmap that allows AMD to continue to lead and win with our gaming and virtualisation technologies.'