AMD's Steamroller design changes promise an improvement similar to that of a process node shrink, achieved simply through improved layout techniques - and the dropping of an MMX unit.
AMD has released additional details regarding its upcoming Steamroller processor architecture, explaining the improvements it has made to the Piledriver design to boost performance-per-watt characteristics.
Unveiled as part of chief technology officer Mark Papermaster's presentation to Hot Chips attendees, changes made for the Steamroller design include a clever dynamic L2 caching system which can shrink to save power when running from battery and grow to boost performance when powered by the mains.
While that's apparently the biggest overall difference between Piledriver and Steamroller, there are plenty of other incremental improvements to be found. Many of these, including a claimed 30 per cent reduction in the layout area of the cores and a corresponding drop in power draw, come from a shift in design methodology at AMD: where previous Bulldozer cores were laid out by hand to maximise performance and density on the 32nm process, the company is now using a high-density cell library for layout - resulting in the same level of improvement normally associated with a drop in process size.
The next biggest change from Piledriver is Steamroller's ability to transfer data to the cores rapidly. AMD claims changes to the design have reduced branch prediction errors by 20 per cent and cache misses by 30 per cent, helping to minimise some of the inefficiencies of the Bulldozer architecture.
Not all changes result in improved performance, however: AMD has confirmed that, while the two 128-bit fused multiply-accumulate (FMAC) modules, which can combine into a single 256-bit module when required, remain present, the number of MMX units has been halved to one per core pair from Piledriver's two. The reason, AMD claims, is simply that the MMX instruction set extension is no longer as popular or efficient as it once was, and by ditching the second MMX unit major savings in layout space are possible without harming performance too badly.
For use in power-sensitive devices, Steamroller is to bring an extended power management system which takes full advantage of AMD's heterogeneous systems architecture (HSA) concept: as well as dynamically adjusting the clockspeed of the processor cores, the integral graphics processor can be controlled and even given the lion's share of power should the GPU be heavily loaded while the CPU is not. Combined with the size reductions, the loss of the second MMX unit and the dynamic L2 cache, this spells good things for Steamroller-era APUs.
For true competition to Intel and ARM in the tablet marketplace, however, the highlight of AMD's presence at Hot Chips is Jaguar. A quad-core low-power design, Jaguar features a large L2 cache shared between all four cores - rather than per two-core unit, as with most of the company's designs. The result, AMD claims, is a chip which can reach clock speeds ten per cent higher and execute 15 per cent more instructions per cycle than the current-generation Bobcat design.
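To put those two Jaguar figures together, here is a rough back-of-the-envelope sketch. It assumes the clock speed and instructions-per-cycle gains are independent and simply multiply, which AMD has not stated; the function name is purely illustrative.

```python
def combined_throughput_gain(clock_gain: float, ipc_gain: float) -> float:
    """Relative throughput gain when clock speed and IPC improve together.

    Assumption: throughput scales as clock speed x instructions per cycle,
    and the two claimed gains are independent of each other.
    """
    return (1 + clock_gain) * (1 + ipc_gain) - 1


# AMD's claimed figures for Jaguar versus Bobcat: +10% clock, +15% IPC.
gain = combined_throughput_gain(0.10, 0.15)
print(f"Combined best-case gain: {gain:.1%}")
```

Under that assumption the best case works out to a little over a quarter more throughput than Bobcat; real workloads limited by memory or power would be expected to see less.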
Jaguar is due to arrive next year as part of AMD's Kabini system-on-chip (SoC) design for notebooks and the sub-5W Temash SoC design for tablets. AMD has confirmed that it will be possible to disable selected cores to run the Jaguar as a dual- or even single-core chip for even lower power systems. As an answer to ARM, Jaguar could prove convincing indeed.
One thing not mentioned during Papermaster's speech but worthy of note is AMD's most recent hire: John Gustafson, now the chief product architect of the graphics division formerly known as ATI. Previously a senior architect of Intel's eXtreme Technologies Lab, Gustafson has made a name for himself in the field of parallel processing following the publication of the paper Reevaluating Amdahl's Law - something AMD is keen to exploit.
'With the growing importance of parallel compute in defining the computing experience, John brings the full package of industry experience and knowledge needed to help us expand and execute our AMD Radeon and AMD FirePro graphics technology programs,' claimed AMD's Matt Skynner of the hire, 'and will help forge an aggressive long-term roadmap that allows AMD to continue to lead and win with our gaming and virtualisation technologies.'
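For readers unfamiliar with the paper, the following minimal sketch contrasts the textbook form of Amdahl's Law with the scaled-speedup formula usually called Gustafson's Law. The ten per cent serial fraction and 16 processors are illustrative assumptions, not figures from AMD or from the paper, and the two laws measure the serial fraction differently (Amdahl against the single-processor run, Gustafson against the parallel run), so the comparison is indicative only.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's Law: fixed problem size, serial part limits the speedup."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Gustafson's Law: problem size grows with the processor count."""
    return processors - serial_fraction * (processors - 1)


# Illustrative values only: 10% serial work, 16 processors.
s, n = 0.10, 16
print(f"Amdahl:    {amdahl_speedup(s, n):.1f}x")
print(f"Gustafson: {gustafson_speedup(s, n):.1f}x")
```

The gap between the two results is the heart of the argument: if workloads scale up to fill the available hardware, as graphics and compute workloads tend to, adding parallel units pays off far more than the fixed-size view suggests.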
20 Comments
And are we ever going to see AMD being a realistic alternative to a high spec Intel?
Now when Intel really gets rolling with their IGP technology AMD might end up at the back of the line again.
Then again, they did kick Intel where it hurt with the Athlon64 and that woke the sleeping giant...
...
That said, if Bulldozer wasn't so power-hungry, at the prices they currently are I'd probably buy one just to have a play with. Bit like the FM1 A8 chips; very interested in them...
For your first, no idea, but compared to Ivy, it might actually be pretty competitive. Compared to what Haswell seems to be promising...not even close. Haswell supposedly is bringing transactional instruction sets which might well promise HUGE gains in multithreaded efficiency as well as a bunch of other fun stuff. Combine that with a supposed increase in GPU ability of 2.5x...and AMD might be significantly lagging in both CPU AND GPU ability in their APUs (and CPU onlys) pretty soon.
For your latter question...not likely, but it could happen eventually if Intel falls asleep or AMD stumbles upon some radical new innovation.
I'm pretty sure there was an article about a month ago saying they're not focusing on high-end anymore, unless I was mistaken and Piledriver is the last.
I personally wonder if these recent changes were made by the Athlon 64 guy who they recently re-hired. If so, maybe they have further changes planned before the release date and then there can be even more performance improvements. Just high hopes though.
http://blogs.amd.com/developer/2009/11/17/the-velox-research-project/
That article was nearly 2 years ago.
TBH, until the CPUs which have them hit retail and there is adequate software support, I am going to reserve judgement. It took years for the 64 bit extensions in the Athlon 64 to be properly supported by software.
Transactional memory will need software support. Without it you'll see no benefits. On the desktop this effectively means you won't see any uses for it for years after Haswell is released. Not to mention gaming, that'll take years and years and years to use transactional memory.
There's no way he would have been involved with the changes, these things take 12 months+ to push through and then add on validation and prototyping on top of that, so that's another 12 months. Latest I heard was Piledriver had completed the 'design phase' so working out how to turn a snazzy diagram into a physical product and validating and testing the whole ensemble is what comes between now and the 2013 launch date.
Some stuff relating to fabric interconnects for servers - irrelevant.
Dedicated decoders - not directly mentioned in the article, but part of the reason for the claimed 30 per cent improvement.
The "floating point rebalance" - nobody appears to understand that, which is why most people (myself included) have left it out. To quote Techreport: "We're unsure what the floating-point 'rebalance' is all about."
Eermm... The fact that the high-density cell library might not actually be used until the next process shrink? As far as I know, AMD hasn't actually made an official decision on that yet.
But hey, thanks for the feedback.
"reduced branch prediction errors by 20 per cent and cache misses by 30 per cent" no mention of performance. You seem to have difficulty reading your own copy, never mind anyone else's .
But hey!; thanks for selective out-of-context editing! Let's hope no one else reads decent coverage of this topic - they'll never know the difference!
Thought not, somehow. It's always easier to destroy than create, isn't it?
Good news, though: I'm going on holiday for a week, so that's at least ten fewer articles likely to make you sit up and think "I'm going to be a dick to someone I've never met, using the power of anonymity and the internet! Mother, fetch my cape!"
Still; I'd quite like this, Bulldozer isn't that slow, it's just not that fast in relation to Intel.
Oh dear. Reason for me not writing an article: [1] not being paid [2] diligent work in analysis will be insulted by ignoramic fanboys [oh the irony!] [3] and refer you to [1] , not even in alcohol.
PS. Can't really remember previously insulting you/treating you like dick. Still, I treat so many people like dicks I might simply have forgotten. {Or is it that I get insulted for daring to contradict an inaccurate received wisdom?} Anyway, Mr. Halfacree on holiday! Lucky you, do have fun!
PPS. That Vr-Zone article is worth reading, everyone.