bit-tech.net

AMD details Steamroller design changes

AMD's Steamroller design changes promise an improvement similar to that of a process node shrink, achieved purely through improved layout techniques - and the dropping of an MMX unit.

AMD has released additional details regarding its upcoming Steamroller processor architecture, explaining the improvements it has made to the Piledriver design to boost performance-per-watt characteristics.

Unveiled as part of chief technology officer Mark Papermaster's presentation to Hot Chips attendees, changes made for the Steamroller design include a clever dynamic L2 caching system which can shrink to save power when running from battery and grow to boost performance when powered by the mains.

While that's apparently the biggest overall difference between Piledriver and Steamroller, there are plenty of other incremental improvements to be found. Many of these, including a claimed 30 per cent reduction in the layout area of the cores and a corresponding drop in power draw, come from a shift in design methodology at AMD: where previous Bulldozer cores were laid out by hand to maximise performance and density on the 32nm process, the company is now using a high-density cell library for layout - resulting in the same level of improvement normally associated with a drop in process size.
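To put the comparison with a process shrink into rough numbers, the sketch below sets the claimed 30 per cent layout saving against an idealised node shrink. It assumes area scales with the square of the feature size, which real processes only approximate, and the 28nm target is a hypothetical chosen for illustration rather than a figure from AMD's presentation.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Area ratio after a shrink, assuming area scales with feature size squared."""
    return (new_nm / old_nm) ** 2


# Claimed layout-area reduction from the high-density cell library (from the article).
claimed_reduction = 0.30

# Hypothetical shrink used for comparison only: 32nm down to 28nm.
shrink_reduction = 1.0 - ideal_area_scale(32.0, 28.0)

print(f"Claimed library saving:        {claimed_reduction:.0%}")
print(f"Idealised 32nm -> 28nm saving: {shrink_reduction:.1%}")
```

Under these assumptions the library change lands in the same range as a single node step, which is consistent with the claim as reported.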

The next biggest change from Piledriver is Steamroller's ability to transfer data to the cores rapidly. AMD claims changes to the design have reduced branch prediction errors by 20 per cent and cache misses by 30 per cent, helping to minimise some of the inefficiencies of the Bulldozer architecture.

Not all changes result in improved performance, however: AMD has confirmed that, while the two 128-bit fused multiply-accumulate (FMAC) modules, which can combine into a single 256-bit module when required, remain present, the number of MMX units has been halved to one per core pair from Piledriver's two. The reason, AMD claims, is simply that the MMX instruction set extension is no longer as popular or efficient as it once was, and by ditching the second MMX unit major savings in layout space are possible without harming performance too badly.

For use in power-sensitive devices, Steamroller is to bring an extended power management system which takes full advantage of AMD's heterogeneous systems architecture (HSA) concept: as well as dynamically adjusting the clockspeed of the processor cores, the integral graphics processor can be controlled and even given the lion's share of power should the GPU be heavily loaded while the CPU is not. Combined with the size reductions, the loss of the second MMX unit and the dynamic L2 cache, this spells good things for Steamroller-era APUs.

For true competition to Intel and ARM in the tablet marketplace, however, the highlight of AMD's presence at Hot Chips is Jaguar. A quad-core low-power design, Jaguar features a large L2 cache shared between all four cores - rather than one per two-core unit, as with most of the company's designs. The result, AMD claims, is a chip which can reach clock speeds ten per cent higher and execute 15 per cent more instructions per cycle than the current-generation Bobcat design.

Jaguar is due to arrive next year as part of AMD's Kabini system-on-chip (SoC) design for notebooks and the sub-5W Temash SoC design for tablets. AMD has confirmed that it will be possible to disable selected cores to run Jaguar as a dual- or even single-core chip for even lower-power systems. As an answer to ARM, Jaguar could prove convincing indeed.
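The clock speed and instructions-per-cycle claims for Jaguar compound rather than add. As a minimal sketch, assuming single-threaded throughput is simply clock speed multiplied by IPC and that both gains apply to the same workload at once (which AMD has not stated), the combined uplift over Bobcat can be estimated as follows. The function name is purely illustrative.

```python
def combined_speedup(clock_gain: float, ipc_gain: float) -> float:
    """Throughput ratio when clock and IPC both improve (throughput = clock * IPC)."""
    return (1.0 + clock_gain) * (1.0 + ipc_gain)


# Figures claimed by AMD for Jaguar relative to Bobcat, as reported in the article.
speedup = combined_speedup(clock_gain=0.10, ipc_gain=0.15)

print(f"Estimated uplift over Bobcat: {(speedup - 1.0) * 100:.1f} per cent")
```

On those assumptions the two claims together work out to a little over a quarter more throughput per core, before any effect from the shared L2 cache or from memory-bound workloads is considered.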

One thing not mentioned during Papermaster's speech but worthy of note is AMD's most recent hire: John Gustafson, now the chief product architect of the graphics division formerly known as ATI. Previously a senior architect of Intel's eXtreme Technologies Lab, Gustafson has made a name for himself in the field of parallel processing following the publication of the paper Reevaluating Amdahl's Law - something AMD is keen to exploit.

'With the growing importance of parallel compute in defining the computing experience, John brings the full package of industry experience and knowledge needed to help us expand and execute our AMD Radeon and AMD FirePro graphics technology programs,' claimed AMD's Matt Skynner of the hire, 'and will help forge an aggressive long-term roadmap that allows AMD to continue to lead and win with our gaming and virtualisation technologies.'
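For readers unfamiliar with the paper behind Gustafson's reputation, the sketch below contrasts the two standard formulas. Amdahl's Law holds the problem size fixed, so the serial portion caps the achievable speed-up; Gustafson's scaled speed-up assumes the parallel part of the workload grows with the number of processors. The serial fraction and processor count are arbitrary illustrative values, not figures from AMD.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Fixed-size speed-up: S = 1 / (s + (1 - s) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speed-up: S = N - s * (N - 1)."""
    return processors - serial_fraction * (processors - 1)


# Illustrative values only: 10 per cent serial work on 10 processors.
s, n = 0.10, 10
print(f"Amdahl:    {amdahl_speedup(s, n):.2f}x")
print(f"Gustafson: {gustafson_speedup(s, n):.2f}x")
```

The gap between the two results is the reason the scaled view matters for massively parallel hardware such as GPUs, where workloads tend to grow to fill the available execution units.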

20 Comments

Guinevere 29th August 2012, 11:01 Quote
So it's an incremental improvement... so how far behind Intel will this place AMD when it finally hits the market?

And are we ever going to see AMD being a realistic alternative to a high spec Intel?
whatsthatnoise 29th August 2012, 11:39 Quote
Maybe they'll at least create a value alternative to Intel.
Neogumbercules 29th August 2012, 13:01 Quote
I love my quad-core FM1 chip. It allowed me to build a cheap, low power, decent HTPC with the capability to play Skyrim at decent settings at 720p. No graphics card required. No Intel chip could have offered me that for the price I got my 3670k for ($80). The thing is perfectly fast enough for everything I use it for as well. (though I bet the SSD has something to do with that :P)

Now when Intel really gets rolling with their IGP technology AMD might end up at the back of the line again.
Paradigm Shifter 29th August 2012, 13:06 Quote
I'm interested in Steamroller, but I do think AMD need to be asking some long hard questions about how their designers are working: why is it that they seem to need two goes at everything to get it 'right' recently?

Then again, they did kick Intel where it hurt with the Athlon64 and that woke the sleeping giant...

...

That said, if Bulldozer wasn't so power-hungry, at the prices they currently are I'd probably buy one just to have a play with. Bit like the FM1 A8 chips; very interested in them...
azazel1024 29th August 2012, 14:24 Quote
Quote:
Originally Posted by Guinevere
So it's an incremental improvement... so how far behind Intel will this place AMD when it finally hits the market?

And are we ever going to see AMD being a realistic alternative to a high spec Intel?

For your first, no idea, but compared to Ivy, it might actually be pretty competitive. Compared to what Haswell seems to be promising...not even close. Haswell supposedly is bringing transactional instruction sets which might well promise HUGE gains in multithreaded efficiency as well as a bunch of other fun stuff. Combine that with a supposed increase in GPU ability of 2.5x...and AMD might be significantly lagging in both CPU AND GPU ability in their APUs (and CPU onlys) pretty soon.

For your latter question...not likely, but it could happen eventually if Intel falls asleep or AMD stumbles upon some radical new innovation.
schmidtbag 29th August 2012, 14:25 Quote
Quote:
Originally Posted by Guinevere
So it's an incremental improvement... so how far behind Intel will this place AMD when it finally hits the market?

And are we ever going to see AMD being a realistic alternative to a high spec Intel?

I'm pretty sure there was an article about a month ago saying they're not focusing on high-end anymore, unless I was mistaken and Piledriver is the last.


I personally wonder if these recent changes were made by the Athlon 64 guy who they recently re-hired. If so, maybe they have further changes planned before the release date and then there can be even more performance improvements. Just high hopes though.
CAT-THE-FIFTH 29th August 2012, 15:10 Quote
It seems AMD is also looking at transactional memory too:

http://blogs.amd.com/developer/2009/11/17/the-velox-research-project/

That article was nearly 2 years ago.

TBH, until the CPUs which have them hit retail and there is adequate software support, I am going to reserve judgement. It took years for the 64-bit extensions in the Athlon 64 to be properly supported by software.
xxxsonic1971 30th August 2012, 01:03 Quote
I really hope this is a success for AMD, Intel are too dominant atm, not good for us buyers!!
dicobalt 30th August 2012, 02:04 Quote
All I get from this is that it is going to have yet more cores than Piledriver and minor other improvements. Still no decent IPC increases to be seen. Maybe if they can get the power draw down low enough this thing will have a market but I don't know, competition is pretty tough in that area.
jrs77 30th August 2012, 04:48 Quote
Where AMD falls short is rendering, video-encoding and productivity-stuff like that, and that's the reason why I've been using Intel CPUs for the last few years. If you're not using your PC for productivity-stuff like that tho, then AMD has a better value due to its lower pricetag.
Action_Parsnip 30th August 2012, 12:49 Quote
Quote:
Originally Posted by azazel1024
Quote:
Originally Posted by Guinevere
So it's an incremental improvement... so how far behind Intel will this place AMD when it finally hits the market?

And are we ever going to see AMD being a realistic alternative to a high spec Intel?

For your first, no idea, but compared to Ivy, it might actually be pretty competitive. Compared to what Haswell seems to be promising...not even close. Haswell supposedly is bringing transactional instruction sets which might well promise HUGE gains in multithreaded efficiency as well as a bunch of other fun stuff. Combine that with a supposed increase in GPU ability of 2.5x...and AMD might be significantly lagging in both CPU AND GPU ability in their APUs (and CPU onlys) pretty soon.

For your latter question...not likely, but it could happen eventually if Intel falls asleep or AMD stumbles upon some radical new innovation.

Transactional memory will need software support. Without it you'll see no benefits. On the desktop this effectively means you won't see any uses for it for years after Haswell is released. Not to mention gaming, that'll take years and years and years to use transactional memory.
Quote:
Originally Posted by schmidtbag
Quote:
Originally Posted by Guinevere
So it's an incremental improvement... so how far behind Intel will this place AMD when it finally hits the market?

And are we ever going to see AMD being a realistic alternative to a high spec Intel?

I'm pretty sure there was an article about a month ago saying they're not focusing on high-end anymore, unless I was mistaken and Piledriver is the last.


I personally wonder if these recent changes were made by the Athlon 64 guy who they recently re-hired. If so, maybe they have further changes planned before the release date and then there can be even more performance improvements. Just high hopes though.

There's no way he would have been involved with the changes, these things take 12 months+ to push through and then add on validation and prototyping on top of that, so that's another 12 months. Latest I heard was Piledriver had completed the 'design phase' so working out how to turn a snazzy diagram into a physical product and validating and testing the whole ensemble is what comes between now and the 2013 launch date.
pearl.of.wisdom 30th August 2012, 17:33 Quote
Good Lord. Please whoever posted this newsnip; actually try and understand the briefing. Power gating secondary cache the biggest change?!! WTF! Everyone, I refer you to Anandtech or Techreport for actual information. Dear dear.
Gareth Halfacree 30th August 2012, 18:33 Quote
Quote:
Originally Posted by pearl.of.wisdom
Good Lord. Please whoever posted this newsnip; actually try and understand the briefing. Power gating secondary cache the biggest change?!! WTF! Everyone, I refer you to Anandtech or Techreport for actual information. Dear dear.
What a useful comment. Care to actually elucidate as to what was missed? Here, let me help:
Some stuff relating to fabric interconnects for servers - irrelevant.
Dedicated decoders - not directly mentioned in the article, but part of the reason for the claimed 30 per cent improvement.
The "floating point rebalance" - nobody appears to understand that, which is why most people (myself included) have left it out. To quote Techreport: "We're unsure what the floating-point 'rebalance' is all about."
Eermm... The fact that the high-density cell library might not actually be used until the next process shrink? As far as I know, AMD hasn't actually made an official decision on that yet.

But hey, thanks for the feedback.
ch424 30th August 2012, 23:47 Quote
I read somewhere that the "floating point rebalance" was to re-jig the FPU a bit to let MMX and SSE instructions use the same multipliers rather than sending the data along slightly separate paths for different instructions. I think this mainly saves area rather than performance or power though.
pearl.of.wisdom 31st August 2012, 17:40 Quote
Dear Mr Halfacree, Are you aware that your newsnip did not mention a 30% improvement in performance?
"reduced branch prediction errors by 20 per cent and cache misses by 30 per cent" - no mention of performance. You seem to have difficulty reading your own copy, never mind anyone else's.

But hey!; thanks for selective out-of-context editing! Let's hope no one else reads decent coverage of this topic - they'll never know the difference!
Gareth Halfacree 31st August 2012, 17:51 Quote
Quote:
Originally Posted by pearl.of.wisdom
Let's hope no one else reads decent coverage of this topic - they'll never know the difference!
I look forward to reading your analysis of the topic, Mr. Wisdom. I'm sure it will be most illuminating.
pearl.of.wisdom 31st August 2012, 18:01 Quote
Ha! nah. You might check vr-zone though, that's quite interesting. 45%? ipc?
Gareth Halfacree 31st August 2012, 18:10 Quote
Quote:
Originally Posted by pearl.of.wisdom
Ha! nah. You might check vr-zone though, that's quite interesting. 45%? ipc?

Thought not, somehow. It's always easier to destroy than create, isn't it?

Good news, though: I'm going on holiday for a week, so that's at least ten fewer articles likely to make you sit up and think "I'm going to be a dick to someone I've never met, using the power of anonymity and the internet! Mother, fetch my cape!"
.//TuNdRa 31st August 2012, 20:46 Quote
Hooray. Yet another AMD fanboy eager to show their red cape off.

Still; I'd quite like this, Bulldozer isn't that slow, it's just not that fast in relation to Intel.
pearl.of.wisdom 1st September 2012, 16:31 Quote
Quote:
Originally Posted by Gareth Halfacree
Quote:
Originally Posted by pearl.of.wisdom
Ha! nah. You might check vr-zone though, that's quite interesting. 45%? ipc?

Thought not, somehow. It's always easier to destroy than create, isn't it?

Good news, though: I'm going on holiday for a week, so that's at least ten fewer articles likely to make you sit up and think "I'm going to be a dick to someone I've never met, using the power of anonymity and the internet! Mother, fetch my cape!"

Oh dear. Reason for me not writing an article: [1] not being paid [2] diligent work in analysis will be insulted by ignoramic fanboys [oh the irony!] [3] and refer you to [1] , not even in alcohol.

PS. Can't really remember previously insulting you/treating you like dick. Still, I treat so many people like dicks I might simply have forgotten. {Or is it that I get insulted for daring to contradict an inaccurate received wisdom?} Anyway, Mr. Halfacree on holiday! Lucky you, do have fun!
PPS. That Vr-Zone article is worth reading, everyone.