bit-tech.net

AMD unveils Flex FP

AMD has explained some of the advantages that the company's new Flex FP hardware will bring.

AMD has unveiled new details about its Flex FP floating-point unit, which will début with the Bulldozer processor line, due for servers and workstations next year.

In a blog post entitled The New Flex FP, AMD's director of product marketing John Fruehe explains that the new hardware 'delivers tremendous floating-point capabilities for technical and financial applications,' offering much improved performance and flexibility over the company's older floating point units.

Fruehe states that a single Flex FP will be shared between every two cores, so the company's Interlagos 16-core processors will include eight Flex FP units, which are capable of executing 256-bit floating-point commands through the AVX instruction set extension. Remember that Bulldozer uses a dual-thread design similar to Intel's Hyper-Threading - see the above Bulldozer link for more on this.

Although Fruehe admits that there is 'no such thing as a 256-bit command,' he explains that AVX-compatible code will be able to 'execute eight 32-bit commands or four 64-bit commands per cycle,' double the throughput of traditional 128-bit floating-point units.
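
The arithmetic behind those figures is simply the vector width divided by the element width. A minimal Python sketch, in which the function name `lanes_per_op` is our own illustrative choice rather than anything from AMD's post:

```python
def lanes_per_op(vector_bits: int, element_bits: int) -> int:
    """Return how many elements of element_bits fit in one vector of vector_bits.

    Hypothetical helper for illustration only.
    """
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("vector width must be a whole multiple of element width")
    return vector_bits // element_bits


# Compare a traditional 128-bit unit with a 256-bit AVX operation.
for vector_bits in (128, 256):
    for element_bits in (32, 64):
        lanes = lanes_per_op(vector_bits, element_bits)
        print(f"{vector_bits}-bit vector, {element_bits}-bit elements: {lanes} per operation")
```

For any given element size, doubling the vector width doubles the number of elements handled per operation, which is where the 'double' claim comes from.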

Code doesn't have to be rewritten to support AVX in order to take advantage of the Flex FP, however. The Flex FP comprises two 128-bit FMAC units capable of performing FMAC, FADD, or FMUL instructions per cycle, either combined as a 256-bit instruction or split as two 128-bit instructions, which apparently provides significantly higher performance than 'competing solutions.'
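
As a rough illustration of the combined-versus-split behaviour described above, the following Python sketch models the Flex FP as two four-lane multiply-accumulate units working on 32-bit elements. It is a toy model under our own assumptions: the function names are hypothetical, it says nothing about real scheduling or timing, and plain Python arithmetic rounds after the multiply and again after the add, whereas a hardware fused operation may round only once.

```python
LANES_PER_UNIT = 4  # assumption: one 128-bit unit holds four 32-bit elements


def fmac(a, b, c):
    """Element-wise multiply-accumulate: a[i] * b[i] + c[i] for each lane."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all operands must have the same number of lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


def flex_fp_256(a, b, c):
    """Model one 8-lane (256-bit) operation: each unit takes one half."""
    if len(a) != 2 * LANES_PER_UNIT:
        raise ValueError("a 256-bit operation needs eight 32-bit lanes")
    low = fmac(a[:LANES_PER_UNIT], b[:LANES_PER_UNIT], c[:LANES_PER_UNIT])
    high = fmac(a[LANES_PER_UNIT:], b[LANES_PER_UNIT:], c[LANES_PER_UNIT:])
    return low + high


def flex_fp_2x128(op0, op1):
    """Model two independent 4-lane (128-bit) operations, one per unit."""
    return fmac(*op0), fmac(*op1)


# Combined mode: a single instruction spans both units.
combined = flex_fp_256([1] * 8, [2] * 8, [3] * 8)

# Split mode: two unrelated instructions, for example one from each core.
split = flex_fp_2x128(
    ([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 0, 0]),
    ([2, 2, 2, 2], [3, 3, 3, 3], [1, 1, 1, 1]),
)
print(combined)
print(split)
```

Either way the same two units do the work; the difference is whether they serve one wide instruction or two narrower ones.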

Despite the improvements offered by the Flex FP hardware to unoptimised code, Fruehe admits that 'there are benefits of recompiled code that will support the new AVX instructions,' but he appears confident that despite not expecting to see 'rapid movement to AVX until well after the platforms are available on the streets' the new Flex FP will provide significantly improved performance and flexibility from day one.

Do you think that Bulldozer will be the server-room winner that AMD expects, or is it going to take more than a more flexible floating-point unit to out-muscle Nehalem-based Xeons? Share your thoughts over in the forums.

14 Comments

javaman 28th October 2010, 18:06
Cool. Next time, unveil it with benchmarks. And can it run Metro 2033?
Snips 28th October 2010, 20:18
I really can't get excited when it's just based on comments by the AMD marketing department.

Give us independently verified benchtests and then I'll get interested.
HourBeforeDawn 28th October 2010, 20:44
seems promising :)
TomH 28th October 2010, 21:19
Why am I reminded of 3DNow!? Given that it's going up against SSE, which already has a stronghold in the 'x86 extension support' arena, I can't see AVX adoption being a huge success without absolutely destroying the equivalent Intel setup (required to get enough developer/OEM attention), pound for pound.

Of course, AMD also brought about AMD64 (x86_64 these days), which was certainly prevalent over Intel, but that has a lot to do with getting there first.

It goes without saying that this is a risky move from AMD.
Vigilante 28th October 2010, 21:40
Quote:
Originally Posted by TomH
Why am I reminded of 3DNow!? Given that it's going up against SSE, which already has a stronghold in the 'x86 extension support' arena, I can't see AVX adoption being a huge success without absolutely destroying the equivalent Intel setup (required to get enough developer/OEM attention), pound for pound.

Of course, AMD also brought about AMD64 (x86_64 these days), which was certainly prevalent over Intel, but that has a lot to do with getting there first.

It goes without saying that this is a risky move from AMD.

AVX complements SSE; it does not compete with it. Sandy Bridge will support AVX too. All this announcement says is that they are betting AVX will be a big hit, as well it might, since both AMD and Intel are going to support it.

However, their insistence on optimising so heavily for AVX instead of helping to keep general-purpose code IPC up makes me think that Bulldozer performance may not be what they expected, and that they are hoping that making AVX code run faster will help keep Bulldozer afloat for server adoption. I certainly hope not, but that's just the impression I'm getting.
Krayzie_B.o.n.e. 29th October 2010, 03:06
Soooo.... Where are the Metro 2033 maxed-out, DOF on, Tessellation very high, 8X AA, 16 AF benchmarks for this new chip? All that business application stuff is OK, but if you can run Crysis while at work then damn, sign me up.
Action_Parsnip 29th October 2010, 13:55
Quote:
Originally Posted by TomH
Why am I reminded of 3DNow!? Given that it's going up against SSE, which already has a stronghold in the 'x86 extension support' arena, I can't see AVX adoption being a huge success without absolutely destroying the equivalent Intel setup (required to get enough developer/OEM attention), pound for pound.

Of course, AMD also brought about AMD64 (x86_64 these days), which was certainly prevalent over Intel, but that has a lot to do with getting there first.

It goes without saying that this is a risky move from AMD.

Dear oh dear, TomH. I mean really.
Action_Parsnip 29th October 2010, 13:58
You do understand where they're saying AVX is going, don't you, Vigilante? It's going to be the big thing, but not for a while, quite a while, so they've integrated support for it, SUPPORT, but it's not optimised for it.
schmidtbag 29th October 2010, 15:58
Quote:
Originally Posted by Action_Parsnip
You do understand where they're saying AVX is going, don't you, Vigilante? It's going to be the big thing, but not for a while, quite a while, so they've integrated support for it, SUPPORT, but it's not optimised for it.

Considering how AVX isn't exactly a requirement for computing (seeing as how it's never existed before), I think it's smart for AMD not to waste so much time, money, and resources on something that won't achieve its full potential for maybe a year or so, by which time they would have a new line of processors with the optimizations.
They said AVX will work with older stuff not using the instruction set, but you still need to program and compile something to take full advantage of it, and that's where special things like CUDA don't appeal to most server software developers right now (not just CUDA but any architecture change). CUDA is a brilliant idea, but it doesn't appeal when it hasn't been around long enough to have a decent software collection or to be proven stable.

I'm sure AMD will add more Flex FPs when they see a rise in demand.
TomH 30th October 2010, 11:45
Quote:
Originally Posted by Action_Parsnip
Dear oh dear, TomH. I mean really.
No really, you wasted an entire post on that? You could at least elaborate. Or was this intentionally an ambiguous quip to support the presumption that you're OHSOMUCHSMARTER?
Quote:
Originally Posted by Vigilante
AVX complements SSE; it does not compete with it. Sandy Bridge will support AVX too. All this announcement says is that they are betting AVX will be a big hit, as well it might, since both AMD and Intel are going to support it.

However, their insistence on optimising so heavily for AVX instead of helping to keep general-purpose code IPC up makes me think that Bulldozer performance may not be what they expected, and that they are hoping that making AVX code run faster will help keep Bulldozer afloat for server adoption. I certainly hope not, but that's just the impression I'm getting.
I wasn't aware until now that AVX was a common extension to x86, as opposed to an AMD-specific optimisation. The article made no mention of the origins of AVX, so I simply assumed a proprietary extension. I stand corrected, though, so thank you. ;)
Nexxo 30th October 2010, 11:50
But will it run Crysis?

...I'll get my coat.
drunkenmaster 30th October 2010, 13:30
Bulldozer uses a dual-thread design similar to Intel Hyper-Threading? Seriously?

With Hyper-Threading, it's one core, two threads; without it, one core, one thread. Bulldozer itself has TWO integer cores and can run a thread on each; that's standard. Hyper-Threading or something similar would be those two cores running four threads. They aren't even close to similar.

It's possible the article writer was specifically talking about the FP unit, though they stated Bulldozer, not FPU, in which case it's still basically incorrect.

At the moment there are two methods: sticking more threads through the same core (HT) or adding more cores. They could not be more DISsimilar. Bulldozer and HT don't share any similarities AT ALL.

Even if just discussing the FPU, either one core can put a full 256-bit instruction through it, or each core can put up to one 128-bit instruction through it. However, the functionality changes: when one core uses both parts of the FPU for a full 256-bit instruction, it can do VERY different things from a single 128-bit instruction.

This isn't the same as with Hyper-Threading either: one core split into two threads, or one core with one thread, there's no change in functionality or ability of the code going through.

As for AVX, it's going to be HUGE and should be VERY quickly adopted. As for server loads, AMD has ALWAYS had a strong FPU compared to Intel's, if not a better one, while in integer it has always lagged behind, and this has normally been the killer in server workloads, encoding and the like. What matters is the massive bump to integer performance (adding a second core to each module only costs a 5% die space increase), achieved by essentially fitting two integer units into almost no extra space. So what formerly would have been a quad-core will now become, in terms of size, an octo-core. They are essentially doubling (in fact more than that) integer performance for a 5% die size penalty.

As most of you should know, or do know, the price of chips comes down to die size: the more chips you can get off a single wafer, which has a fixed cost, the cheaper each individual chip is.

So you're looking at double the integer performance simply with twice the integer cores, for a 5% die size increase at very little end cost. Add the fact that the cores themselves are more efficient than previous cores, the core logic around them should be more efficient, AND the FPU and other things have had massive, massive updates, and it's going to be a bit of a beast.
Action_Parsnip 30th October 2010, 17:58
Quote:
Originally Posted by TomH
No really, you wasted an entire post on that? You could at least elaborate. Or was this intentionally an ambiguous quip to support the presumption that you're OHSOMUCHSMARTER?

Well look, there is a lot of reading material out there about Bulldozer. Try dresdenboy's blog for starters, or the write-up on realworldtech.com. You're getting vector instructions and the nature of the FPU mixed up. The Bulldozer FPU is shared between two execution units, hence the 'Flex' moniker. It can execute AVX instructions, although it needs to break them into chunks and process them one after the other, whereas Sandy Bridge will do them the whole 256 bits at a time. The argument goes that AVX will be big at some point, but the likelihood of that being any time soon after AVX-enabled hardware arrives is small. Therefore full-blown acceleration is wasteful, in the near term at least. Wastefulness in this context means die space going under-utilised in the majority of software cases.

This thinking is actually quite conservative (i.e. sound) if you consider the precedents of newer SSE instructions in the recent past, or the introduction of SSE in the very beginning. They took quite some time to graduate from novelties/toys/curiosities to practical, and finally to essential. MMX was, I believe, an instruction set related to integer rather than floating-point execution and so is not comparable here.

If you trawl through the various articles you'll find that the concept of Bulldozer is doing more with less. If features or configurations cannot absolutely justify themselves, then they are cut. Die space in the active zones (the modules, not the cache) and power consumption are paramount. The chips will still end up being large, but this will be because of large amounts of cache memory (rumoured). The latest bandwagon says 2MB L2 per module, and 8MB L3. L1 will probably be something in the order of 16KB.
Holt 31st October 2010, 05:30
Bulldozer has great ideas realized in it but let me see benchmarks first.