bit-tech.net

Nehalem adds 7 more SSE4 instructions, SMP working

Nehalem is up and running in an SMP environment just three weeks after tapeout.

During Patrick Gelsinger’s speech this afternoon, he shared a few more tidbits on Nehalem’s architecture, specifically giving some more details on QuickPath technology and the seven new SSE4 instructions that will be included, on top of the 47 new SSE4 instructions coming with Penryn.

“With Nehalem, we're delivering seven more new instructions that are part of it,” said Gelsinger. “Application-specific instructions like POP counts and CRC-32, also new streaming instructions, specifically for workloads like XML. This is our first published data on that, where we see enormous reductions in the instruction count and over 3x performance improvement on XML-like workloads.”
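
For readers unfamiliar with the two application-specific operations Gelsinger names, here is a rough software sketch of what a population count and a CRC-32 compute. It is an illustration only, not Intel's implementation: the function names are ours, and the choice of the Castagnoli polynomial (CRC-32C) with the customary initial and final inversion is an assumption that the speech does not spell out.

```python
def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer (what a POP count does)."""
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


# Bit-reflected form of the Castagnoli polynomial (assumed here).
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C over a byte string, one bit at a time."""
    crc = 0xFFFFFFFF  # customary initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # customary final inversion
```

A dedicated instruction would replace each of these loops with a single operation per machine word, which is where the reduction in instruction count comes from.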

One of the new optimisations will allow for 256 simultaneous compares in one instruction. Gelsinger said that this kind of improvement is what makes his heart beat and, given the potential threefold performance increase, he’s understandably excited by what new instructions like this could do.
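
The article does not name the instruction behind the 256-compare figure. Assuming it is one of the packed string-compare instructions, which check up to 16 bytes of one operand against up to 16 bytes of another (16 x 16 = 256 byte comparisons), the idea can be sketched in software as follows. The function name and the XML-flavoured example are ours.

```python
def equal_any_mask(needles: bytes, haystack: bytes) -> int:
    """Return a bitmask where bit i is set if haystack[i] matches any needle byte.

    Both inputs are limited to 16 bytes, mirroring one 128-bit register each,
    so a full call performs up to 16 x 16 = 256 byte comparisons.
    """
    if len(needles) > 16 or len(haystack) > 16:
        raise ValueError("each operand holds at most 16 bytes")
    mask = 0
    for i, h in enumerate(haystack):
        # Compare this haystack byte against every needle byte.
        if any(h == n for n in needles):
            mask |= 1 << i
    return mask
```

For example, `equal_any_mask(b"<>&", b"a<b>c")` returns `0b1010`, flagging the positions of the markup characters in one step, which is the sort of scan an XML parser performs constantly.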

Otellini briefly mentioned the QuickPath architecture – this was something that Gelsinger expanded on. “This architecture provides [an] integrated memory controller as well as the point-to-point interconnect from the processors, robust RAS features, flexibility and scalability.”


To explain how QuickPath works, Gelsinger showed a block diagram of how Nehalem will look in dual- and quad-socket configurations. The CPUs can communicate with one another directly, without having to pass data through the north bridge and, because each CPU has an integrated memory controller, each has its own memory banks.

“[QuickPath] will be used across the Xeon family, high-end desktops, as well as into our Itanium family of products,” explained Gelsinger. Of course, the design looks awfully similar to the way AMD’s architecture scales, but that’s not exactly a bad thing because AMD’s architecture scales incredibly well over multiple sockets whereas Intel’s architecture never really has.


Gelsinger brought Jim Brayton, the Nehalem project manager, onto the stage and asked him the question that many of us want answered: just how fast is Nehalem? “I could tell you that, but I’m not sure I could tell [the audience] that,” said Brayton. “But maybe between me and you, real fast.”

Brayton then demonstrated the Nehalem system up and running with A0 silicon and an A0 Tylersburg chipset. “[This is] a Nehalem system, SMT, Quad-Core, two processors, connected together with QuickPath technology running server-intensive applications in the foreground here, some things in the background, all brand new, three weeks old, and we're all very happy,” he said.

14 Comments

DarkLord7854 19th September 2007, 19:09 Quote
I want one :(
ComputerKing 19th September 2007, 19:53 Quote
16 CORES! What the! How come it's been invented! I don't know why someone would need 16 cores, maybe to launch some rockets! I like it.
ssj12 19th September 2007, 20:07 Quote
its 8 cores and 16 threads not 16 cores, that would be 32 threads.
ComputerKing 19th September 2007, 20:47 Quote
Quote:
Originally Posted by ssj12
its 8 cores and 16 threads not 16 cores, that would be 32 threads.

Threads
Brooxy 19th September 2007, 22:42 Quote
Do we even have an OS that can utilise all 16 threads? If not, it sounds like intel are just trying to show off their e-peen...
completemadness 20th September 2007, 01:12 Quote
Quote:
Originally Posted by Brooxy
Do we even have an OS that can utilise all 16 threads? If not, it sounds like intel are just trying to show off their e-peen...
As far as I know, OSes don't have problems handling masses of CPUs and lots of threads

the problem is getting programs that can utilise it
DarkLord7854 20th September 2007, 02:10 Quote
Quote:
Originally Posted by completemadness
the problem is getting programs that can utilise it

You can manually assign programs to different cores/threads
completemadness 20th September 2007, 02:42 Quote
Quote:
Originally Posted by DarkLord7854
You can manually assign programs to different cores/threads
windows will automatically spread the load over cores so there's not a lot of point, all my background processes probably use 1% of a core, Games use 100% of a core

So in total i can use 101% of a 200% processor, or 101% of a 400% processor, or 101% of a 800% processor
Even sup comm will run fine on dual core, because that's all it can really utilise

So i don't see the point in >2 cores ATM, and quad core at the max, until there are a lot of good multi cored programs, adding all these extra cores is just a waste of money
Its a shame AMD never did get that fusion thing going, it would be awesome for the next ... well i don't know how long tbh, but certainly many years to come
Joeymac 20th September 2007, 07:29 Quote
There are many game engines that are being updated to utilise multicore systems. Next year ''threading your software'' will be the thing to be doing and by the time nehalem comes out then there will be plenty of stuff that makes use of all those cores/threads. Something as powerful as that 16 thread system is probably going to be a video editing or rendering application anyway, most of those will already ramp up to fill those threads.
I don't see why people moan about a game not maxing out all their cores. I doubt there is a game out at the moment that maxes out a single core on even the cheapest C2D system whilst you play the game at proper resolutions above 1024x768. The game that does go over to another core isn't going to be doing it for the sake of it... the other cores will be for parallel tasks like physics and audio. Otherwise, what's the point.
TheEclypse 20th September 2007, 13:34 Quote
Quote:
Originally Posted by completemadness
until there are a lot of good multi cored programs, adding all these extra cores is just a waste of money

Now they have the cores to play with, they will start playing with them. Bring back functional programming ;)
completemadness 20th September 2007, 14:06 Quote
Quote:
Originally Posted by TheEclypse
Now they have the cores to play with, they will start playing with them. Bring back functional programming ;)
we have had consumer multi-core for at least a year or 2, Servers have had SMP for as long as i can remember
And then there was hyper threading as well

Its taken a long time for any kind of multi-threading to come out of the pipework, and so far none of it exactly blows your socks off
TheEclypse 20th September 2007, 14:09 Quote
Quote:
Originally Posted by completemadness
Its taken a long time for any kind of multi-threading to come out of the pipework, and so far none of it exactly blows your socks off

I havent graduated from uni yet :p
Max Spain 23rd September 2007, 21:12 Quote
Quote:
One of the new optimisations will allow for 256 simultaneous compares in one instruction.
That SSE4 looks nifty. Also, with Quick Path, they should catch up with AMD on multi-proc performance. AMD really needs to start innovating like Intel is right now.
wuyanxu 24th September 2007, 10:03 Quote
quick question: would i need SSE4 if i were just to play games?

anyone think 4 cores are enough for any games for at least 3 to 4 years?