Nehalem adds 7 more SSE4 instructions, SMP working

Written by Tim Smalley

September 19, 2007 // 5:54 p.m.

During Patrick Gelsinger’s speech this afternoon, he shared a few more tidbits on Nehalem’s architecture, giving more detail on QuickPath technology and on the seven new SSE4 instructions the chip will include, on top of the 47 new SSE4 instructions coming with Penryn.

“With Nehalem, we're delivering seven more new instructions that are part of it,” said Gelsinger. “Application-specific instructions like POP counts and CRC-32, also new streaming instructions, specifically for workloads like XML. This is our first published data on that, where we see enormous reductions in the instruction count and over 3x performance improvement on XML-like workloads.”
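POPCNT counts the set bits in a register, and the CRC32 instruction that arrived with Nehalem's SSE4.2 accumulates a CRC-32C (Castagnoli) checksum rather than the zlib/Ethernet CRC-32. As a rough illustration of what those two instructions compute, here is a minimal bit-at-a-time Python model. It is a sketch for clarity only, not how the hardware works, and the function names are our own.

```python
# Software reference models for two of the operations Gelsinger mentioned.
# Plain-Python illustrations only; the hardware does each in a single instruction.

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reversed


def popcount(x: int) -> int:
    """Count the set bits in a non-negative integer (the value POPCNT returns)."""
    count = 0
    while x:
        x &= x - 1  # clear the lowest set bit
        count += 1
    return count


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C, the checksum variant the SSE4.2 CRC32 instruction accumulates."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


print(popcount(0xF0F0))           # 8
print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
```

A table-driven CRC would be faster in software; the bit loop is used here because it maps most directly onto the polynomial arithmetic.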

One of the new optimisations will allow for 256 simultaneous compares in one instruction. Gelsinger said that this kind of improvement is what makes his heart beat, and given the potential threefold performance increase, he’s understandably excited about new instructions like this.
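The 256 figure presumably comes from the SSE4.2 string-compare instructions, which compare every byte of one 16-byte operand against every byte of another (16 × 16 = 256 comparisons) in a single operation. The Python sketch below is a sequential model of one such mode, "equal any", which suits XML scanning because it flags any byte that matches a set of special characters. The function name and the needle set are illustrative assumptions, not Intel's API.

```python
def equal_any_mask(needles: bytes, chunk: bytes) -> int:
    """Sequential model of an "equal any" 16-byte string compare (illustrative).

    Every byte of chunk is compared with every byte of needles, up to
    16 x 16 = 256 comparisons that the hardware performs in parallel.
    Bit i of the result is set when chunk[i] equals any needle byte.
    """
    if len(needles) > 16 or len(chunk) > 16:
        raise ValueError("this model handles operands of at most 16 bytes")
    # Full comparison matrix: matches[i][j] is chunk[i] == needles[j].
    matches = [[c == n for n in needles] for c in chunk]
    mask = 0
    for i, row in enumerate(matches):
        if any(row):  # aggregate: does chunk[i] match any needle?
            mask |= 1 << i
    return mask


XML_SPECIALS = b'<>&"'  # hypothetical needle set: characters an XML scanner cares about
chunk = b'<a href="x">hi'
mask = equal_any_mask(XML_SPECIALS, chunk)
print([i for i in range(len(chunk)) if mask >> i & 1])  # [0, 8, 10, 11]
```

A scalar parser would test each of those 14 bytes against each special character one comparison at a time; collapsing that whole grid into one instruction is where the reduction in instruction count comes from.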

Intel CEO Paul Otellini briefly mentioned the QuickPath architecture, and Gelsinger expanded on it. “This architecture provides [an] integrated memory controller as well as the point-to-point interconnect from the processors, robust RAS features, flexibility and scalability.”


To explain how QuickPath works, Gelsinger showed a block diagram of Nehalem in dual- and quad-socket configurations. The CPUs communicate with one another directly, without passing data through the north bridge, and because each CPU has an integrated memory controller, each has its own memory banks.
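To make the block diagram concrete, here is a toy Python model of where a memory request goes. It assumes a hypothetical four-socket system in which each socket owns an equal, contiguous slice of physical memory and every pair of sockets has a direct link; real systems may interleave memory and wire links differently, so treat the numbers and names as illustrative only.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass(frozen=True)
class Topology:
    """Hypothetical NUMA layout: each socket's integrated controller owns one slice."""
    sockets: int
    mem_per_socket: int  # bytes attached to each socket's memory controller

    def home_socket(self, addr: int) -> int:
        """Socket whose integrated memory controller owns this physical address."""
        if not 0 <= addr < self.sockets * self.mem_per_socket:
            raise ValueError("address out of range")
        return addr // self.mem_per_socket

    def access_path(self, cpu: int, addr: int) -> list[str]:
        """Hops a memory request takes; no shared north bridge is involved."""
        home = self.home_socket(addr)
        if home == cpu:
            return [f"CPU{cpu} memory controller"]
        # Assumes a direct point-to-point link between every pair of sockets.
        return [f"QuickPath link CPU{cpu}->CPU{home}", f"CPU{home} memory controller"]


topo = Topology(sockets=4, mem_per_socket=4 * GIB)
print(topo.access_path(0, 1 * GIB))  # ['CPU0 memory controller']
print(topo.access_path(0, 9 * GIB))  # ['QuickPath link CPU0->CPU2', 'CPU2 memory controller']
```

The point of the model is only the shape of the traffic: local requests stay on the socket, and remote ones cross a single processor-to-processor link instead of funnelling through a shared front-side bus.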

“[QuickPath] will be used across the Xeon family, high-end desktops, as well as into our Itanium family of products,” explained Gelsinger. Of course, the design looks awfully similar to the way AMD’s architecture scales, but that’s no bad thing: AMD’s architecture scales very well across multiple sockets, whereas Intel’s front-side-bus design never really has.


Gelsinger brought Jim Brayton, the Nehalem project manager, onto the stage and asked him the question many of us want answered: just how fast is Nehalem? “I could tell you that, but I’m not sure I could tell [the audience] that,” said Brayton. “But maybe between me and you, real fast.”

Brayton then demonstrated a Nehalem system up and running with A0 silicon and an A0 Tylersburg chipset. “[This is] a Nehalem system, SMT, Quad-Core, two processors, connected together with QuickPath technology running server-intensive applications in the foreground here, some things in the background, all brand new, three weeks old, and we're all very happy,” he said.
