Intel has managed to get its quad-core processor to market incredibly fast for a number of reasons. The main one is that the Kentsfield ‘core’ reuses some of the principles behind Intel’s 65-nanometre Pentium D desktop processors, which were based on the Netburst architecture. In the same way that Presler was a pair of Cedar Mill chips on a single CPU package, Kentsfield is a pair of Conroe chips on the same package.
It’s technically a pair of dual-core chips in one socket, rather than a native quad-core chip, and that has a direct effect on the power envelope: having reduced power requirements with its Core 2 Duo processors, Intel has now doubled them again. In particular, the Core 2 Extreme QX6700 is a pair of Core 2 Duo E6700 processors on the same CPU package, meaning that the thermal design power has gone from 65W to 130W as a result of adding another two cores.
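The arithmetic behind that doubling can be sketched in a few lines of Python. The `Die` and `Package` names are hypothetical and exist only for illustration; the two-core and 65W figures come from the E6700 comparison above, and simply summing per-die TDP is a simplifying assumption rather than a description of how Intel arrives at its rating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    """One piece of silicon (hypothetical helper type)."""
    name: str
    cores: int
    tdp_watts: int


@dataclass(frozen=True)
class Package:
    """A CPU package holding one or more dies (hypothetical helper type)."""
    name: str
    dies: tuple

    @property
    def cores(self) -> int:
        # Total cores is the sum across every die in the package.
        return sum(d.cores for d in self.dies)

    @property
    def tdp_watts(self) -> int:
        # Assumption: package TDP is the plain sum of the per-die figures.
        return sum(d.tdp_watts for d in self.dies)


# Figures from the article: an E6700-class Conroe die is dual-core at 65W.
conroe = Die("Conroe (E6700-class)", cores=2, tdp_watts=65)
kentsfield = Package("Core 2 Extreme QX6700", dies=(conroe, conroe))

print(kentsfield.cores, "cores,", kentsfield.tdp_watts, "W")
```

Modelling the package as a tuple of identical dies mirrors the multi-chip approach described here: nothing about the individual die changes, only how many of them share the socket.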
The two Conroe dies communicate with each other via the north bridge and memory controller in just the same way that Presler’s did. However, this time Intel’s Core architecture isn’t quite so dependent on memory bandwidth in a dual-core configuration. While a single-core chip based on the Netburst architecture wasn’t heavily reliant on memory bandwidth, a dual-core Pentium D chip was, because the two cores communicated via the front side bus.
Both single and dual-core Pentiums also suffered from an overly long instruction pipeline. That said, Intel’s first dual-core processors based on the Smithfield core weren’t particularly bad in isolation, but once AMD had launched its dual-core Athlon 64 X2 CPUs, Intel’s dual-core problems were under the spotlight.
Our mockup of a Kentsfield die – it’s a pair of Conroe dies on one CPU package
Intel tried to resolve a lot of these problems by increasing the cache size when it shrank to 65-nanometres, but the inherent problems with the architecture could not be hidden under the guise of a larger cache, especially when you consider that half of the CPU’s total cache could be sitting completely idle if only one of the two cores was executing instructions.
Thankfully with the introduction of Intel’s Core microarchitecture, virtually all of the problems with the Netburst microarchitecture were solved. In particular, the shorter, wider instruction pipeline helped to eradicate Netburst’s biggest shortfall – the long-winded instruction pipeline. Just like Conroe, Kentsfield is capable of executing four instructions per clock, per core. This means that it’s possible for Intel’s Core 2 Extreme QX6700 to execute up to sixteen instructions per clock if the application or applications can make use of the processor’s power.
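The sixteen-instruction figure is simply cores multiplied by issue width, which the short Python sketch below spells out. The function name is hypothetical, and the result is a theoretical ceiling only: it assumes every busy core issues its full four instructions on every clock, which real code rarely manages.

```python
def peak_instructions_per_clock(busy_cores: int, issue_width: int) -> int:
    """Theoretical upper bound on instructions issued per clock."""
    return busy_cores * issue_width


ISSUE_WIDTH = 4    # from the text: four instructions per clock, per core
QX6700_CORES = 4   # two dual-core Conroe dies on one package

# Show how the ceiling scales with the number of cores an application keeps busy.
for busy_cores in range(1, QX6700_CORES + 1):
    peak = peak_instructions_per_clock(busy_cores, ISSUE_WIDTH)
    print(f"{busy_cores} busy core(s): up to {peak} instructions per clock")
```

The loop makes the caveat in the text explicit: a single-threaded application that keeps only one core busy is capped at four instructions per clock, however many cores are sitting in the socket.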
Along with the shorter, wider instruction pipeline, the QX6700 comes with all of the other features that make up Intel’s Core architecture. These include Intelligent Power Capabilities, Advanced Smart Cache, Smart Memory Access and Advanced Digital Media Boost. We covered these technologies in detail during our initial Core 2 processor review, so we won’t cover that ground again. We do, however, want to look at how some of these technologies help with the dual-die implementation used on Kentsfield.
Here's what you'll find under the IHS if you successfully remove it - picture courtesy of Intel Corp.
When designing the Core architecture, Intel realised that one of its major hindrances was the latency between the memory controller and the L2 cache. With AMD going down the route of integrating the memory controller onto the CPU when it launched its K8 architecture, latency between the memory controller and L2 cache was virtually eradicated. This was because the two were literally right next to each other on the CPU die, meaning that there was no longer the need for the two to talk to each other via the front side bus.
In order to reduce its own latency, Intel went to work on optimising the way its L2 cache interacted with the memory controller. The result was that it souped up the L2 prefetch algorithms and added a feature called memory disambiguation. The addition of these two features meant that Conroe was much more intelligent when it came to collecting data from the memory controller. What this means for Kentsfield is that there is less traffic travelling across the front side bus because the two Conroe chips are making more reads from L2 cache than they are from the memory controller.
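A toy model makes the relationship between L2 hits and front side bus traffic concrete. Everything in this Python sketch is an illustrative assumption rather than a measurement: the function name is hypothetical, the access and hit counts are invented, each L2 miss is treated as exactly one 64-byte line transferred over the bus, and prefetch traffic itself is ignored.

```python
CACHE_LINE_BYTES = 64  # assumption: one 64-byte line crosses the bus per L2 miss


def fsb_traffic_bytes(l2_accesses: int, l2_hits: int) -> int:
    """Bytes that must cross the front side bus for a given number of L2 accesses."""
    if not 0 <= l2_hits <= l2_accesses:
        raise ValueError("hits must be between 0 and the number of accesses")
    misses = l2_accesses - l2_hits
    return misses * CACHE_LINE_BYTES


# Invented figures: the same workload with and without better prefetching.
ACCESSES = 1_000_000
baseline = fsb_traffic_bytes(ACCESSES, l2_hits=900_000)
improved = fsb_traffic_bytes(ACCESSES, l2_hits=950_000)

print("baseline bus traffic:", baseline, "bytes")
print("improved bus traffic:", improved, "bytes")
print("bus traffic saved:   ", baseline - improved, "bytes")
```

In this simplified picture, raising the share of reads served from L2 from 90 to 95 per cent halves the bytes crossing the bus, which is the kind of headroom that die-to-die communication over the same bus can then use.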
This obviously frees up bandwidth for the two Conroe chips to communicate with each other via the front side bus. The fact that Intel cleaned up its memory accesses is a reason why I believe that the company’s first quad-core processor isn’t going to be as bottlenecked as its first dual-core processor. Obviously, much depends on how AMD’s K8L architecture turns out when released – it’s looking promising at the moment, and I’m excited to see what AMD has up its sleeve.