Sounds like they've got some killer performance enhancements and some really great power management stuff coming up in the next batch. I can't wait to buy a new computer with one of these beasts. So do these processors classify as 64-bit like the Athlon 64s, or are they now 128-bit, meaning even more performance than the Athlons?
Just one bit that isn't quite right as far as I know:
Quote:
For example, if you have a 31-stage pipeline, as you do in the Pentium 4 Prescott, you can do 31 operations on an instruction in one clock cycle. That's a lot of operations. The advantage of a long pipeline is that it reduces the amount of times when an instruction comes out the end of the pipeline without being completed. When this happens, it has to loop around and go through again to be finished. Long instructions matched with long pipelines makes for efficiency.
A longer pipeline actually does less work on an instruction per clock cycle than a shorter one. In a long pipeline, such as the ones advocated in NetBurst's Hyper Pipelining Technology, each stage has fewer gates to switch through, so the switching time per stage is much shorter: the CPU does less work per cycle but can hit much higher clock speeds. In the FPU, for example, you can send an FMUL instruction and run the whole thing in one unit at once, or alternatively you can have several per-clock parts to the unit (FADD, FLD) and do it a piece at a time.
The disadvantage of a long pipeline - specifically, one as large as Prescott's - is that it hugely increases penalties for various inherent problems with the processor. Branch mispredicts, although rare, mean that the CPU has to actually flush the entire pipeline of instructions and load the correct branch again - the minimum P4 mispredict penalty is over 20 clock cycles for data in the L1 cache. Pipeline bubbles, where the processor can't schedule two instructions to be executing simultaneously (or close to one another), result in a further loss because the bubble (i.e. an empty pipeline stage) propagates all the way down the chain. I imagine that the huge amount of work that the Willamette team did with the scheduling unit to stop this happening has moved right into Conroe.
As far as I know, instructions can't go down the pipeline without being completed - however, the above flush means that some must be removed before they can finish and others put in their place.
I hope this doesn't sound like I'm criticising the article, because it does a very good job of explaining the architecture :)
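To put rough numbers on the trade-off described above, here is a toy model in Python. Everything in it is an assumption made up for illustration (the total logic delay, the per-stage latch overhead, the branch fraction, the mispredict rate, and the idea that a mispredict costs about one pipeline length); it is not a measurement of the Pentium 4 or of Core.

```python
# Toy model of pipeline depth vs. performance.
# All constants are hypothetical, chosen only to show the shape of the trade-off.

def cycle_time_ns(stages, logic_ns=10.0, latch_ns=0.1):
    """Clock period: total logic delay split over the stages, plus a fixed latch overhead per stage."""
    return logic_ns / stages + latch_ns

def cycles_per_instruction(stages, branch_fraction=0.2, mispredict_rate=0.05):
    """Ideal CPI of 1, plus flush cost; assumes a mispredict wastes about one pipeline length."""
    return 1.0 + branch_fraction * mispredict_rate * stages

def ns_per_instruction(stages):
    """Average time per instruction = clock period * cycles per instruction."""
    return cycle_time_ns(stages) * cycles_per_instruction(stages)

for stages in (14, 20, 31):
    clock_ghz = 1.0 / cycle_time_ns(stages)
    print(f"{stages:2d} stages: {clock_ghz:.2f} GHz, "
          f"CPI {cycles_per_instruction(stages):.2f}, "
          f"{ns_per_instruction(stages):.3f} ns per instruction")
```

In this sketch the deeper pipeline clocks higher, but its CPI also rises, so the real speed-up is smaller than the clock speed-up. With a worse mispredict rate the gap grows further.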
Originally Posted by thecrownles
Sounds like they've got some killer performance enhancements and some really great power management stuff coming up in the next batch. I can't wait to buy a new computer with one of these beasts. So do these processors classify as 64-bit like the Athlon 64s, or are they now 128-bit, meaning even more performance than the Athlons?
No, they are still 32/64-bit, but they can work with SSE (1/2/3) instructions, which operate on registers that are 128 bits wide in total.
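As a rough illustration of what "128 bits wide in total" means, here is a small Python sketch. It only models the idea of four 32-bit floats sitting side by side and being added by one packed instruction (ADDPS is the SSE packed single-precision add); the variable and function names are made up for the example, and plain Python like this does not actually use SSE.

```python
import struct

# Four single-precision floats, 32 bits each, laid out side by side
# the way one 128-bit SSE register holds them.
lanes_a = (1.0, 2.0, 3.0, 4.0)
lanes_b = (10.0, 20.0, 30.0, 40.0)

packed = struct.pack("<4f", *lanes_a)
register_bits = len(packed) * 8  # 16 bytes -> 128 bits

def packed_add(a, b):
    """Model of a packed add (like ADDPS): one instruction, four independent 32-bit additions."""
    return tuple(x + y for x, y in zip(a, b))

print(register_bits)                 # 128
print(packed_add(lanes_a, lanes_b))  # (11.0, 22.0, 33.0, 44.0)
```

So the register is 128 bits wide, but each value in it is still a 32-bit float; the CPU itself stays a 32/64-bit design.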
Basically, if you get a pipeline stall, for whatever reason, it has to flush part or all of the pipeline and start again. Long pipeline = massive performance hit. Longer pipeline = lower IPC but higher MHz, because NetBurst was all about scheduling and keeping a large flow of data ready from high-speed, long-latency devices like RDRAM.
Quote:
For example, if you have a 31-stage pipeline, as you do in the Pentium 4 Prescott, you can do 31 operations on an instruction in one clock cycle.
In essence, 31 operations are not performed on a single instruction in one cycle - 31 operations are performed on 31 different instructions in one cycle, each of which is in a different stage of execution. That means that they'll still roll off that particular pipeline at a rate of one per clock, but higher clock speeds can be hit more easily since less work - which takes less time - is performed on that instruction (and the 30 others) in one cycle.
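A small Python sketch of that point, for anyone who wants to see it counted out. It assumes an idealised in-order pipeline with no stalls, flushes or bubbles, and the function name `run_pipeline` is made up for this example; it is nothing like a real NetBurst model.

```python
def run_pipeline(num_instructions, num_stages):
    """Step an ideal, stall-free pipeline; return (total cycles, most instructions in flight)."""
    stages = [None] * num_stages  # stages[0] is the first stage, stages[-1] the last
    next_instr = 0
    completed = 0
    cycles = 0
    max_in_flight = 0
    while completed < num_instructions:
        # One new instruction may enter per clock, if any are left.
        entering = next_instr if next_instr < num_instructions else None
        if entering is not None:
            next_instr += 1
        # Every instruction already in the pipeline moves on by exactly one stage.
        stages = [entering] + stages[:-1]
        cycles += 1
        max_in_flight = max(max_in_flight, sum(s is not None for s in stages))
        # Whatever sits in the last stage finishes at the end of this clock.
        if stages[-1] is not None:
            completed += 1
    return cycles, max_in_flight

print(run_pipeline(1, 31))     # (31, 1)
print(run_pipeline(1000, 31))  # (1030, 31)
print(run_pipeline(1000, 14))  # (1013, 14)
```

One instruction alone takes 31 clocks to get through, but 1000 of them take only 1030 clocks, because up to 31 are in flight at once and one rolls off the end per clock once the pipeline is full.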
Ultimately, it stops the information from cycling through the processor again and again and again; NetBurst is all about rapidly chewing through the data it gets all at once, storing the important stuff, figuring out what needs to be done, and putting it through quickly, quickly, quickly. We just don't live in that type of processing world.
/me wibbles himtan012 and his gift for extensive, in depth architecture talks ;)
This should be interesting when it comes out; I can't wait to see some performance figures. The way I like to see it is this: say the P4 Prescotts were like American cars. They have BIG V8 engines (long pipelines) that chew through huge amounts of fuel (data) but still don't manage to produce very much horsepower or torque (IPC), although you can strap on superchargers and suchlike (higher clock speeds etc.) to get a bit more power out of them. The Core architecture, on the other hand, will be more like a European car: it will have a sensibly sized engine (a 14-stage pipeline, the balance for both long and short instructions) and will use a sensible amount of fuel (in this case power), but still manage to jump up and down all over the V8 and produce more power and torque (IPC). Then you can add little bits here and there, like an ECU re-map (the prefetcher tweaks etc.), to give you more power.
I know it's a bit of a strange way to look at it, but it does make it nice and understandable, I think.
Originally Posted by Meanmotion
Well, the big question now is do AMD have an answer to Core? Or are they relying on their current lineup to compete for a while yet?
Can you show us these PowerPoint documents or whatever else you read for this article? :) I bet it would be interesting to see a more pro-orientated form of this! / I can't believe I'm going to study this kind of **it after a year :D ...no really... the 286's way of working (which makes you understand all processors' way of working) is in the program of my technical disciplines in 12th grade /
Is it true that Pentium M's microarchitecture is "closer" in design to Pentium 3's than it is to Pentium 4's? / one of the reasons it's better than P4 /
Originally Posted by nookie
Is it true that Pentium M's microarchitecture is "closer" in design to Pentium 3's than it is to Pentium 4's? / one of the reasons it's better than P4 /
Yes, that is correct. The NetBurst architecture - that of the Pentium 4 - was all but scrapped and, while they're keeping some features of it (such as its FSB), the majority of the design is based on the Pentium 3 with large improvements made. NetBurst's pipeline was simply too long; although they could hit huge clock speeds with it, its effect on performance due to continual flushing and bubbles (see my posts above) was too great.
But apart from that, very informative. :)
My head hurts