bit-tech.net

Intel's Haswell rumoured to feature L4 cache

Intel's upcoming Haswell architecture is rumoured to be getting an L4 cache layer, for improved graphics performance.

Intel's upcoming Haswell architecture, designed to follow on from Ivy Bridge, is rumoured to include L4 cache which can be shared between central processing and graphics processing units for vastly improved 3D performance.

While as-yet unannounced by Intel itself, the claim comes from VR-Zone, which says it has spotted evidence that the Haswell launch line-up will include desktop chips featuring integrated graphics some two to three times faster than the best Ivy Bridge has to offer.

The biggest change, however, comes in the form of a new layer of cache memory. In addition to the usual L1, L2 and L3 cache layers, VR-Zone claims Haswell is to ship with a chunk of L4 cache. Traditionally, such a feature is limited to expensive chips aimed at the high-performance computing (HPC) market.

While Intel certainly has intentions in this area, Haswell's L4 cache is there for a different reason: graphics. Like AMD, its biggest competitor, Intel believes that increased coherency between the graphics and central processing infrastructure on chips is the way forward. While not quite as clear a roadmap as AMD's heterogeneous systems architecture (HSA), Intel's apparent move to add an L4 cache layer to Haswell indicates a similar goal: improved cache coherency between GPU and CPU tasks.

For consumer applications, that means vastly improved graphical capabilities, which could spell the death of low-end and potentially even mid-range dedicated graphics hardware. For the server room, it means the ability to execute instructions on either the CPU or the GPU independent of where the data is stored, eliminating one of the biggest bottlenecks in general-purpose GPU (GPGPU) programming: moving data.
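The cost of moving data is easy to illustrate even without a GPU. The following Python sketch (the buffer size and the stand-in 'compute' step are purely illustrative, and nothing here is Haswell-specific) compares working on data in place against staging a copy first, the way a discrete accelerator must before it can start work.

```python
import time
from array import array

N = 5_000_000
src = array("d", [1.0] * N)

# The "work": a reduction over the buffer (a stand-in for a GPU kernel).
def compute(buf):
    return sum(buf)

# Case 1: operate on the data where it lives (shared-memory model).
t0 = time.perf_counter()
r1 = compute(src)
shared_time = time.perf_counter() - t0

# Case 2: copy the data to a staging buffer first (discrete-device
# model: data crosses a bus before the device can touch it).
t0 = time.perf_counter()
staging = array("d", src)   # the "host -> device" transfer
r2 = compute(staging)
copy_time = time.perf_counter() - t0

print(f"in place: {shared_time*1e3:.1f} ms, copy first: {copy_time*1e3:.1f} ms")
```

On a real system the staging copy crosses the PCI-E bus, which is far slower than this in-memory copy; a shared, coherent cache removes that step entirely.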

Details of the Haswell chip design are sketchy, and Intel isn't talking. As per usual, the company merely stated that it 'refuses to comment on rumour or speculation regarding unannounced products.' Should VR-Zone's guess prove accurate, however, Intel's integrated graphics could be getting a serious speed boost come Haswell's launch in 2013.

13 Comments

CAT-THE-FIFTH 19th March 2012, 12:31 Quote
It seems that CD from SA mentioned this a while back (not sure if the numbers are correct, though):

http://semiaccurate.com/2012/02/08/haswell-is-a-graphics-monster/

http://semiaccurate.com/2011/09/21/analysis-intel-shows-off-haswell-minus-the-important-bits/

AMD is also investigating the use of stacked RAM (there are pictures):

http://semiaccurate.com/2011/10/27/amd-far-future-prototype-gpu-pictured/

http://semiaccurate.com/forums/showpost.php?p=140928&postcount=26

2013 is going to be a very interesting year for integrated graphics as both Intel and AMD might be using on-die RAM!
Adnoctum 19th March 2012, 12:33 Quote
The Bobcat-based APUs are already being looked at for their GPGPU uses in servers, and I would imagine that APUs will graduate to the high end eventually as well, so Intel would be foolish if they weren't prepared to offer competition to AMD.

My only issue would be to question Intel's competence in creating GPGPU-capable graphics hardware. And their driver team hasn't been exactly stellar in trying to make up for hardware inadequacies either.
It is one thing to play HD video and some not very demanding games on Intel's graphics, but I wouldn't like to crunch important 1s and 0s on it. Hopefully Intel will surprise us and turn their graphics around.
greigaitken 19th March 2012, 12:34 Quote
the more intel and amd press forward with integrated - the more nvidia has to look behind and run faster.
hurry up the lot of you, i've got 30 years of gaming left and i want photo realism before then!
schmidtbag 19th March 2012, 13:50 Quote
5 years later:
"in other news, intel releases an L8 cache with 1GB of memory"

does intel even remember the purpose of on-die caches? i suppose it's fine that the L4 is shared with the gpu, but still, i don't see it making a performance improvement on the cpu side. just make an L2 for the gpu.
yougotkicked 19th March 2012, 21:02 Quote
@schmidtbag: If you were to slap 1GB of cache (any level) on a current processor, it could probably break the TeraFLOP barrier with two cores tied behind its FSB.

adding more levels and larger quantities of Cache memory is actually one of the most powerful ways to improve processor performance; it's also expensive as hell.

cache memory is quite literally 100 times faster than RAM, the CPU can access data from L1 cache in a single clock cycle, fetching from main memory can take up to 100 clock cycles. L2 takes about 15 cycles and L3 about 30. fetching a single datum from main memory during a calculation can easily double the execution time of an operation, AND potentially slow down other operations. adding more levels of cache helps avoid these time-wasting calls to main memory, even a call to L4 cache taking ~50 cycles is 50% faster than a call to main memory.

The reason we don't have uber-tons of cache memory is that it is stupidly expensive, like $50 per MB.

The idea of it being specialized to graphics makes me think it may just be a stepping stone to a full L4 cache implementation. SATA's successor is supposed to communicate over the PCI-E bus that graphics cards use to communicate with the CPU, and Intel has made a habit recently of clever production implementations (see their "tick-tock" strategy). L4 cache (and eventually L5) is more or less inevitable as CPUs continue to advance much faster than RAM, so I think this theory has some merit.

[/longwindedpost]
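The cycle counts above are ballpark figures, but they're enough for a back-of-the-envelope average memory access time (AMAT) calculation. The sketch below uses those latencies (L1 = 1 cycle, L2 = 15, L3 = 30, a hypothetical L4 = 50, RAM = 100); the hit rates are illustrative assumptions, not measured figures.

```python
# Average memory access time for a cache hierarchy:
# AMAT = hit_time_1 + miss_rate_1 * (hit_time_2 + miss_rate_2 * (...))
def amat(levels):
    """levels: list of (hit_latency_cycles, miss_rate); final misses go to RAM."""
    ram_latency = 100.0
    total = 0.0
    reach_prob = 1.0            # probability an access gets this far down
    for latency, miss_rate in levels:
        total += reach_prob * latency
        reach_prob *= miss_rate
    return total + reach_prob * ram_latency

three_level = [(1, 0.10), (15, 0.40), (30, 0.50)]
four_level  = three_level + [(50, 0.30)]   # hypothetical L4 catching 70% of L3 misses

print(f"3-level AMAT: {amat(three_level):.2f} cycles")
print(f"4-level AMAT: {amat(four_level):.2f} cycles")
```

With these assumed hit rates the hypothetical L4 trims roughly 0.4 cycles off the average access, which is exactly the "avoid the trip to main memory" effect the post describes.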
schmidtbag 19th March 2012, 22:44 Quote
Quote:
Originally Posted by yougotkicked
@schmidtbag: If you were to slap 1GB of cache (any level) on a current processor, it could probably break the TeraFLOP barrier with two cores tied behind its FSB. [snip]

yes, do you know WHY cache is so much faster than ram? because it's on the same die as the cpu and it's small, so it's easier to address and find instructions with little to no latency. caches were made specifically for these 2 benefits. there's a reason why L1 caches are still generally very tiny. if you've ever run memtest, you'd find that the smaller the cache, the faster it is. however, if you make a cache too small, it can't store complex instructions, hence stuff like L2 caches being notably larger. i made the joke about a 1gb L8 cache because if for some stupid reason that ever happens, it'll be as slow as regular ram, minus the latency, and would make ram obsolete.

look at it in terms of the common cache and ram analogy:
imagine you walk into a library looking for a book. the front door of the library is the CPU. the bookshelves are RAM. you would have to walk all the way to the correct bookshelf, pinpoint the location of that book within that shelf, then walk all the way back to the front desk to check it out and leave the library.
however, maybe you're looking for a few popular books. these books would already be placed at the front desk (the cache). this prevents you from having to walk to each bookshelf and scan through every book.

now if you take this analogy and add some gigantic cache like 1gb, that's basically saying all you're doing is taking those bookshelves and putting them really close to the front desk. you might not need to walk as far, but you still waste time searching for what you want.
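The library analogy maps neatly onto a toy cache simulation. The sketch below models the "front desk" as a small LRU cache over a large pool of "books"; the sizes and the popularity curve are illustrative assumptions.

```python
import random
from collections import OrderedDict

# The "front desk": a small least-recently-used cache over many "books".
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.store:
            self.store.move_to_end(key)         # mark as recently used
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

random.seed(42)
cache = LRUCache(capacity=32)             # 32 books fit at the front desk
books = list(range(1000))                 # 1000 books in the stacks
weights = [1.0 / (i + 1) for i in books]  # popularity falls off sharply

for book in random.choices(books, weights=weights, k=10_000):
    cache.access(book)

hit_rate = cache.hits / (cache.hits + cache.misses)
print(f"hit rate with a 32-entry cache over 1000 books: {hit_rate:.0%}")
```

Even though the front desk holds only about 3% of the books, it serves a disproportionate share of the visits, because popular books dominate the traffic; that skew, not raw capacity, is what makes a small cache pay off.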
iwod 20th March 2012, 03:31 Quote
Once Intel has stacked silicon done (it was actually announced nearly a decade ago), they could move tens to hundreds of MBs of L4 cache onto another layer of the CPU, which could make the GPU many times faster. Look at what the eDRAM brought to the Xbox 360.
yougotkicked 20th March 2012, 06:04 Quote
@schmidtbag: not trying to be insulting or anything, just saw what I felt was a misinformed statement and took the opportunity to explain some of the finer points of cache architecture for everyone.

and, not to be a smart-ass, but cache is faster b/c it's made with static RAM, which requires 6 transistors for every bit of data, whereas DRAM requires only one. And the capacity of a memory device is fully independent of its access times, since a fetch operation is an indexed table jump, not a scan. the size DOES affect the cost, and the staggered caching architecture present in ALL digital storage devices is nothing but a cost-saving design meant to virtualize a higher-performance memory device through the optimization of several lower-cost devices.
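The "indexed table jump, not a scan" point is easy to demonstrate: reading one slot from a random-access table costs the same whether the table is tiny or huge. A small sketch, with Python lists standing in for the memory array (timings are approximate):

```python
import timeit

# Indexed access is an address computation, not a search, so the
# table's size should not change how long a single lookup takes.
small = list(range(1_000))
large = list(range(10_000_000))

t_small = timeit.timeit(lambda: small[500], number=1_000_000)
t_large = timeit.timeit(lambda: large[9_999_999], number=1_000_000)

print(f"1K-entry table: {t_small:.3f}s  10M-entry table: {t_large:.3f}s")
```

Both loops take essentially the same time: capacity alone doesn't slow a fetch. What big SRAM arrays cost is money and die area, which is the point about price above.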
schmidtbag 20th March 2012, 14:41 Quote
Quote:
Originally Posted by yougotkicked
@schmidtbag: not trying to be insulting or anything, just saw what I felt was a misinformed statement and took the opportunity to explain some of the finer points of cache architecture for everyone. [snip]

i never said cache wasn't made with static ram, and i know that it's more than just a small local ram source on the cpu. i'm not really sure what your point was with the sentence about memory capacity; it concluded nothing. i never said size doesn't affect cost - i'm aware cache is more expensive, but larger caches these days don't bring the price up THAT much, so by now you'd probably see server CPUs with over 20MB if it were practical and proven worth it.

all i was trying to say in my post was that caches are deliberately small (for more than just price differences), and when they're smaller, they're generally faster. yes, i'm aware that 1gb of cache is not going to perform like 1gb of ram even without all the technicalities, but my point is it'll still run relatively slowly. i'm not the one who came up with the library analogy; it's been used for a long time.

i don't appreciate being called "misinformed" when you said nothing to prove that. the information i didn't supply (which you did) wouldn't have changed my point at all, in fact, all you really did was help prove my point even further.
technogiant 20th March 2012, 16:01 Quote
Well you can call me misinformed, because compared to you guys I am, but I don't really see the point of adding more and more layers of shared cache. Once you have one level of shared cache, just increase its size rather than adding another layer (with slower latency)... surely?
rocket 20th March 2012, 19:44 Quote
If you removed RAM and used a cache instead, it would be more like a system on a chip.
yougotkicked 20th March 2012, 22:12 Quote
@schmidtbag; I didn't intend to say that you WERE misinformed, i was simply saying that when I made my first reply to you, I thought your first post sounded misinformed. after your second post I realized your understanding of the concept went much deeper than I initially realized. I don't want to turn this into an argument, but it is hard to disagree with someone without sounding at least a little condescending.

I'll say this; I realize my first post came off a bit insulting, I did not intend for it to do so but I apologize for it nonetheless. We both obviously know a lot about the subject, which is complicated enough for us to go back and forth like this for pages.

@technogiant; as I understand it, the main reason (but not the only reason) for having multiple layers of cache is that higher levels of cache are accessed and written by more system processes and elements. because of this, there are protocols in place for managing which components can do what with that RAM pool. on current-generation i7 processors, the L3 cache is accessed by all the physical processing cores, so every time one of the cores wants to put some data there, a calculation has to be done to decide whether it can be stored there and, if so, where it can go. at L2 and L1 the cache is exclusive to a single core, and these calculations do not need to be done.

besides that there are some complicated reasons for having two layers of cache dedicated to the same core; it's mostly to do with how data locations are accessed. every piece of storage on your computer is part of a "memory hierarchy", where each device serves as a sort of cache for the slower, larger, and cheaper device below it. this is not only less expensive for the same capacity than using a single device, it is theoretically almost as fast, so a slight loss in performance can drastically reduce the cost. the more layers you add to the memory hierarchy, the smaller the performance gap becomes. that's why people have 100GB SSDs and 2TB hard drives: by putting the right data in the right places, your computer will be 80% as fast as it would be with a 2.1TB SSD, for a fraction of the cost. now if you put some of the savings into more/faster RAM, suddenly we're looking at 90% as fast. put the rest of the savings into a better CPU and your computer is now, in many ways, FASTER than if you put all your money into 2.1TB worth of SSDs. It's all about balance.
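The 80%/90% figures above are informal, but the underlying arithmetic is easy to sketch. The model below computes an effective access latency for a two-tier hierarchy; the device latencies and the 99.8% hot-data hit rate are illustrative assumptions, not benchmarks.

```python
# Effective latency of a two-tier storage hierarchy: a fraction of
# accesses hit the fast tier, the rest fall through to the slow one.
def effective_latency(fast_latency, slow_latency, fast_hit_rate):
    """Average access latency in the same units as the inputs."""
    return fast_hit_rate * fast_latency + (1 - fast_hit_rate) * slow_latency

SSD_US, HDD_US = 100, 10_000          # rough access latencies, microseconds
all_ssd = effective_latency(SSD_US, SSD_US, 1.0)
hybrid  = effective_latency(SSD_US, HDD_US, 0.998)  # hot data on the SSD

print(f"all-SSD: {all_ssd:.0f} us per access")
print(f"hybrid:  {hybrid:.0f} us per access "
      f"({all_ssd / hybrid:.0%} of all-SSD speed)")
```

With a high enough hit rate on the fast tier, the hybrid setup lands at roughly 80% of all-SSD speed for a fraction of the cost, which is the balance the post argues for.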
Gareth Halfacree 21st March 2012, 07:58 Quote
Quote:
Originally Posted by schmidtbag
i never said size doesn't affect cost - i'm aware cache is more expensive, but larger caches these days don't bring up the price THAT much, so by now you'd probably end up seeing server CPUs with over 20MB if it were practical and proven worth it.
We are: take the Intel Xeon X7560 for example, which has a whopping 24MB of cache. As does the Xeon E7-4830. Granted, AMD's equivalent chips only have 16MB - but they're typically used in dual- or quad-socket motherboards, meaning a grand total of 32MB or 64MB of cache.