Researchers boost graphics performance through processing in-memory

A team of researchers led by Shuaiwen Leon Song, pictured, have developed in-memory processing systems which they claim can boost graphics performance by up to 65 percent.

Researchers at the Pacific Northwest National Laboratory and the University of Houston have announced a technique for carrying out graphics processing in-memory which is claimed to boost performance by up to 65 percent.

Traditionally, graphics processing is done by - unsurprisingly - a graphics processor, which takes data from memory, works on it, then places it back into memory again. A team of researchers led by Shuaiwen Leon Song and working in the field of high-performance computing (HPC) under a US Department of Energy (DoE) grant, though, have opted for a different method: a means of allowing the memory to do the processing directly, bypassing the need for the GPU to spend time reading and writing data.

Dubbed 'processing in memory', the team's work is based on the increasingly common 3D stacked memory modules found on high-end graphics hardware. In addition to the usual layers of memory, Song's team added logic layers able to work directly on the stored data, effectively turning each memory chip into a co-processor. Although the capabilities of the logic layer are limited compared with the far larger GPU, they were enough to see considerable improvements: by offloading anisotropic filtering to the modified memory processors, the performance of tested games was boosted by up to 65 percent.

'We're pushing the boundaries of what hardware can do,' said Song of his team's work, 'and though we tested our idea on video games, this improvement ultimately benefits science.'

Before getting too excited, though, it's worth looking at the games Song chose to test his team's creation on: Doom 3 and Half-Life 2, titles launched 13 years ago which come nowhere near stressing even the cheapest of modern graphics cards. Whether Song's work will translate to improvements in more demanding titles - or will ever be picked up for commercialisation in gaming hardware, rather than being left as the preserve of high-performance computing systems - remains to be seen.

Song presented his work at the 2017 IEEE Symposium on High Performance Computer Architecture, but the paper has not yet been made public.

14 Comments

SinxarKnights 21st April 2017, 11:27 Quote
Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.

Can you keep us posted about this Gareth?
perplekks45 21st April 2017, 12:35 Quote
Seconded.
IamSoulRider 21st April 2017, 13:00 Quote
"the team's work is based on the increasingly common 3D stacked memory modules available on high-end graphics hardware."

I'd assume that would be HBM2, possibly first gen HBM. In that case Memory Bandwidth should be High.

Do you see what I did there? :P
SinxarKnights 21st April 2017, 13:14 Quote
Doesn't answer the question though. I imagine even HBM3 would be a significant bottleneck processing instructions in memory instead of directly on die.

But like I said, I don't know what exactly they are doing. Need that paper to check it out.
edzieba 21st April 2017, 14:03 Quote
Quote:
Originally Posted by SinxarKnights
Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.
Other way around: this would alleviate memory-bandwidth-limited operations (i.e. those that need to operate on a lot of data, but where the operations themselves are very basic) by pushing those operations out to the memory itself, so that data never needs to cross the memory bus in the first place.
Wakka 21st April 2017, 15:29 Quote
I'm nowhere near smart enough to know how this stuff works in detail, but how would memory chips process that kind of data? I mean, they're memory chips - surely they are designed to either store something, or pass it along to a smarter chip?

Wouldn't you be a bit pissed if you were an nvidia or AMD engineer and someone came along and was like "we can make things faster by moving instructions OFF those fancy multi-billion transistor GPUs!"...
edzieba 21st April 2017, 15:35 Quote
Quote:
Originally Posted by Wakka
I'm nowhere near smart enough to know how this stuff works in detail, but how would memory chips process that kind of data? I mean, they're memory chips - surely they are designed to either store something, or pass it along to a smarter chip?

Wouldn't you be a bit pissed if you were an nvidia or AMD engineer and someone came along and was like "we can make things faster by moving instructions OFF those fancy multi-billion transistor GPUs!"...
The storage dies themselves are 'just memory', but for HBM stacks to work at all, the bottom element in the stack is a processing die to handle the interface between the memory dies and the memory bus. What the researchers have done is to augment the existing processing die to allow it to do basic computations on the memory traffic it is already handling.
Corky42 21st April 2017, 17:10 Quote
@Wakka...
Quote:
Originally Posted by The Article
Song's team added logic layers able to work directly on the stored data, effectively turning each memory chip into a co-processor. Although the capabilities of the logic layer are limited compared to the far larger GPU, it was enough to see considerable improvements: By offloading anisotropic filtering to the modified memory processors the performance of tested games was boosted by up to 65 percent.
Basically they added a small ASIC that they could send a command to, saying something like 'perform anisotropic filtering on the data held in memory at location X'.
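Purely as an invented illustration - none of this comes from the paper, and every field name is made up - the command could be as simple as a little descriptor the GPU hands to the logic die:

Code:
/* Hypothetical command descriptor for memory-side anisotropic filtering.
 * All field names are invented for illustration; the researchers' actual
 * interface has not been published. */
#include <stdint.h>

struct pim_aniso_cmd {
    uint64_t src_addr;    /* base address of the texture within this stack */
    uint64_t dst_addr;    /* where the filtered result should be written   */
    uint32_t width;       /* texture dimensions in texels                  */
    uint32_t height;
    uint16_t max_aniso;   /* maximum anisotropy, e.g. 16x                  */
    uint16_t mip_levels;  /* number of mip levels to sample from           */
};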
Cr@1g 21st April 2017, 17:58 Quote
Quote:
Originally Posted by edzieba
Quote:
Originally Posted by SinxarKnights
Pretty sweet stuff. Seems like bandwidth would be the limiting factor as is with current GPUs. Then again I don't know what they are doing exactly. I need to check out the paper when it is released and see how they did it.
Other way around: this would alleviate memory-bandwidth-limited operations (i.e. those that need to operate on a lot of data, but where the operations themselves are very basic) by pushing those operations out to the memory itself, so that data never needs to cross the memory bus in the first place.

I'm wondering if AMD's new approach with HBM2, along with its HBC creating a 512TB virtual address space, is made for this way of thinking?
Gareth Halfacree 21st April 2017, 20:20 Quote
To clear up some misconceptions in the comments - and apologies if the article was unclear:

The technique works, as Ed and Corky have both mentioned, by adding a processing element to each memory stack which is capable of working directly on data stored in said memory. So, instead of the GPU having to read 8GB (or whatever) of data, do its thing, and write the 8GB back again, the processing happens on the memory directly - hence 'in-memory processing.' It's not a new idea, but it's the first time I've seen it applied to graphics processing with practical results.
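As a rough illustration of the difference in bus traffic - with numbers invented purely for the example, not taken from the paper - consider something like this:

Code:
/* Rough, hypothetical comparison of how many bytes cross the memory bus
 * when anisotropic filtering is done on the GPU versus inside the memory
 * stack. All the figures below are made up for illustration. */
#include <stdio.h>

int main(void)
{
    long long texels      = 4LL * 1024 * 1024;  /* texels filtered per frame  */
    long long taps        = 16;                 /* samples per filtered texel */
    long long bytes_texel = 4;                  /* RGBA8                      */

    /* GPU-side filtering: every sampled texel is read across the bus and
     * the filtered result is written back. */
    long long gpu_bus = texels * taps * bytes_texel   /* reads  */
                      + texels * bytes_texel;         /* writes */

    /* In-memory filtering: the taps are read and combined inside the stack,
     * so only the finished result ever crosses the bus. */
    long long pim_bus = texels * bytes_texel;

    printf("GPU-side filtering : %lld MB over the bus\n", gpu_bus >> 20);
    printf("In-memory filtering: %lld MB over the bus\n", pim_bus >> 20);
    return 0;
}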

As Ed mentioned, it's the exact opposite of bandwidth-dependent: the data doesn't go anywhere, so where the GPU can only work on the contents of the memory at the throughput of the memory bus, the in-memory processing system can operate as quickly as the memory itself allows - and in parallel, too, meaning if you've got eight stacks of memory you can do your processing eight times faster than with a single stack, without worrying about saturating any buses.

You're limited in what you can do, though: the die space and power envelope for adding a logic layer to stacked memory are both way, way smaller than for a GPU - so you can't have anything general-purpose going on there. Hence the proof-of-concept: a logic layer that only does one thing, anisotropic filtering - something which is fairly simple computationally but that requires massive memory bandwidth. With that tiny bit of extra processing power, you're lightening the load on the GPU by a percent or two at most - but because you're no longer bottlenecked by the memory bus you're increasing the performance by 65 percent.
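To put some entirely invented numbers on that shift - nothing below comes from the paper, it's just the shape of the arithmetic:

Code:
/* Back-of-envelope model of why a tiny compute offload can produce a large
 * frame-rate gain when the frame is bandwidth-bound. Every number here is
 * invented for illustration; none of them come from the paper. */
#include <stdio.h>

int main(void)
{
    double frame_ms       = 16.0;  /* a frame limited by the memory bus           */
    double filter_traffic = 0.40;  /* share of bus traffic due to filtering reads */

    /* Moving the filtering into the memory stacks removes those reads from
     * the bus, so the bandwidth-bound frame time shrinks accordingly. */
    double new_frame_ms = frame_ms * (1.0 - filter_traffic);
    double gain         = (frame_ms / new_frame_ms - 1.0) * 100.0;

    printf("frame time: %.1f ms -> %.1f ms, a gain of roughly %.0f%%\n",
           frame_ms, new_frame_ms, gain);
    return 0;
}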

Step one of commercialisation: task offload acceleration, by adding anisotropic filtering logic to the memory stacks (or whatever task ends up making sense to offload - there may be something else that would give even bigger performance increases in modern gaming engines.)

Step two: add more logic layers. As well as your anisotropic filter logic layer, why not stick a - I don't know - bump-mapping layer on there? Keep adding layers until you can't fit any more on there.

Step ???: by now your GPU is basically just there to tell the memory stacks what they should be doing, so you've effectively created a fundamentally new architecture. Instead of an ultra-powerful GPU talking to dumb memory, your graphics card is now a dumb and lightweight central controller talking to ultra-powerful in-memory processors. Likely? Who knows; the technique has to survive the prior steps first.

As for the paper, I'll drop the guys an email and see if there's a timescale on public access - or, given that it's DoE funded, whether it'll ever be publicly accessible.
greigaitken 22nd April 2017, 19:02 Quote
@ GH
you just don't get this kind of analysis on the bbc tech section?
Gareth Halfacree 22nd April 2017, 19:43 Quote
Quote:
Originally Posted by greigaitken
@ GH
you just don't get this kind of analysis on the bbc tech section?

I'm Ron Burgundy?
Gareth Halfacree 28th April 2017, 14:39 Quote
Got in touch with Shuaiwen Leon Song at PNNL, and he's out travelling this week but he's going to swing back around next week with as much additional information as he can gather. Should be interesting!
perplekks45 29th April 2017, 07:12 Quote
Cheers, Gareth. Much appreciated! ;)