bit-tech.net

AMD hUMA introduced: Heterogeneous Unified Memory Access

AMD hUMA unites the CPU and GPU memory spaces into one.

AMD has provided a few more details about its upcoming Heterogeneous System Architecture (HSA), revealing the name of the unified memory system it will be using: Heterogeneous Unified Memory Access (hUMA).

HSA is AMD's big vision for its future APUs. Like its existing APUs, HSA chips will feature a CPU and GPU on one piece of silicon, but the big innovation with HSA is that the two units will share memory directly, hence the hUMA name.

On current AMD and Intel APUs the CPU and GPU have separate memory blocks. So for the GPU to do any processing, the appropriate data must be copied from CPU memory to GPU memory, and back again once the processing is finished. This creates a severe bottleneck in performance and greatly increases complexity for programmers.

By unifying the two blocks of memory and allowing the CPU and GPU to directly access the same data the performance overhead of copying all the data is eliminated and programming complexity is greatly reduced.
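The difference between the two models can be sketched in plain C. This is only a toy illustration of the programming pattern, not real GPU code: the buffer names and the doubling "kernel" are hypothetical stand-ins, with an ordinary `malloc` playing the role of the separate GPU memory block.

```c
#include <stdlib.h>
#include <string.h>

#define N 1024

/* Discrete-memory model: data must be staged into a separate
 * "device" buffer, processed there, then copied back to the
 * original buffer once processing is finished. */
void process_discrete(int *cpu_buf) {
    int *gpu_buf = malloc(N * sizeof(int));      /* stand-in for GPU memory */
    memcpy(gpu_buf, cpu_buf, N * sizeof(int));   /* copy in: CPU -> GPU     */
    for (int i = 0; i < N; i++)
        gpu_buf[i] *= 2;                         /* "kernel" runs on the copy */
    memcpy(cpu_buf, gpu_buf, N * sizeof(int));   /* copy out: GPU -> CPU    */
    free(gpu_buf);
}

/* hUMA-style model: CPU and GPU share one address space, so the
 * "kernel" works on the original buffer directly - no copies. */
void process_unified(int *shared_buf) {
    for (int i = 0; i < N; i++)
        shared_buf[i] *= 2;
}
```

Both functions compute the same result; the point is that the second version drops the two `memcpy` calls (and the extra allocation) entirely, which is exactly the overhead hUMA is claimed to eliminate.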

AMD highlighted what it sees as the top ten benefits of HSA in a recent presentation:
  • Much easier for programmers
  • No need for special APIs
  • Move CPU multi-core algorithms to the GPU without recoding for absence of coherency
  • Allow finer grained data sharing than software coherency
  • Implement coherency once in hardware, rather than N times in different software stacks
  • Prevent hard to debug errors in application software
  • Operating systems prefer hardware coherency - they do not want the bug reports to the platform
  • Probe filters and directories will maintain power efficiency
  • Full coherency opens the doors to single source, native and managed code programming for heterogeneous platforms
  • Optimal architecture for heterogeneous computing on APUs and SOCs.

Some of these bullet points are clearly digs at Nvidia's current system for GPU-accelerated programming, CUDA, which uses a software layer to interpret simple programmer input and automatically handle the complexities of memory management (among other things). HSA shouldn't require this software layer.

hUMA is essentially just a bit of branding that refers to the single memory address space the company's upcoming HSA APUs will be using. It harks back to the Unified Memory Access nomenclature of early multicore CPUs, where each CPU core started to share the same memory, with "heterogeneous" added in reference to HSA.

HSA isn't just an AMD project, though; it is centred around the HSA Foundation, "whose goal is to make it easy to program for parallel computing." The foundation includes other high-profile members such as ARM, Qualcomm and Samsung.

The arrival of HSA is still some way off, with the first AMD chips set to use the architecture expected to arrive early next year. However, the PlayStation 4 is expected to feature an HSA-type processor, so we'll see some indication of what we can look forward to when that console arrives in Q4 this year.

14 Comments

bowman 30th April 2013, 10:52 Quote
This is what the promise of Fusion really is. Putting a CPU and a GPU in the same MCM, or heck, even on the same die, is not revolutionary, and hardly even evolutionary. Thus far it has worked mostly as a cost-saving measure. The system architecture is still the same as a regular computer with separate CPU and GPU.

Now we're talking, though. Unfortunately this is more likely to mean the GPU will be a low-end one shackled with the inadequacies of DDR3 memory, rather than the amazing opportunity of letting a CPU and GPU share some horrendously fast GDDR5 memory.

Oh well, at least AMD will implement that in the silly Sony box.
mi1ez 30th April 2013, 11:00 Quote
hUMAn after all...
will_123 30th April 2013, 11:10 Quote
devug..? Little typo.
Meanmotion 30th April 2013, 11:11 Quote
Quote:
Originally Posted by will_123
devug..? Little typo.

Ta, fixed.
will_123 30th April 2013, 11:13 Quote
Quote:
Now we're talking, though. Unfortunately this is more likely to mean the GPU will be a low-end one shackled with the inadequacies of DDR3 memory, rather than the amazing opportunity of letting a CPU and GPU share some horrendously fast GDDR5 memory.

Also, judging by the PS4 specs, will this really be the case? It's got 8GB of GDDR5 unified memory.
azazel1024 30th April 2013, 14:46 Quote
Nice.

Though, unless I greatly misunderstand it, Haswell brings unified CPU/GPU memory to Intel chips in...uh...a month. So, another "Intel beating AMD to the market" thingie.
mi1ez 30th April 2013, 14:59 Quote
Quote:
Originally Posted by azazel1024
So, another "Intel beating AMD to the market" thingie.
Haven't AMD been first with most recent CPU techs?
schmidtbag 30th April 2013, 15:12 Quote
This is pretty fantastic IMO, it will give AMD a considerable performance gain and, as stated earlier, makes the term Fusion much more true. What I find interesting about this is you could potentially have several GB of memory go toward the GPU. With a little overclocking, this could probably easily handle 6 monitors that aren't doing anything GPU intensive (such as HD video or 3D). If you want a multi-seat office or school computer, this would be very ideal. Many people overestimate the needs of office computers.
Bindibadgi 30th April 2013, 16:35 Quote
Quote:
Originally Posted by azazel1024
Nice.

Though, unless I greatly misunderstand it, Haswell brings unified CPU/GPU memory to Intel chips in...uh...a month. So, another "Intel beating AMD to the market" thingie.

Technically AMD already have it in the PS4 and probably the Xbox 720/Next/whatever too. It may not be on the consumer market but the tech is there and working.
SAimNE 30th April 2013, 20:42 Quote
Quote:
Originally Posted by will_123
Also, judging by the PS4 specs, will this really be the case? It's got 8GB of GDDR5 unified memory.

GDDR5 isn't actually "faster" than DDR3, it's just optimised for graphics (pretty sure it handles higher-volume transfers better at the sacrifice of a bit of added latency, but I could be wrong, haven't looked too far into it). Anyway, if they make it GDDR5 the processor side of things will suffer while the graphics would improve... so the best outcome would probably be DDR4 coming out in time for the APUs.
Adnoctum 1st May 2013, 10:54 Quote
Quote:
Originally Posted by bowman

Unfortunately this is more likely to mean the GPU will be a low-end one shackled with the inadequacies of DDR3 memory, rather than the amazing opportunity of letting a CPU and GPU share some horrendously fast GDDR5 memory.

GDDR5 isn't better than DDR3, it IS DDR3 but optimised for the parallel tasks of GPUs. GDDR5 has high bandwidth because it can have multiple (high latency/high bandwidth) controllers per channel (while also reading AND writing during the cycle) while DDR3 has a single (low latency/low bandwidth) controller per channel (and can only read OR write during the cycle).

CPUs want DDR3 because they prefer low latency, as they have multiple workloads all needing access quickly so as to not hold up the current thread.
GPUs want GDDR5 because they want high bandwidth, and care less about latency because they need to move a lot of data but it is less time critical.

These are two competing requirements. On the desktop you'll want DDR3 because you will have multiple workloads running simultaneously. Consoles such as the PS4 will be able to get away with GDDR5 because it will be undertaking a single workload that will be mainly GPU-related for which GDDR5 will suffice.

It should be noted that it is not GDDR5 that has high latency but the controllers themselves, as high bandwidth and low latency are competing requirements.
Low latency GDDR5 controllers should be do-able, it is just that it hasn't been needed for past/current/future AMD/nVidia GPUs which require high bandwidth. Perhaps a controller for APUs that can switch high bandwidth/low latency modes is the answer.
jb0 1st May 2013, 14:37 Quote
GDDR5, DDR3... it's still all DRAM. Slow, power-hungry, complex-to-interface DRAM.

Wake me when we're using SRAM for more than cache again.
yougotkicked 2nd May 2013, 03:23 Quote
If the integrated GPU has any real number crunching power to it, this could be a huge deal. It won't need to be a GTX Titan, something with a few hundred compute cores that can significantly outperform a CPU for basic parallel tasks would do the trick. I can imagine computing clusters with a dozen APU's per blade server, offering huge throughput with relatively low power demands.

Of course, this is all marketing BS if the integrated GPU isn't big enough. Careful programming can mitigate the data transfer overhead, which isn't so bad if you don't need to constantly load new gigabyte-scale blocks of data onto the GPU (bear in mind that the 'bottleneck' is the PCI-E bus, which is slow compared to the bus between the CPU and RAM, but it's not like we're moving 10 gigs onto a USB drive).

I sure would love it if this turns out to be as good as it sounds: over the summer I'll be teaching researchers how to do GPGPU computing, and eliminating the data transfer step would make things way simpler when coding.
will_123 2nd May 2013, 10:18 Quote
Quote:
Originally Posted by SAimNE
GDDR5 isn't actually "faster" than DDR3, it's just optimised for graphics (pretty sure it handles higher-volume transfers better at the sacrifice of a bit of added latency, but I could be wrong, haven't looked too far into it). Anyway, if they make it GDDR5 the processor side of things will suffer while the graphics would improve... so the best outcome would probably be DDR4 coming out in time for the APUs.

Didn't say it was faster, just said that the PlayStation 4 specs showed unified memory.