As with RV770, the 64 VLIW (very long instruction word) shader units are split into eight SIMD cores. Each core sports eight shader units (40 stream processors) along with its own thread sequencers and arbiters, a local 16KB data store and a texture unit which, incidentally, has its own dedicated L1 texture cache hidden behind the data request bus.
When you add up all of the compute power on tap in RV730, you end up with a number somewhere near 480 gigaFLOPS at the Radeon HD 4670's 750MHz core clock speed. That's pretty lofty for a chip that has a surface area of just 146mm².
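For anyone who wants to check the sums, here's a rough sketch of how that figure falls out of the unit counts. The factor of two operations per stream processor per clock is our assumption (a multiply-add counted as two floating-point operations), and the constant and function names are purely illustrative.

```python
# Back-of-envelope peak shader throughput for RV730.
# Unit counts come from the text; FLOPS_PER_SP_PER_CLOCK is an assumption
# (one multiply-add per stream processor per clock, counted as two operations).
SIMD_CORES = 8
SHADER_UNITS_PER_SIMD = 8
STREAM_PROCESSORS_PER_SHADER_UNIT = 5  # 40 stream processors / 8 shader units
FLOPS_PER_SP_PER_CLOCK = 2             # assumed, see above
CORE_CLOCK_MHZ = 750                   # Radeon HD 4670


def stream_processors() -> int:
    """Total stream processors across the whole chip."""
    return SIMD_CORES * SHADER_UNITS_PER_SIMD * STREAM_PROCESSORS_PER_SHADER_UNIT


def peak_gflops(clock_mhz: float = CORE_CLOCK_MHZ) -> float:
    """Theoretical peak in gigaFLOPS at the given core clock."""
    return stream_processors() * FLOPS_PER_SP_PER_CLOCK * clock_mhz / 1000.0


print(stream_processors())  # 320
print(peak_gflops())        # 480.0
```

This is a theoretical ceiling only; real shaders rarely keep every lane of every VLIW unit busy.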
What's more, when you consider that RV635 (the chip behind the Radeon HD 3600 series) featured 120 stream processors, eight texture processors and a transistor count of 384 million in a die that measured 118mm², it's clear how big those optimisations we've discussed are – 2.67 times the stream processors (a 167 percent increase) for a 23.7 percent increase in die size... on the same process technology. That's pretty impressive, all things considered.
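The generational comparison is easy to reproduce from the figures quoted so far. The sketch below is just that arithmetic; the 320 stream processor total for RV730 is derived from eight SIMD cores with 40 stream processors each, and the helper names are our own.

```python
# Generational comparison using only figures quoted in the article.
# RV730's 320 stream processors = 8 SIMD cores x 40 stream processors.
RV635 = {"stream_processors": 120, "die_mm2": 118.0}
RV730 = {"stream_processors": 320, "die_mm2": 146.0}


def ratio(new: float, old: float) -> float:
    """How many times larger the new value is than the old one."""
    return new / old


def percent_increase(new: float, old: float) -> float:
    """Growth from old to new, expressed as a percentage of old."""
    return (new / old - 1.0) * 100.0


shader_ratio = ratio(RV730["stream_processors"], RV635["stream_processors"])
shader_increase = percent_increase(RV730["stream_processors"], RV635["stream_processors"])
die_increase = percent_increase(RV730["die_mm2"], RV635["die_mm2"])

print(f"{shader_ratio:.2f}x the stream processors")   # 2.67x
print(f"{shader_increase:.0f} percent more shaders")  # 167 percent
print(f"{die_increase:.1f} percent more die area")    # 23.7 percent
```

Note the difference between a ratio and an increase: 2.67 times the hardware is a 167 percent increase, not a 267 percent one.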
RV730's flow diagram
Moving on to the texture units, they have exactly the same functionality per SIMD as those in RV770, meaning there are four address, 16 FP32 samples and four FP32 filters per clock cycle, per SIMD core. In more traditional terms, this equates to 32 texture processors or 24 gigatexels per second of texture filling horsepower – a fourfold increase in texture processor count and, thanks to the 25MHz higher core speed, a 4.14x improvement over RV635's 5.8 gigatexels per second texture fillrate.
Interestingly though, we were unable to verify this texture fillrate with any of the tools we've got on hand – it looks like there are either only 16 texture filters per clock, chip wide, or there are only 16 interpolators. Every piece of documentation we've seen from AMD suggests 32 texture units, and we're not alone in questioning this point.
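To put numbers on the discrepancy, here's a quick sketch of the theoretical fillrate sums. It assumes one filtered texel per texture unit per clock and takes RV635's clock as 725MHz (25MHz below the HD 4670); the 16-unit case is simply the arithmetic of our hypothesis, not a measured result, and the function name is illustrative.

```python
# Theoretical texture fillrate: texture units x core clock.
# Assumes one filtered texel per unit per clock (a paper figure, not a measurement).
RV730_CLOCK_MHZ = 750
RV635_CLOCK_MHZ = RV730_CLOCK_MHZ - 25  # "25MHz higher core speed" on RV730


def fillrate_gtexels(texture_units: int, clock_mhz: float) -> float:
    """Peak fillrate in gigatexels per second."""
    return texture_units * clock_mhz / 1000.0


rv730_on_paper = fillrate_gtexels(8 * 4, RV730_CLOCK_MHZ)  # 8 SIMDs x 4 filters
rv635_on_paper = fillrate_gtexels(8, RV635_CLOCK_MHZ)
rv730_if_16_units = fillrate_gtexels(16, RV730_CLOCK_MHZ)   # hypothetical case

print(rv730_on_paper)                             # 24.0
print(rv635_on_paper)                             # 5.8
print(round(rv730_on_paper / rv635_on_paper, 2))  # 4.14
print(rv730_if_16_units)                          # 12.0
```

If only 16 filters really are active per clock, the ceiling would be half the documented figure, which is the kind of gap a fillrate test would show.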
The ROP units are also the same as those found in RV770 and each has its own L2 cache and independent 64-bit memory controller. There are two of them in RV730 and in some situations they're twice as fast as the Radeon HD 3000 series ROP units. What's more, RV635 only had one ROP unit, capable of four colour writes and eight Z-only writes per clock.
Combining all of this nets you a fourfold increase over the previous generation's ROP throughput in some scenarios – specifically, you're looking at significant gains with multi-sample anti-aliasing enabled and even in 64-bit colour modes when AA is disabled. AMD says you'll also see double the performance with Z-only tasks, with up to 32 operations per clock.
Add to this the fact that AMD has also fixed the broken hardware-based MSAA resolve unit in the new ROPs and you've got the potential for some pretty decent anti-aliasing performance, even though the GPU only has a 128-bit memory interface. Of course, it's unlikely to match the Radeon HD 4800 series' muscle in this department, so we're not going to see fantastic performance at 8xAA because of the lack of bandwidth available (it's just 32GB/sec, for what it's worth), but we might see 4xAA become playable in certain scenarios.
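The bandwidth figure can be sanity-checked against the bus width. The sketch below works backwards from the quoted 32GB/sec and the two 64-bit memory controllers to the effective data rate that figure implies; that data rate is derived here rather than taken from a spec sheet, so treat it as an inference, and the names are illustrative.

```python
# Memory bandwidth = bus width in bytes x effective data rate.
# Bus width and bandwidth are from the article; the data rate is derived from them.
MEMORY_CONTROLLERS = 2
BITS_PER_CONTROLLER = 64
QUOTED_BANDWIDTH_GB_S = 32.0


def bus_width_bits() -> int:
    """Total memory interface width across both controllers."""
    return MEMORY_CONTROLLERS * BITS_PER_CONTROLLER


def implied_data_rate_gtps(bandwidth_gb_s: float, bus_bits: int) -> float:
    """Effective transfers per second (in billions) implied by a bandwidth figure."""
    return bandwidth_gb_s / (bus_bits / 8)


def bandwidth_gb_s(bus_bits: int, data_rate_gtps: float) -> float:
    """Peak bandwidth in GB/sec for a given bus width and data rate."""
    return bus_bits / 8 * data_rate_gtps


rate = implied_data_rate_gtps(QUOTED_BANDWIDTH_GB_S, bus_width_bits())
print(bus_width_bits())                        # 128
print(rate)                                    # 2.0
print(bandwidth_gb_s(bus_width_bits(), rate))  # 32.0
```

Doubling the bus width at the same data rate would double the bandwidth, which is why the narrow 128-bit interface is the limiting factor at higher AA levels.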
Aside from graphics, RV730 boasts AMD's updated UVD engine, which includes better DVD upscaling support, dynamic contrast adjustments and dual-stream decode capabilities – this puts it on a par with Nvidia's latest PureVideo engine. The now typical audio-over-HDMI support is embedded into the GPU as well, meaning it's possible to pass up to 7.1 channel audio across the DVI port when a DVI-to-HDMI converter is connected.
It's about time we had a look at the card...