bit-tech.net

Parallel Worlds

"But I don't want to go among mad people," Alice remarked.
"Oh, you can't help that," said the Cat: "we're ALL mad here. I'm mad, you're mad."
"How do you know I'm mad?" said Alice.
"You must be," replied the Cat. "Or you wouldn't have come here!"

Some days, it hits you out of the blue. You wake up, you get out of bed, reach for your fuzzy slippers and head down for the coffee and the morning paper when... BAM! Out of left field, your whole world has changed. The month of July was just one big, crazy day for us in the computer tech industry, as one industry-shaking bomb dropped after another. First Conroe, then news of Kentsfield, then the acquisition of one of the world's leading graphics companies.

"The month of July was just one big, crazy day"

Each of us at bit-tech has tried to work through all of this in our own way. Wil looks at it from a business standpoint, Tim from a hardware standpoint... I just stand there looking confused for a little, trying to put the pieces together. Reworked pipelines with shared cache? Quad cores? A chip company buying a graphics company? Somewhere, there had to be a reason it all came together the way it did, and though I think Conroe is pretty amazing, I just couldn't see it as that revolutionary.

And then Intel's little report on raytracing hopped across my desk like a little white rabbit with a pocketwatch, and I followed it right down into the rabbit hole. There it was, a parallel world that connected a lot of dots, some of which hadn't even been drawn yet.

As some of you already know, the idea of real-time raytracing has always been one of my pet peeves for the industry. The concept is easy - rather than trying to approximate every single pixel's light value through myriad pipelines and shaders, you trace rays of light from eye to source using one physics calculation. This calculation takes a lot into account based on what the light hits, but it is just one calculation that is repeated millions and millions of times per frame.
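To make "one calculation, repeated" a little more concrete, here is a minimal sketch in Python. It is not Intel's method or anyone's production renderer: the scene (a single sphere), the names `hit_sphere` and `render`, and the image-plane setup are all assumptions made purely for illustration. One ray is fired from the eye through each pixel, and the same intersection test is run every time.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Return the ray parameter t of the nearest hit in front of the origin,
    or None on a miss. Simplification: an eye inside the sphere is ignored."""
    # Vector from the sphere centre to the ray origin
    oc = tuple(o - c for o, c in zip(origin, center))
    # Coefficients of the quadratic |origin + t*direction - center|^2 = radius^2
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None

def render(width, height):
    """Trace one ray per pixel and return rows of '#' (hit) and '.' (miss)."""
    eye = (0.0, 0.0, 0.0)
    center, radius = (0.0, 0.0, -3.0), 1.0  # hypothetical scene: one sphere
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            # Map the pixel centre onto an image plane at z = -1
            u = (x + 0.5) / width * 2.0 - 1.0
            v = 1.0 - (y + 0.5) / height * 2.0
            t = hit_sphere(eye, (u, v, -1.0), center, radius)
            row += "#" if t is not None else "."
        rows.append(row)
    return rows

if __name__ == "__main__":
    print("\n".join(render(32, 16)))
```

A real raytracer would go on to work out colour, reflection and refraction at each hit, but the shape of the loop stays the same: the same test, once per ray, for every pixel.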

Rather than using uni-directional meshes for models, where only the outside counts as visible space (this is where clipping errors derive from), raytracing deals in volumes. Each time a ray of light hits a new volume, a new segment is created (dubbed a rayseg) for how light would react within (or on) that particular material. Since light is now allowed to pass through transparent objects, or is properly reflected off a solid surface, all light in a room traces back to its sources. It sounds easy enough! If only the implementation were as simple...

"Raster graphics research has continued to be milked for every approximate drop"

But rather than working on that advancement, most of the commercial graphics industry has been intent on pushing raster-based graphics as far as they could go. Research has been slow in raytracing, whereas raster graphics research has continued to be milked for every approximate drop it closely resembles being worth. Of course, it is to be expected that current technology be pushed, and it was a bit of a pipe dream to think that the whole industry should redesign itself over raytracing.

But if it's all so infeasible, then how did I end up here in this parallel world with the funky rabbit? Oh, yes, that's right... July came and the whole industry went topsy-turvy. Dual core suddenly left the desks of enthusiasts, as Core 2 Duo jumped right into the laps of mainstream purchasers who longed for affordability. AMD slashed prices to compensate, driving the wonderful X2 prices down to what could only be called "cheap as chips." In the wake of the explosive Conroe launch, Intel let a new shark into the water with the details that quad core was not just the next concept, but coming to desktop level processors before the end of the year.

FOUR cores. Barely 18 months ago, HyperThreading was the closest we could think of to two CPUs unless we owned Xeon or Opteron boards. Kentsfield will be out before the end of the year with a price of $999, which is not much more expensive than AMD's current flagship, the FX-62. A desktop with abilities like this is bound to create a rabbit-sized hole in conventional industry thinking. Now what's someone going to do with all that power?

Brett Thomas

Enough history for now. Back to my little raytracing parallel-processed world. Specifically, textures - which are part of how raytracing works its magic. As we mentioned, rather than dealing with solid tubes painted with textures, raytracing uses volumes. We assign each volume special properties like translucency, refractive index, opacity, and reflectiveness that are really just variables for the raytrace equation, telling exactly how the particular beam it's tracing should bend, the colour it should become, or otherwise act when it hits that volume, in full 3D space.
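As a rough sketch of what "variables for the raytrace equation" might look like, the Python below bundles a few of those properties into a record and uses the refractive index to bend a ray with Snell's law. The `Material` class, its field names and the `refraction_angle` helper are hypothetical and chosen only for illustration; the indices of roughly 1.0 for air and 1.5 for glass are the usual textbook figures.

```python
import math
from dataclasses import dataclass

@dataclass
class Material:
    """Hypothetical per-volume properties fed into the ray calculation."""
    refractive_index: float
    opacity: float         # 0.0 = fully transparent, 1.0 = fully opaque
    reflectiveness: float  # fraction of light bounced straight back off

def refraction_angle(incidence_deg, n_from, n_to):
    """Snell's law: n_from * sin(theta_in) = n_to * sin(theta_out).
    Returns the refracted angle in degrees, or None when the ray cannot
    pass through at all (total internal reflection)."""
    s = n_from / n_to * math.sin(math.radians(incidence_deg))
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

air = Material(refractive_index=1.0, opacity=0.0, reflectiveness=0.0)
glass = Material(refractive_index=1.5, opacity=0.1, reflectiveness=0.05)

if __name__ == "__main__":
    # A ray striking glass at 30 degrees bends towards the surface normal
    print(refraction_angle(30.0, air.refractive_index, glass.refractive_index))
```

The point is only that the "texture" of a volume collapses into a handful of numbers like these, which the same ray calculation consults every time a ray crosses into a new material.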

The best part? If an object isn't struck by any rays, it's not rendered and you get a normal black shadow, because pixels are coloured based on what rays of light strike them. Compare that to raster graphics, where everything must be rendered sequentially, then a Z-buffer determines what is or is not actually shown.
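For comparison, the raster side of that sentence can be sketched in a few lines of Python. This is a deliberately stripped-down depth test, not how any particular GPU implements it; the `rasterise` function and the `(x, y, depth, colour)` fragment format are assumptions for illustration.

```python
def rasterise(fragments, width, height):
    """fragments: iterable of (x, y, depth, colour); smaller depth is closer.
    Returns a grid holding the colour of the nearest fragment at each pixel."""
    depth = [[float("inf")] * width for _ in range(height)]
    colour = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        # Every fragment gets processed, even the ones that end up hidden
        if z < depth[y][x]:
            depth[y][x] = z
            colour[y][x] = c
    return colour

if __name__ == "__main__":
    frags = [(0, 0, 5.0, "far wall"), (0, 0, 2.0, "near crate"), (1, 0, 3.0, "floor")]
    print(rasterise(frags, width=2, height=1))
```

Note that the "far wall" fragment is still handled in full before the depth comparison throws it away, which is the extra work the paragraph above is getting at.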

"Rather than dealing with solid tubes painted with textures, raytracing uses volumes"

The beauty of this digital artistry is how well it scales - unlike raster graphics, which lose considerable efficiency as they scale over multiple GPUs, raytracing has been found to scale almost 1:1 for each additional processor it gets. There are quite a few reasons for this, not least of which is that it doesn't have to load duplicate textures into memory on each card to try and sync frame buffers.
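One way to see why the scaling can be so friendly: no ray needs to know the result of any other, so the image can simply be dealt out between workers like a deck of cards. The Python sketch below shows only that division of labour. The `shade` function is a hypothetical stand-in for the real per-pixel ray calculation, and since CPython threads won't actually speed up pure-Python number crunching, treat it as an illustration of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(x, y):
    # Hypothetical stand-in for the per-pixel ray calculation
    return (x * 31 + y * 17) % 256

def render_rows(ys, width):
    """Render the given rows; needs nothing from any other row."""
    return [[shade(x, y) for x in range(width)] for y in ys]

def render_parallel(width, height, workers):
    # Deal rows out in turn so each worker gets about the same share
    chunks = [range(i, height, workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda ys: render_rows(ys, width), chunks))
    # Stitch the independently rendered rows back into one image
    image = [None] * height
    for ys, rows in zip(chunks, parts):
        for y, row in zip(ys, rows):
            image[y] = row
    return image

if __name__ == "__main__":
    # The picture is identical however many workers share the job
    print(render_parallel(8, 6, 1) == render_parallel(8, 6, 4))
```

Because the workers never have to talk to each other mid-frame, adding another one just means each gets a smaller pile of rows.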

As you can see above, textures are totally redefined as just sets of numbers, instead of big image files that get wrapped around a frame. You can't get away with all surfaces being smooth, though, and it's far too exhausting to model every little nub on a leathery hide, so rather than dealing with the minutiae, raytracing can use bump maps to apply more uniform texture to a volume. Of course, these are only 8-bit greyscales, so they're a lot smaller than the textures we use today, even when they are many times the in-game size.

Of course, no huge memory overheads means no need for those huge memory buffers on graphics cards. And because of the way that raytracing only repeats one big equation over and over, it means no need for 64 gazillion pixel pipes and shaders - unless, of course, those pipes can serve more than one purpose. With the advent of the Unified Shader Architecture (essentially just a bank of FPUs), translating a GPU into an RPU (Raytrace Processing Unit) would be a highly efficient use of parallelisation.

Suddenly, SLI and CrossFire become much more efficient concepts due to the scalable nature of raytracing. An off-die daughter board RPU/GPU is a great transition piece, but it does add much more travel length to each instance of a computation; a computation that isn't that complex, but needs to be done a whole lot. This could see the shape of necessary bandwidth changing from GPU memory to inter-bus linkage, boosting the speed of the SLI/CrossFire bridge and PCI-Express lanes until the computations become efficient enough to be done on the processor itself.

Speaking of transition, none of this means raster graphics would be dead, or that the technologies couldn't be developed to work simultaneously, particularly with ATI's Unified Shader Architecture. Currently, due to rayseg processing limits, we have to use textures like we're used to for a lot of graphics rendering, and only use raytracing for the light effects: we just can't get the polygon counts that high on our models.

Unified Shader Architecture is much more flexible than the pixel pipelines of old, and can really be an excellent bridge to do a little bit of everything - from RPU to shading to vertex management. The more detailed we can get our meshes to be and the more we work on raytracing, though, the less efficient the traditional GPU becomes - particularly high-end daughter cards.

So how long until all this could start pushing out of the theoretical and into the real? Well, Intel says that we're looking at needing about 450 million raysegs per second before we get 'interesting.' And since a single-core P4 at 3.2GHz was capable of 100 million raysegs per second, that means we're looking at....

Oh, blast. Rabbit, I seem to have forgotten my pocketwatch. May I borrow yours?