Nvidia has announced the release of its latest parallel programming platform, CUDA 6, bringing support for unified memory, drop-in libraries and scaling across multiple graphics processors.
Nvidia's CUDA 6 adds support for unified memory models, solving a major bottleneck with tasks that need to operate on both the CPU and the GPU.
CUDA, the Compute Unified Device Architecture, is Nvidia's secret sauce for general-purpose GPU (GPGPU) coding. A rival to the Khronos Group's OpenCL, CUDA offers the ability to write code which can be executed in parallel on Nvidia graphics processors - greatly accelerating parallelisable tasks, and making Nvidia GPUs a common choice for high-performance computing (HPC) and supercomputing projects. The latest TOP500 list of the world's most powerful supercomputers sees 38 of the 500 using Nvidia GPU-based accelerators, with just two using AMD's rival Radeon boards and 13 using Intel's Many Integrated Core (MIC) x86 Xeon Phi boards.
The biggest change in CUDA 6 is designed to simplify writing code that can run on both the GPU and CPU while boosting performance: unified memory. Under CUDA 6, Nvidia explains, applications can access both CPU and GPU memory without the need to transfer data from one to the other - addressing a major bottleneck that can cripple the performance of CUDA-based GPGPU applications. Rival AMD has shown a similar progression in its hUMA model, but CUDA 6 marks the first time Nvidia has supported it for GPGPU applications.
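To give a rough idea of what unified memory looks like in practice, the sketch below allocates a single buffer with cudaMallocManaged, fills it from the CPU, modifies it on the GPU and reads it back on the CPU, with no cudaMemcpy calls in the program's own code. The kernel add_one and the helper run_unified_memory_demo are hypothetical names made up for this illustration, and the sketch assumes a GPU and toolkit that support managed memory; it is not taken from Nvidia's own samples.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: add 1.0f to every element, one thread per element.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: returns true if every element ends up as 1.0f,
// false on any CUDA error or mismatch.
bool run_unified_memory_demo(int n) {
    float *data = NULL;

    // One managed allocation; the same pointer is valid on CPU and GPU.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return false;

    // The CPU writes to the buffer directly.
    for (int i = 0; i < n; ++i) data[i] = 0.0f;

    // The GPU works on the same pointer; no explicit copy is issued here.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one<<<blocks, threads>>>(data, n);

    // Wait for the kernel to finish before the CPU touches the data again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return false;
    }

    // The CPU reads the results straight from the managed buffer.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (data[i] != 1.0f) ok = false;
    }

    cudaFree(data);
    return ok;
}

int main() {
    bool ok = run_unified_memory_demo(1 << 20);
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

Built with something like nvcc, the equivalent program without managed memory would typically need separate host and device buffers plus a cudaMemcpy in each direction. The data still has to move between CPU and GPU memory behind the scenes; the difference is that the runtime handles it rather than the developer.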
The new release also introduces drop-in libraries for basic linear algebra subprograms (BLAS) and fast Fourier transforms (FFT), allowing developers to get a claimed eight-fold performance boost on these common calculation types simply by replacing their existing CPU-driven libraries with those provided in the SDK. The reason for the eight-fold figure? The new libraries allow for automatic performance scaling over up to eight GPUs in a single node, or two in the case of the FFTW library - offering, assuming you pick up Nvidia's top-end accelerator, up to nine teraflops of double-precision performance and support for workloads of up to 512GB.
The new CUDA toolkit is available to download as a release candidate now from the official website.