Cambridge-based chip design specialist ARM has announced a new piece of system intellectual property (IP) aimed at enabling a new generation of many-core devices: the CoreLink CCN-504 cache coherent network.
The latest in ARM's announcements aimed at giving Intel cause for concern in the lucrative datacentre market - along with the server-friendly Cortex-A15 'Eagle' design and the introduction of the 64-bit ARMv8 architecture - the CoreLink CCN-504 design provides, as the name suggests, an on-chip interconnect for system-on-chip designs with up to 16 individual processing cores. Compared to current ARM chip designs, which top out at four general-purpose CPU cores alongside a lower-power 'companion core' as with Nvidia's Tegra 3 SoC design, that promises a major increase in processing density - a key point for adoption in the server market.
The move from single-core to multi-core processing is all but complete, with even the cheapest of smartphones and tablets shipping with at least a dual-core chip. In the datacentre, however, a new shift is taking place: multi-core to many-core. With companies like Adapteva boasting of 64-core co-processor designs which sip power, Intel itself pushing its own Many Integrated Core (MIC) products which pack upwards of 50 cores into a PCI Express co-processor card, and server-oriented central processors hitting 16 cores themselves, the focus of the industry is shifting from raw single-core power to massive parallelism.
It's something in which all the major semiconductor companies are showing an interest: as well as Intel's MIC project, AMD recently spent a chunk of change buying microserver specialist SeaMicro following that company's heavily-publicised plans to produce compact, low-power servers packing 512 Intel Atom or ARM Cortex-A9 processing cores.
It's also a move that has significant drawbacks. Many applications are still not written with parallelism in mind, often failing to take advantage of the extra processing cores now available. Although that's less important for server environments, it's something that needs to be addressed as the many-core technologies trickle down from the datacentre to the desktop - and is the primary reason for the creation of the low-cost many-core Parallella development board.
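To put a rough figure on that drawback, the sketch below applies Amdahl's law, a standard rule of thumb which is not mentioned in ARM's announcement, to the jump from four to 16 cores. The parallel fractions are illustrative assumptions rather than measurements of any real application, and the function name is made up for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed (hypothetical) shares of a program's work that can run in parallel.
for fraction in (0.5, 0.9, 0.99):
    four = amdahl_speedup(fraction, 4)
    sixteen = amdahl_speedup(fraction, 16)
    print(f"{fraction:.0%} parallel: 4 cores {four:.2f}x, 16 cores {sixteen:.2f}x")
```

Under these assumptions, a program that is only half parallel gains very little from the extra twelve cores, which is why software that ignores parallelism wastes much of a many-core chip.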
Another issue, however, is processor interconnections: the more processor cores a chip has, the more difficult it is to keep each one fed with data and instructions. That's where ARM's CoreLink comes in: the company claims that it enables a fully-coherent high-performance interconnect between many-core CPUs and even GPUs, allowing each to access the cache of the other. The result, in theory, is a drastic drop-off in access requests for off-chip memory - the performance killer suffered by most many-core designs.
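A back-of-the-envelope model shows why cutting off-chip requests matters so much. The sketch below computes a simple weighted-average access time; the latency figures and hit rates are hypothetical placeholders chosen for illustration, not numbers published by ARM for the CCN-504.

```python
def average_access_ns(on_chip_hit_rate: float,
                      on_chip_ns: float,
                      off_chip_ns: float) -> float:
    """Weighted average latency when a request is served either
    from on-chip cache or from off-chip memory."""
    miss_rate = 1.0 - on_chip_hit_rate
    return on_chip_hit_rate * on_chip_ns + miss_rate * off_chip_ns


# Hypothetical latencies: 20 ns from a cache on the chip, 100 ns from DRAM.
ON_CHIP_NS = 20.0
OFF_CHIP_NS = 100.0

for hit_rate in (0.5, 0.9):
    latency = average_access_ns(hit_rate, ON_CHIP_NS, OFF_CHIP_NS)
    print(f"{hit_rate:.0%} served on-chip: {latency:.1f} ns on average")
```

The higher the share of requests that another core's cache can satisfy, the closer the average gets to the on-chip figure, which is the effect a coherent interconnect is meant to deliver.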
If the idea of a heterogeneous network with shared cache access sounds familiar, it should: that's exactly AMD's plan for its Fusion technology, rebranded since launch as the Heterogeneous Systems Architecture (HSA). Under the HSA umbrella, AMD is planning to unite its graphics processing unit (GPU) and central processing unit (CPU) technologies in a way that should boost performance significantly - and, as ARM has shown, it's a move in which AMD won't be alone.
At the same time, ARM has announced that CoreLink is gaining the DMC-520 dynamic memory controller, which improves off-chip memory access speeds and includes support for DDR4 memory in a server-centric platform planned for release in 2013. ARM server specialist Calxeda has already announced an upcoming product using Cortex-A15 chips with CoreLink, along with semiconductor giant LSI.
Although it'll be a while before we start seeing 16-core smartphones, ARM's continued push towards many-core computing should give Intel pause for thought - and with a planned launch date of 2013 for the first CoreLink IP-based products, the battle lines are clearly being drawn.