Nvidia has formally announced details of its first 64-bit ARM-based system-on-chip (SoC) processor, the Denver-based Tegra K1, at the Hot Chips conference this week.
Nvidia's Tegra K1 chip, while only dual-core, is claimed to blow the competition out of the water in both single- and multi-threaded performance.
When Nvidia first announced the latest-generation Tegra K1 processor, it did so with the promise that there would be two versions confusingly bearing the same name. The first is the standard Tegra K1, featuring four 32-bit Cortex-A15 ARMv7 processing cores plus a fifth low-power 'Shadow Core,' boasting a three-way superscalar design, clock speeds of up to 2.3GHz and two chunks of 32KB L1 cache. The second is based on two 64-bit ARMv8 cores, codenamed 'Denver,' with a seven-way superscalar design, clock speeds of up to 2.5GHz, and 128KB plus 64KB L1 cache.
Thus far, only the former has hit the market, but at Hot Chips last night Nvidia promised the latter is on the way. 'This new version of Tegra K1 pairs our 192-core Kepler architecture-based GPU with our own custom-designed, 64-bit, dual-core “Project Denver” CPU, which is fully ARMv8 architecture compatible,' Nvidia's Nick Stam explained in a blog post published last night. 'Further, Denver is fully pin compatible with the 32-bit Tegra K1 for ease of implementation and faster time to market.
'Denver is designed for the highest single-core CPU throughput, and also delivers industry-leading dual-core performance. Each of the two Denver cores implements a 7-way superscalar microarchitecture (up to 7 concurrent micro-ops can be executed per clock), and includes a 128KB 4-way L1 instruction cache, a 64KB 4-way L1 data cache, and a 2MB 16-way L2 cache, which services both cores.'
Nvidia claims that the design of Denver includes tricks that will bring its performance closer to that of a desktop processor without breaking its power envelope constraints. Features like dynamic code optimisation - 'which optimises frequently used software routines at runtime into dense, highly tuned microcode-equivalent routines [...] stored in a dedicated, 128MB main-memory-based optimisation cache' - and its seven-wide superscalar design will, the company has claimed, offer greater performance for both single- and multi-threaded applications than existing four- and eight-core processors, including the company's own 32-bit Tegra K1.
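For readers unfamiliar with the idea, the general pattern behind runtime optimisation of hot routines can be sketched in a few lines of Python. This is a toy model only, not Nvidia's mechanism: the class name `HotRoutineCache`, the call-count threshold, the entry-count capacity and the least-recently-used eviction policy are all assumptions made for illustration, and the 'optimiser' here is simply a function supplied by the caller.

```python
from collections import OrderedDict


class HotRoutineCache:
    """Toy model: routines called often enough get an optimised version
    stored in a bounded cache and are served from it afterwards.
    All names and policies here are hypothetical, not Denver's design."""

    def __init__(self, hot_threshold, capacity):
        self.hot_threshold = hot_threshold  # calls before a routine counts as hot (assumed)
        self.capacity = capacity            # max optimised routines kept (assumed)
        self.call_counts = {}               # routine name -> calls seen on the slow path
        self.optimised = OrderedDict()      # routine name -> optimised callable, LRU order

    def run(self, name, routine, optimise, *args):
        # Fast path: an optimised version already exists, so use it.
        if name in self.optimised:
            self.optimised.move_to_end(name)  # mark as most recently used
            return self.optimised[name](*args), "optimised"

        # Slow path: run the original routine and track how hot it is.
        self.call_counts[name] = self.call_counts.get(name, 0) + 1
        if self.call_counts[name] >= self.hot_threshold:
            if len(self.optimised) >= self.capacity:
                self.optimised.popitem(last=False)  # evict least recently used
            self.optimised[name] = optimise(routine)
        return routine(*args), "baseline"


def sum_to(n):
    """Baseline routine: add up 0..n one term at a time."""
    return sum(range(n + 1))


def closed_form(_routine):
    """Stand-in optimiser: swaps the loop for the closed-form result."""
    return lambda n: n * (n + 1) // 2


cache = HotRoutineCache(hot_threshold=3, capacity=2)
for _ in range(5):
    result, path = cache.run("sum_to", sum_to, closed_form, 10)
    print(result, path)
```

In this sketch the first three calls take the baseline path, with the third also triggering the optimisation, and later calls are served from the cache. A real implementation works on machine code rather than Python functions and would need to handle invalidation, which this model ignores.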
As to when the chips will hit the market, however, Nvidia is being coy. 'Look forward later this year to some amazing mobile devices based on the 64-bit Tegra K1 from our partners,' Stam teased, 'and for hard-core Android fans, take note that we’re already developing the next version of Android – “L” – on the 64-bit Tegra K1.'