Researchers at the Singapore University of Technology and Design and the Data Storage Institute have developed a new deduplication algorithm designed specifically for byte-addressable non-volatile memory (NVM) devices, one which they say can roughly double performance while freeing up storage capacity.
Popular in enterprise and backup circles, deduplication is the process of finding duplicate files or blocks on a filesystem and consolidating them down to a single copy. At its simplest, deduplication can be the manual process of finding duplicate files and deleting the excess. Automatic systems, meanwhile, can work at the file level, creating hard links to give the appearance of multiple copies while storing only one actual copy on the disk, or at the block level, finding and linking duplicated blocks even when they occur in otherwise different files. For workloads where duplicated data is to be expected - especially backups - the gains can be immense.
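To make the block-level approach concrete, here is a minimal sketch of a toy in-memory store that keeps one copy of each unique block. It is an illustration only and not taken from the paper: the `DedupStore` class, the fixed 4,096-byte block size, and the use of SHA-1 as the fingerprint are all assumptions chosen for simplicity.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupStore:
    """Hypothetical toy store keeping a single copy of each unique block."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block contents
        self.refcount = {}  # fingerprint -> number of references to the block

    def write_file(self, data: bytes) -> list:
        """Split data into blocks and return the file's list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha1(block).hexdigest()
            if fp not in self.blocks:
                self.blocks[fp] = block  # first copy: actually store it
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        return recipe

    def read_file(self, recipe: list) -> bytes:
        """Rebuild a file from its list of fingerprints."""
        return b"".join(self.blocks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Bytes physically held, after deduplication."""
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
file_a = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # shares its first block with file_a
recipe_a = store.write_file(file_a)
recipe_b = store.write_file(file_b)
```

Here the two files total four blocks, but they share one, so the store holds only three. A real filesystem would also have to keep this metadata consistent across crashes, which is the part the researchers found costly on NVM.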
Researchers from the Singapore University of Technology and Design and the national Data Storage Institute, though, have found that traditional deduplication algorithms don't play well with modern non-volatile memory (NVM), which is byte-addressable rather than block-addressable like conventional solid-state storage. 'We have observed severe performance degradations when implementing a state-of-the-art inline deduplication algorithm in an NVM-oriented file system,' the researchers explain in the abstract for their latest paper on the topic. 'A quantitative analysis reveals that, with NVM, 1) the conventional way to manage deduplication metadata for block devices, particularly in light of consistency, is inefficient, and, 2) the performance with deduplication becomes more subject to fingerprint calculations.'
The solution: NV-Dedup, a new deduplication algorithm specifically targeting byte-addressable NVM devices. 'NV-Dedup manages deduplication metadata in a fine-grained, CPU and NVM-favored way, and preserves the metadata consistency with a lightweight transactional scheme. It also does workload-adaptive fingerprinting based on an analytical model and a transition scheme among fingerprinting methods to reduce calculation penalties.'
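The abstract does not spell out the analytical model or the transition scheme, so the following is not NV-Dedup's method. It is a minimal sketch of one well-known way to cut fingerprinting cost: compute a cheap checksum first and fall back to a strong hash only when the cheap one matches an existing block. The `WeakFirstIndex` class and the choice of CRC32 and SHA-1 are assumptions made for illustration.

```python
import hashlib
import zlib


class WeakFirstIndex:
    """Hypothetical index: cheap CRC32 first, SHA-1 only on a CRC32 match."""

    def __init__(self):
        self.buckets = {}      # crc32 -> list of [block, cached SHA-1 or None]
        self.strong_calls = 0  # how many SHA-1 calculations were needed

    def _sha1(self, block: bytes) -> bytes:
        self.strong_calls += 1
        return hashlib.sha1(block).digest()

    def add(self, block: bytes) -> bool:
        """Record a block; return True if an identical block was already seen."""
        bucket = self.buckets.setdefault(zlib.crc32(block), [])
        if not bucket:
            # No stored block shares this checksum, so the block must be new
            # and the strong hash can be skipped entirely.
            bucket.append([block, None])
            return False
        fp = self._sha1(block)
        for entry in bucket:
            if entry[1] is None:
                entry[1] = self._sha1(entry[0])  # computed lazily, then cached
            if entry[1] == fp:
                return True
        bucket.append([block, fp])  # checksum collision, but a different block
        return False


index = WeakFirstIndex()
block_a = bytes(4096)
block_b = b"\x01" + bytes(4095)  # differs from block_a by a single bit
first = index.add(block_a)
second = index.add(block_b)
calls_before_duplicate = index.strong_calls
third = index.add(block_a)       # a genuine duplicate
```

With mostly unique data this scheme rarely pays for a strong hash, while with heavily duplicated data it pays for both calculations on most writes. That trade-off is the kind of thing a workload-adaptive scheme would need to weigh when choosing a fingerprinting method.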
The results are, in testing at least, impressive. The researchers implemented a prototype of NV-Dedup on top of the Persistent Memory File System (PMFS), a since-abandoned but still useful Intel project for direct CPU access to DRAM-backed storage which operates in much the same way as NVM. They found that NV-Dedup not only successfully controlled the duplication of data but also enhanced storage performance by up to 2.1 times - though it has yet to be proven that this enhancement will transfer to more common and actively-developed filesystems running on real NVM hardware.
The paper, NV-Dedup: High-Performance Inline Deduplication for Non-Volatile Memory, has been published in IEEE Transactions on Computers 2018 vol. 67, and is available for purchase on the official website.