DEDUPLICATION-AWARE PAGE CACHE IN LINUX KERNEL FOR IMPROVED READ PERFORMANCE
Date
2019-12-16
Author
Boggavarapu, Venkata Satya Ravi Kiran
Abstract
The amount of data being produced and consumed is increasing every day. As a result, storage systems can hold a large amount of redundant data. Storing and accessing this duplicate data unnecessarily consumes disk space and I/O bandwidth. Deduplication techniques are widely deployed to remove the redundancy. In particular, deduplication solutions that work at the block level have proven effective. These solutions aim to use disk space and write bandwidth efficiently by avoiding duplicate data writes to storage. However, such a design might not help improve read performance, which is critical for many modern applications.
The Linux kernel implements an in-memory cache of pages, called the page cache, to improve I/O performance by minimizing disk accesses. The page cache holds pages originating from regular file systems, and it is indexed by a file and the offset within the file. Because of this design, deduplication information is currently not available to the page cache. As a result, a read request for an offset that is not present in the page cache still goes to the disk, even when the requested data duplicates another offset that is already cached. Consequently, the overall I/O performance of the applications running on these systems can be compromised.
To address this issue, we propose a lightweight scheme called Dual-Dedup that efficiently coordinates the deduplication information with the page cache. It discloses the redundancy knowledge detected by the block-level deduplication layer to the page cache, which can then prevent unnecessary read requests. Results from extensive experiments show that Dual-Dedup significantly improves read performance. On FIO tests with 25% duplicate data, our system shows an improvement of 34% in read throughput compared with Linux EXT4.
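
The idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not kernel code and not the Dual-Dedup implementation, and every name in it (`ToyDisk`, `PlainPageCache`, `DedupAwarePageCache`, `disk_reads`) is hypothetical. It assumes the deduplication layer can report which physical block backs each (file, offset) pair, and it counts how many reads reach the disk when two offsets share one block.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Deduplicated block store (hypothetical model, not a kernel structure)."""
    block_of: dict = field(default_factory=dict)  # (file, offset) -> physical block id
    data: dict = field(default_factory=dict)      # physical block id -> bytes
    reads: int = 0                                # number of reads that hit the disk

    def read(self, pbn):
        self.reads += 1
        return self.data[pbn]


class PlainPageCache:
    """Cache indexed only by (file, offset), as the abstract describes."""

    def __init__(self, disk):
        self.disk = disk
        self.pages = {}  # (file, offset) -> page contents

    def read(self, file, offset):
        key = (file, offset)
        if key not in self.pages:
            # Miss: the cache cannot tell that a duplicate is already cached.
            self.pages[key] = self.disk.read(self.disk.block_of[key])
        return self.pages[key]


class DedupAwarePageCache(PlainPageCache):
    """Assumption: the dedup layer's block mapping is visible to the cache."""

    def __init__(self, disk):
        super().__init__(disk)
        self.by_block = {}  # physical block id -> page contents already cached

    def read(self, file, offset):
        key = (file, offset)
        if key in self.pages:
            return self.pages[key]
        pbn = self.disk.block_of[key]
        if pbn not in self.by_block:
            # Only go to disk when no cached page holds this block's data.
            self.by_block[pbn] = self.disk.read(pbn)
        self.pages[key] = self.by_block[pbn]
        return self.pages[key]


def disk_reads(cache_cls):
    """Read two offsets that deduplicate to one block; return disk read count."""
    disk = ToyDisk(
        block_of={("a.txt", 0): 7, ("b.txt", 4096): 7},
        data={7: b"same bytes"},
    )
    cache = cache_cls(disk)
    cache.read("a.txt", 0)
    cache.read("b.txt", 4096)
    return disk.reads
```

In this model, `disk_reads(PlainPageCache)` returns 2, while `disk_reads(DedupAwarePageCache)` returns 1, because the second read is served from the page already cached for the shared block. The model ignores eviction, writes, and copy-on-write, all of which a real design would have to handle.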