Building IO-Efficient Systems Infrastructure

The amount of digital information being generated doubles every two years. As a result, storage systems have to handle an ever-increasing amount of data each day. For example, Pinterest handles tens of thousands of terabytes each day. Data is often stored on solid state drives (SSDs) for good performance. Unfortunately, SSDs have a limited lifetime and will fail after a certain number of writes. Future storage technologies such as Intel 3D XPoint also have a limited lifetime. Making matters worse, current storage systems have high input/output (IO) amplification, meaning they write far more data to storage than the application asked them to write: 100 gigabytes written to the widely-used RocksDB key-value store will result in 2.7 terabytes (28x) written to storage. Solving the problem of storage failure due to IO amplification is crucial to building storage systems of the future that can handle the exponentially increasing amount of data.
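The amplification figure above is a ratio: bytes written to storage divided by bytes written by the application. The following is a minimal Python sketch of that arithmetic. The function name and the unit constants are illustrative and do not come from this project. The text does not say whether its units are decimal or binary, so both readings are computed. Under the binary reading (1 terabyte = 1024 gigabytes) the ratio is about 27.6, which rounds to the quoted 28x. Under the decimal reading it is about 27.

```python
def write_amplification(user_bytes: float, device_bytes: float) -> float:
    """Return bytes written to the device per byte written by the application."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return device_bytes / user_bytes


# Unit constants (assumption: the source does not state which convention it uses).
GB, TB = 10**9, 10**12      # decimal units
GIB, TIB = 2**30, 2**40     # binary units

# 100 gigabytes of application writes, 2.7 terabytes of device writes.
decimal_factor = write_amplification(100 * GB, 2.7 * TB)
binary_factor = write_amplification(100 * GIB, 2.7 * TIB)

print(f"decimal units: {decimal_factor:.1f}x")
print(f"binary units:  {binary_factor:.1f}x")
```

Because an SSD tolerates only a bounded number of writes, the same ratio roughly describes how much faster the device wears out than the application's own write volume would suggest.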

The goal of this project is to build IO-efficient systems infrastructure that drastically reduces the amount of write IO to storage without compromising performance. To achieve this goal, this project contributes new data structures, techniques, and storage system designs. Taking an end-to-end approach, this project redesigns each layer of the storage stack, including key-value stores, transactional stores, and file systems, while tuning the underlying storage device to reduce IO amplification.

This project will significantly impact academia by advancing the state-of-the-art, industry through technology transfer, students via courses and projects, and the general public by providing better open-source storage systems. Undergraduate and graduate students will be trained in the careful analysis and building of advanced storage infrastructure. This project will encourage the participation of women and other under-represented minorities in research.

All software artifacts related to this project will be maintained on this page for the lifetime of this project. All storage systems developed as part of this project will be open-sourced, with sufficient documentation to enable ease of use. Other tools and benchmarks developed during the course of this project will also be made publicly available in text form for easy processing.

This project is funded by NSF CAREER Award #1751277.

Related Publications

PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, Ittai Abraham
PDF   BibTeX   Slides   Code on GitHub