Invited Talk: Single Level Stores: Providing Checkpointing as an OS Service (Emil Tsalapatis)

Time: 4-5 pm, Nov 19th, 2024
Location: CS 4310 (and online on Zoom)

Abstract: This talk presents the Aurora single level store, an OS design that uses continuous checkpointing for application persistence and deployment. Aurora provides sub-millisecond application checkpoint and restore operations to efficiently turn applications into on-disk images and back. Fast checkpoint/restore as an OS service also serves as a foundation for further research into open problems like efficient persistence APIs for memory-mapped data and serverless computing.

Aurora's single level store-based persistence has recently become practical because of advances in hardware and file system technology. Modern SSDs have latencies of around 10μs, allowing us to persist application checkpoints to disk with minimal latency overhead. Modern CPUs also offer IO throughput that rivals their memory bandwidth, making it possible to continuously checkpoint in-memory application state and forward it to disk.

This talk describes three systems that demonstrate the efficiency and flexibility of the single level store paradigm. I will first present Aurora (SOSP 2021), an OS design capable of checkpointing applications continuously and at fine enough granularity to provide transparent persistence. I will then follow up with MemSnap (ASPLOS 2024), an OS single level store API and associated virtual memory mechanism. MemSnap persists application data, e.g., database data, more efficiently than the file API. Finally, I will present Metropolis, a serverless invoker that uses the single level store paradigm to create serverless function instances at sub-millisecond latency.