Invited Talk: UCCL: An Extensible Software Transport Layer for GPU Networking
Speaker: Yang Zhou

Time: 4:00-5:00 pm, Nov 18, 2025
Location: 5618 Morgridge Hall and Online (Join via Zoom)

Abstract:

Fast-evolving machine learning workloads place increasing demands on networking, ranging from simple AllReduce to the more challenging AlltoAll. However, host networking hardware such as RDMA NICs evolves slowly: DCQCN congestion control is broken and hard to tune, and there is still no multipathing support. This mismatch has severely hindered performance scaling and incurred additional costs.

We present UCCL, an extensible software transport layer for GPU networking. UCCL separates the data path and control path of existing RDMA NICs, enabling efficient software control of the transport on host CPUs. UCCL's extensibility allows us to implement a multipath transport with packet spraying to resolve ECMP collisions, a receiver-driven transport to handle network incast, and a selective retransmission scheme to handle packet loss. UCCL provides a drop-in replacement for NCCL (and RCCL) and outperforms NCCL by 2.3-3.3x for GPU collectives over RoCE and AWS EFA RDMA.
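To illustrate the idea behind packet spraying, here is a minimal, hypothetical sketch (not UCCL's actual implementation): a conventional flow uses one fixed source port, so the switch's ECMP hash pins every packet of the flow to a single uplink; spraying rotates the source port per packet so the same hash function spreads the flow across many uplinks. The `ecmp_hash` function, port numbers, and uplink count below are all illustrative stand-ins.

```python
import hashlib
from collections import defaultdict

NUM_PATHS = 8     # distinct source ports the sender rotates through (assumed)
NUM_UPLINKS = 4   # uplinks the switch hashes flows onto (assumed)

def ecmp_hash(src_port: int, dst_port: int = 4791) -> int:
    """Toy stand-in for a switch's ECMP hash over the flow tuple."""
    key = f"{src_port}:{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def spray(num_packets: int, base_port: int = 49152) -> dict:
    """Rotate the source port per packet, then count how many packets
    the ECMP hash places on each uplink."""
    per_link = defaultdict(int)
    for seq in range(num_packets):
        src_port = base_port + (seq % NUM_PATHS)      # new path id per packet
        per_link[ecmp_hash(src_port) % NUM_UPLINKS] += 1
    return dict(per_link)

# A single flow (fixed source port) lands entirely on one uplink...
single_flow = {ecmp_hash(49152) % NUM_UPLINKS: 1024}
# ...while spraying spreads the same 1024 packets across uplinks.
sprayed = spray(1024)
```

The receiver side then has to tolerate the reordering that spraying introduces, which is one reason a software transport with selective retransmission (rather than go-back-N on the NIC) pairs naturally with this scheme.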

Beyond collective communication, UCCL also extends to P2P communication (for prefill-decode disaggregation) and GPU-initiated communication (for expert parallelism, e.g., in DeepSeekV3), with superior performance. UCCL is fully open-sourced at https://github.com/uccl-project/uccl.

Bio:

Yang Zhou is an Assistant Professor at UC Davis. He was previously a postdoc in UC Berkeley's SkyLab, working with Ion Stoica, and received his Ph.D. from Harvard, advised by Minlan Yu and James Mickens. His interests span core systems and ML systems research, including efficient LLMs, GPU communication, and heterogeneous computing. He currently leads the UCCL project to build efficient GPU communication systems.