Invited Talk: UCCL: An Extensible Software Transport Layer for GPU Networking
Speaker: Yang Zhou

Time: 4:00-5:00 pm, Nov 18, 2025
Location: 5618 Morgridge Hall and Online (Join via Zoom)

Abstract:

Fast-evolving machine learning workloads place increasing demands on networking, ranging from simple AllReduce to the more challenging AlltoAll. However, host networking hardware such as RDMA NICs evolves slowly: DCQCN congestion control is broken and hard to tune, and there is still no multipathing support. This mismatch has severely hindered performance scaling and incurred additional costs.

We present UCCL, an extensible software transport layer for GPU networking. UCCL separates the data path and control path of existing RDMA NICs, enabling efficient software control of the transport on host CPUs. UCCL's extensibility allows us to implement a multipath transport with packet spraying to resolve ECMP collisions, a receiver-driven transport to handle network incast, and a selective retransmission scheme to handle packet loss. UCCL provides a drop-in replacement for NCCL (and RCCL) and outperforms NCCL by 2.3-3.3x for GPU collectives over RoCE and AWS EFA RDMA.
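To illustrate the idea behind packet spraying, here is a minimal, hypothetical sketch (not UCCL's actual implementation): a conventional flow uses one fixed source port, so the switch's ECMP hash pins every packet of the flow to a single uplink; spraying rotates the source port per packet so the same hash function spreads the flow across many uplinks. The `ecmp_hash` function, port numbers, and uplink count below are all illustrative stand-ins.

```python
import hashlib
from collections import defaultdict

NUM_PATHS = 8     # distinct source ports the sender rotates through (assumed)
NUM_UPLINKS = 4   # uplinks the switch hashes flows onto (assumed)

def ecmp_hash(src_port: int, dst_port: int = 4791) -> int:
    """Toy stand-in for a switch's ECMP hash over the flow tuple."""
    key = f"{src_port}:{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def spray(num_packets: int, base_port: int = 49152) -> dict:
    """Rotate the source port per packet, then count how many packets
    the ECMP hash places on each uplink."""
    per_link = defaultdict(int)
    for seq in range(num_packets):
        src_port = base_port + (seq % NUM_PATHS)      # new path id per packet
        per_link[ecmp_hash(src_port) % NUM_UPLINKS] += 1
    return dict(per_link)

# A single flow (fixed source port) lands entirely on one uplink...
single_flow = {ecmp_hash(49152) % NUM_UPLINKS: 1024}
# ...while spraying spreads the same 1024 packets across uplinks.
sprayed = spray(1024)
```

The receiver side then has to tolerate the reordering that spraying introduces, which is one reason a software transport with selective retransmission (rather than go-back-N on the NIC) pairs naturally with this scheme.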

Beyond collective communication, UCCL also extends to P2P communication (for prefill-decode disaggregation) and GPU-initiated communication (for expert parallelism, e.g., in DeepSeekV3), with superior performance. UCCL is fully open-sourced at https://github.com/uccl-project/uccl.

Bio:

Yang Zhou is an Assistant Professor at UC Davis. He was previously a postdoc in UC Berkeley's SkyLab, working with Ion Stoica, and received his Ph.D. from Harvard, advised by Minlan Yu and James Mickens. His interests span core systems and ML systems research, including efficient LLMs, GPU communication, and heterogeneous computing. He currently leads the UCCL project to build efficient GPU communication systems.