Invited Talk: Demystifying DeepSeek V3 and a Taste of R1 (Minghao Yan)

Time: 4:30-5:30 pm, Feb 6th, 2025
Location: CS 1325

Abstract: DeepSeek has captured the world's attention with the release of R1, its state-of-the-art open-source reasoning model. In this talk, I will discuss the innovations behind R1 and the unsung hero behind the scenes, the DeepSeek V3 model, from both the model architecture and the system design perspectives. I will present the techniques DeepSeek developed that allowed them to work under hardware constraints and train a frontier model on less capable hardware.