Pengfei Zuo
Email: pfzuo.cs@gmail.com
I currently work on AI infrastructure at ByteDance Seed.
Prior to joining Seed, I was a technical leader and expert in AI and cloud infrastructure at Huawei Cloud, where I worked on large-scale software and hardware systems spanning LLM serving, AI infrastructure, and memory services. This work included a series of innovative techniques for LLM serving, the CloudMatrix384 AI supernode system, and the Elastic Memory Service (EMS), a disaggregated memory infrastructure that serves as the caching foundation for LLM training and inference.
I received my Ph.D. degree (advised by Prof. Yu Hua) in Computer Science from Huazhong University of Science and Technology (HUST) in 2019. I was a visiting Ph.D. student (advised by Prof. Yuan Xie) at the University of California, Santa Barbara (UCSB) during 2018-2019. I received a B.E. degree in Computer Science from HUST in 2014.
My research focuses on AI and cloud infrastructure, with an emphasis on machine learning systems, memory systems, and distributed systems. I have published 50+ refereed papers in top-tier computer systems and architecture conferences and journals, including SOSP, OSDI, MICRO, ASPLOS, FAST, USENIX ATC, VLDB, and DAC. I received the 2020 ACM China Doctoral Dissertation Award (only two awardees per year across all computing disciplines in China) and the Best Paper Award at FAST 2023. The open-source code for our research on AI systems and disaggregated data centers is available at ASISys and DMemSys, respectively.
I am seeking motivated interns in AI systems. If you’re passionate about tackling key industry challenges and publishing impactful research, feel free to reach out!
Research
AI Systems and Algorithms
- LLM Serving Systems: CachedAttention (USENIX ATC’24), Adrenaline (ArXiv’25), TaiJi (SC’26), SparseServe (ICS’26), DualMap (ICLR’26), TileSparse (ICML’26), SPEX (OSDI’26)
- LLM Agent Systems: Maze (SC’26)
- Generative Recommendation: RelayGR (Technical Report’26)
- AI Algorithms: AdaSkip (AAAI’25), Progressive Sparse Attention (ArXiv’25), HyPER (ICML’26)
- AI Hardware Architectures: DeepSniffer (ASPLOS’20), SEAL (DAC’21), Memory Trojaning (TCAD’21), CloudMatrix384 (Technical Report’25)
Memory Systems and Architectures
- Disaggregated Memory Systems: FORD (FAST’22), uKaron (USENIX ATC’22), FUSEE (FAST’23), Ditto (SOSP’23), Aceso (SOSP’24)
- Disaggregated Memory Indexes: RACE (USENIX ATC’21), ROLEX (FAST’23, Best Paper), SMART (OSDI’23), CHIME (SOSP’24)
- Persistent Memory Systems: Path Hashing (MSST’17), Level Hashing (OSDI’18), CLevel (USENIX ATC’20)
- Persistent Memory Architectures: DPEC (DATE’18), DeWrite (MICRO’18), SecPM (HotStorage’18), SuperMem (MICRO’19), SimCom (DAC’20)
Storage Systems
- Indexes: MinCounter (MSST’15), SmartCuckoo (USENIX ATC’17), DLSH (SoCC’17), FINEDex (VLDB’22)
- Deduplication Systems: SMR (MSST’17), BEES (ICDCS’17), RRCS (IPDPS’18)
News
| Date | News |
|---|---|
| Jul 02, 2026 | Our papers “TaiJi: Unifying Prefill-Decode Disaggregation and Aggregation for Goodput-Optimized LLM Serving” and “Maze: A Distributed Framework for Large Language Model Agents” were accepted by SC 2026. Congratulations to Chao and Jing! |
| Mar 27, 2026 | Our papers “TileSparse: Arithmetic-Intensity-Aware Sparse Attention for Compute-Bound LLM Decoding” and “HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction” were accepted by ICML 2026. Congratulations to Chao and Shengxuan! |
| Mar 26, 2026 | Our paper “Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration” was accepted by OSDI 2026. Congratulations to Shuzhang! |
| Feb 23, 2026 | Our paper “SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving” was accepted by ICS 2026. Congratulations to Qihui! |
| Jan 26, 2026 | Our paper “DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving” was accepted by ICLR 2026. Congratulations to Ying! |
Selected Publications
- Maze: A Distributed Framework for Large Language Model Agents. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2026
- TileSparse: Arithmetic-Intensity-Aware Sparse Attention for Compute-Bound LLM Decoding. Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026
- ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems. Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023