Pengfei Zuo

I currently work on AI infrastructure at ByteDance Seed.

Prior to joining Seed, I was a technical leader and expert in AI & cloud infrastructure at Huawei Cloud, where I worked on strategy planning, technical innovation, and engineering implementation for large-scale AI software and hardware systems, with a focus on LLM serving, AI supernode systems, and memory and storage services. My work included a series of innovative techniques for LLM serving, the CloudMatrix384 AI supernode system, and the Elastic Memory Service (EMS), a disaggregated memory service that serves as the caching foundation for LLM training and inference.

I received my Ph.D. in Computer Science from Huazhong University of Science and Technology (HUST) in 2019, advised by Prof. Yu Hua. From 2018 to 2019, I was a visiting Ph.D. student at the University of California, Santa Barbara (UCSB), advised by Prof. Yuan Xie. I received my B.E. in Computer Science from HUST in 2014.

My research focuses on AI and cloud infrastructure, with an emphasis on machine learning systems, memory systems, and distributed systems. I have published 50+ refereed papers in top-tier computer systems and architecture conferences and journals, including SOSP, OSDI, MICRO, ASPLOS, FAST, USENIX ATC, VLDB, and DAC. I received the 2020 ACM China Doctoral Dissertation Award (given to only two recipients per year across all computing disciplines in China) and the Best Paper Award at FAST 2023. The open-source code for our research on AI systems and disaggregated data centers is available at ASISys and DMemSys, respectively.

I am seeking motivated interns in AI systems. If you’re passionate about tackling key industry challenges and publishing impactful research, feel free to reach out!

Research

News

Jul 02, 2026	Our papers “TaiJi: Unifying Prefill-Decode Disaggregation and Aggregation for Goodput-Optimized LLM Serving” and “Maze: A Distributed Framework for Large Language Model Agents” were accepted by SC 2026. Congratulations to Chao and Jing!
Mar 27, 2026	Our papers “TileSparse: Arithmetic-Intensity-Aware Sparse Attention for Compute-Bound LLM Decoding” and “HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction” were accepted by ICML 2026. Congratulations to Chao and Shengxuan!
Mar 26, 2026	Our paper “Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration” was accepted by OSDI 2026. Congratulations to Shuzhang!
Feb 23, 2026	Our paper “SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving” was accepted by ICS 2026. Congratulations to Qihui!
Jan 26, 2026	Our paper “DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving” was accepted by ICLR 2026. Congratulations to Ying!

Selected Publications

TaiJi: Unifying Prefill-Decode Disaggregation and Aggregation for Goodput-Optimized LLM Serving

Chao Wang, Pengfei Zuo*, Zhangyu Chen, Yunkai Liang, Zhou Yu, and Ming-Chang Yang

Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2026

@inproceedings{wang2026taiji,
  title = {TaiJi: Unifying Prefill-Decode Disaggregation and Aggregation for Goodput-Optimized LLM Serving},
  author = {Wang, Chao and Zuo, Pengfei and Chen, Zhangyu and Liang, Yunkai and Yu, Zhou and Yang, Ming-Chang},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)},
  year = {2026},
}

Maze: A Distributed Framework for Large Language Model Agents

Jing Gu, Zhuang Xing, Yiheng Yang, Bowen Lv, Jiale Wang, Shuo Yuan, Zijin Chen, Jin Zhao, Pengfei Zuo, Long Zheng, Xiaofei Liao, Hai Jin, and Qinbin Li

Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2026

@inproceedings{gu2026maze,
  title = {Maze: A Distributed Framework for Large Language Model Agents},
  author = {Gu, Jing and Xing, Zhuang and Yang, Yiheng and Lv, Bowen and Wang, Jiale and Yuan, Shuo and Chen, Zijin and Zhao, Jin and Zuo, Pengfei and Zheng, Long and Liao, Xiaofei and Jin, Hai and Li, Qinbin},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)},
  year = {2026},
}

TileSparse: Arithmetic-Intensity-Aware Sparse Attention for Compute-Bound LLM Decoding

Chao Wang, Pengfei Zuo*, Zhangyu Chen, Qihui Zhou, Tsung-Yi Ho, and Ming-Chang Yang

Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026

@inproceedings{wang2026tilesparse,
  title = {TileSparse: Arithmetic-Intensity-Aware Sparse Attention for Compute-Bound LLM Decoding},
  author = {Wang, Chao and Zuo, Pengfei and Chen, Zhangyu and Zhou, Qihui and Ho, Tsung-Yi and Yang, Ming-Chang},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year = {2026},
}

HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction

Shengxuan Qiu, Haochen Huang, Shuzhang Zhong, Pengfei Zuo, and Meng Li*

Proceedings of the 43rd International Conference on Machine Learning (ICML), 2026
