Pengfei Zuo

Email: pfzuo.cs@gmail.com, pengfei.zuo@huawei.com


I currently serve as the chief architect of AI-native storage at Huawei Cloud. I lead the EMS (Elastic Memory Service) team in building a disaggregated memory service layer in the cloud, upgrading Huawei Cloud’s infrastructure from a two-tier architecture that disaggregates compute and storage into a three-tier architecture that disaggregates compute, memory, and storage. This transformation enables higher resource elasticity and utilization. EMS also serves as foundational memory infrastructure for improving the efficiency of LLM training and inference on Huawei Cloud.

I received my Ph.D. degree (advised by Prof. Yu Hua) in Computer Science from Huazhong University of Science and Technology (HUST) in 2019. I was a visiting Ph.D. student (advised by Prof. Yuan Xie) at the University of California, Santa Barbara (UCSB) during 2018-2019. I received a B.E. degree in Computer Science from HUST in 2014.

My research focuses on AI and cloud infrastructure, with an emphasis on machine learning systems and memory systems. I have published 50+ refereed papers in top-tier computer systems and architecture conferences and journals, including SOSP, OSDI, MICRO, ASPLOS, FAST, USENIX ATC, VLDB, and DAC. I received the 2020 ACM China Doctoral Dissertation Award (two awardees per year across all computer-science disciplines in China) and the Best Paper Award at FAST 2023. The open-source code for our research on AI systems and disaggregated data centers is available at ASISys and DMemSys, respectively.

I am seeking motivated interns and postdocs in AI systems. If you’re passionate about tackling key industry challenges and publishing impactful research, feel free to reach out!


Research

AI Systems and Algorithms

Memory Systems and Architectures

Storage Systems


News

Jun 16, 2025 Our technical report “Serving Large Language Models on Huawei CloudMatrix384” is now available on arXiv!
Dec 09, 2024 Our paper “AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference” was accepted by AAAI 2025. Congratulations to Zhuomin and Yizhen!
May 05, 2024 Our two papers “Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores” and “CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory” were accepted by SOSP’2024. Congratulations to Zhisheng and Xuchuan!
May 05, 2024 Our paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention” was accepted by USENIX ATC’2024. Congratulations to Bin Gao!
Jul 15, 2023 Our paper “Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System” was accepted by SOSP’2023. Congratulations to Jiacheng!

Selected Publications

  1. arXiv
    Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving
    Chao Wang, Pengfei Zuo*, Zhangyu Chen, Yunkai Liang, Zhou Yu, and Ming-Chang Yang
    arXiv preprint arXiv:2508.01989, 2025
  2. Technical Report
    Serving Large Language Models on Huawei CloudMatrix384
    Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, and 37 more authors
    arXiv preprint arXiv:2506.12708, 2025
  3. arXiv
    Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
    Yunkai Liang, Zhangyu Chen, Pengfei Zuo*, Zhi Zhou*, Xu Chen, and Zhou Yu
    arXiv preprint arXiv:2503.20552, 2025
  4. arXiv
    Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
    Qihui Zhou, Peiqi Yin, Pengfei Zuo*, and James Cheng
    arXiv preprint arXiv:2503.00392, 2025
  5. USENIX ATC
    Cost‑Efficient Large Language Model Serving for Multi‑turn Conversations with CachedAttention
    Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo*
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2024
  6. SOSP
    Aceso: Achieving Efficient Fault Tolerance in Memory‑Disaggregated Key‑Value Stores
    Zhisheng Hu, Pengfei Zuo*, Yizou Chen, Chao Wang, Junliang Hu, and Ming‑Chang Yang*
    In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP), 2024
  7. SOSP
    CHIME: A Cache‑Efficient and High‑Performance Hybrid Index on Disaggregated Memory
    Xuchuan Luo, Jiacheng Shen, Pengfei Zuo, Xin Wang, Michael R. Lyu, and Yangfan Zhou
    In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP), 2024
  8. SOSP
    Ditto: An Elastic and Adaptive Memory‑Disaggregated Caching System
    Jiacheng Shen, Pengfei Zuo*, Xuchuan Luo, Yuxin Su, Jiazhen Gu, Hao Feng, Yangfan Zhou, and Michael R. Lyu
    In Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP), 2023
  9. OSDI
    SMART: A High‑Performance Adaptive Radix Tree for Disaggregated Memory
    Xuchuan Luo, Pengfei Zuo*, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael R. Lyu, and Yangfan Zhou*
    In Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2023
  10. FAST
    ROLEX: A Scalable RDMA‑oriented Learned Key‑Value Store for Disaggregated Memory Systems
    Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, and Jiajie Sheng
    In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023
  11. FAST
    FUSEE: A Fully Memory‑Disaggregated Key‑Value Store
    Jiacheng Shen, Pengfei Zuo*, Xuchuan Luo, Tianyi Yang, Yuxin Su, Yangfan Zhou, and Michael R. Lyu
    In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023
  12. FAST
    FORD: Fast One‑sided RDMA‑based Distributed Transactions for Disaggregated Persistent Memory
    Ming Zhang, Yu Hua, Pengfei Zuo, and Lurong Liu
    In Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST), 2022
  13. VLDB
    FINEdex: A Fine‑grained Learned Index Scheme for Scalable and Concurrent Memory Systems
    Pengfei Li, Yu Hua, Jingnan Jia, and Pengfei Zuo
    In Proceedings of the 48th International Conference on Very Large Data Bases (VLDB), 2022
  14. DAC
    SEALing Neural Network Models in Encrypted Deep Learning Accelerators
    Pengfei Zuo, Yu Hua, Ling Liang, Xingfeng Xie, Xing Hu, and Yuan Xie
    In Proceedings of the 58th Design Automation Conference (DAC), 2021
  15. USENIX ATC
    One-sided RDMA-Conscious Extendible Hashing for Disaggregated Memory
    Pengfei Zuo, Jiazhao Sun, Liu Yang, Shuangwu Zhang, and Yu Hua
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2021
  16. USENIX ATC
    Lock-free Concurrent Level Hashing for Persistent Memory
    Zhangyu Chen, Yu Hua, Bo Ding, and Pengfei Zuo
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2020
  17. ASPLOS
    DeepSniffer: a DNN Model Extraction Framework Based on Learning Architectural Hints
    Xing Hu, Ling Liang, Shuangchen Li, Lei Deng, Pengfei Zuo, Yu Ji, Xinfeng Xie, Yufei Ding, Chang Liu, and 2 more authors
    In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020
  18. MICRO
    SuperMem: Enabling Application-transparent Secure Persistent Memory with Low Overheads
    Pengfei Zuo, Yu Hua, and Yuan Xie
    In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019
  19. MICRO
    Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes
    Pengfei Zuo, Yu Hua, Ming Zhao, Wen Zhou, and Yuncheng Guo
    In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018
  20. OSDI
    Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
    Pengfei Zuo, Yu Hua, and Jie Wu
    In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018