Pengfei Zuo

Email: pfzuo.cs@gmail.com, pengfei.zuo@huawei.com


I currently serve as the chief architect of AI-native storage at Huawei Cloud. I lead the EMS (Elastic Memory Service) team in building a disaggregated memory service layer in the cloud, upgrading Huawei Cloud’s infrastructure from a two-tier architecture that disaggregates compute and storage into a three-tier architecture that disaggregates compute, memory, and storage. This transformation enables higher resource elasticity and utilization. EMS also serves as foundational memory infrastructure for improving the efficiency of LLM training and inference on Huawei Cloud.

I received my Ph.D. degree (advised by Prof. Yu Hua) in Computer Science from Huazhong University of Science and Technology (HUST) in 2019. I was a visiting Ph.D. student (advised by Prof. Yuan Xie) at the University of California, Santa Barbara (UCSB) during 2018-2019. I received a B.E. degree in Computer Science from HUST in 2014.

My research focuses on AI and cloud infrastructure, with an emphasis on machine learning systems and memory systems. I have published 50+ refereed papers in top-tier computer systems and architecture conferences and journals, including SOSP, OSDI, MICRO, ASPLOS, FAST, USENIX ATC, VLDB, and DAC. I received the 2020 ACM China Doctoral Dissertation Award (two awardees per year across all computer-science disciplines in China) and the Best Paper Award at FAST 2023. The open-source code for our research on AI systems and disaggregated data centers is available at ASISys and DMemSys, respectively.

I am seeking motivated interns and postdocs in AI systems. If you’re passionate about tackling key industry challenges and publishing impactful research, feel free to reach out!


Research

AI Systems and Algorithms

Memory Systems and Architectures

Storage Systems


News

Jun 16, 2025 Our technical report “Serving Large Language Models on Huawei CloudMatrix384” is now available on arXiv!
Dec 09, 2024 Our paper “AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference” was accepted by AAAI 2025. Congratulations to Zhuomin and Yizhen!
May 05, 2024 Our two papers “Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores” and “CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory” were accepted by SOSP’2024. Congratulations to Zhisheng and Xuchuan!
May 05, 2024 Our paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention” was accepted by USENIX ATC’2024. Congratulations to Bin Gao!
Jul 15, 2023 Our paper “Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System” was accepted by SOSP’2023. Congratulations to Jiacheng!

Selected Publications

  1. arXiv
    Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving
    Chao Wang, Pengfei Zuo*, Zhangyu Chen, Yunkai Liang, Zhou Yu, and Ming-Chang Yang
    arXiv preprint arXiv:2508.01989, 2025
  2. Technical Report
    Serving Large Language Models on Huawei CloudMatrix384
    Pengfei Zuo, Huimin Lin, Junbo Deng, Nan Zou, Xingkun Yang, Yingyu Diao, Weifeng Gao, Ke Xu, Zhangyu Chen, and 37 more authors
    arXiv preprint arXiv:2506.12708, 2025
  3. arXiv
    Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
    Yunkai Liang, Zhangyu Chen, Pengfei Zuo*, Zhi Zhou*, Xu Chen, and Zhou Yu
    arXiv preprint arXiv:2503.20552, 2025
  4. arXiv
    Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
    Qihui Zhou, Peiqi Yin, Pengfei Zuo*, and James Cheng
    arXiv preprint arXiv:2503.00392, 2025
  5. USENIX ATC
    Cost‑Efficient Large Language Model Serving for Multi‑turn Conversations with CachedAttention
    Bin Gao, Zhuomin He, Puru Sharma, Qingxuan Kang, Djordje Jevdjic, Junbo Deng, Xingkun Yang, Zhou Yu, and Pengfei Zuo*
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2024
  6. SOSP
    Aceso: Achieving Efficient Fault Tolerance in Memory‑Disaggregated Key‑Value Stores
    Zhisheng Hu, Pengfei Zuo*, Yizou Chen, Chao Wang, Junliang Hu, and Ming‑Chang Yang*
    In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP), 2024
  7. SOSP
    CHIME: A Cache‑Efficient and High‑Performance Hybrid Index on Disaggregated Memory
    Xuchuan Luo, Jiacheng Shen, Pengfei Zuo, Xin Wang, Michael R. Lyu, and Yangfan Zhou
    In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP), 2024
  8. SOSP
    Ditto: An Elastic and Adaptive Memory‑Disaggregated Caching System
    Jiacheng Shen, Pengfei Zuo*, Xuchuan Luo, Yuxin Su, Jiazhen Gu, Hao Feng, Yangfan Zhou, and Michael R. Lyu
    In Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP), 2023
  9. OSDI
    SMART: A High‑Performance Adaptive Radix Tree for Disaggregated Memory
    Xuchuan Luo, Pengfei Zuo*, Jiacheng Shen, Jiazhen Gu, Xin Wang, Michael R. Lyu, and Yangfan Zhou*
    In Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2023
  10. FAST
    ROLEX: A Scalable RDMA‑oriented Learned Key‑Value Store for Disaggregated Memory Systems
    Pengfei Li, Yu Hua, Pengfei Zuo, Zhangyu Chen, and Jiajie Sheng
    In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023
  11. FAST
    FUSEE: A Fully Memory‑Disaggregated Key‑Value Store
    Jiacheng Shen, Pengfei Zuo*, Xuchuan Luo, Tianyi Yang, Yuxin Su, Yangfan Zhou, and Michael R. Lyu
    In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023
  12. FAST
    FORD: Fast One‑sided RDMA‑based Distributed Transactions for Disaggregated Persistent Memory
    Ming Zhang, Yu Hua, Pengfei Zuo, and Lurong Liu
    In Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST), 2022
  13. VLDB
    FINEdex: A Fine‑grained Learned Index Scheme for Scalable and Concurrent Memory Systems
    Pengfei Li, Yu Hua, Jingnan Jia, and Pengfei Zuo
    In Proceedings of the 48th International Conference on Very Large Data Bases (VLDB), 2022
  14. DAC
    SEALing Neural Network Models in Encrypted Deep Learning Accelerators
    Pengfei Zuo, Yu Hua, Ling Liang, Xingfeng Xie, Xing Hu, and Yuan Xie
    In Proceedings of the 58th Design Automation Conference (DAC), 2021
  15. USENIX ATC
    One-sided RDMA-Conscious Extendible Hashing for Disaggregated Memory
    Pengfei Zuo, Jiazhao Sun, Liu Yang, Shuangwu Zhang, and Yu Hua
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2021
  16. USENIX ATC
    Lock-free Concurrent Level Hashing for Persistent Memory
    Zhangyu Chen, Yu Hua, Bo Ding, and Pengfei Zuo
    In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2020
  17. ASPLOS
    DeepSniffer: a DNN Model Extraction Framework Based on Learning Architectural Hints
    Xing Hu, Ling Liang, Shuangchen Li, Lei Deng, Pengfei Zuo, Yu Ji, Xinfeng Xie, Yufei Ding, Chang Liu, and 2 more authors
    In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2020
  18. MICRO
    SuperMem: Enabling Application-transparent Secure Persistent Memory with Low Overheads
    Pengfei Zuo, Yu Hua, and Yuan Xie
    In Proceedings of the 52nd IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019
  19. MICRO
    Improving the Performance and Endurance of Encrypted Non-volatile Main Memory through Deduplicating Writes
    Pengfei Zuo, Yu Hua, Ming Zhao, Wen Zhou, and Yuncheng Guo
    In Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018
  20. OSDI
    Write-Optimized and High-Performance Hashing Index Scheme for Persistent Memory
    Pengfei Zuo, Yu Hua, and Jie Wu
    In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2018