Pengfei Zuo

I currently serve as the chief architect of AI-native storage at Huawei Cloud, where I lead the EMS (Elastic Memory Service) team in building a disaggregated memory service layer in the cloud. EMS upgrades Huawei Cloud’s two-tier infrastructure, which disaggregates computing and storage, into a three-tier architecture that disaggregates computing, memory, and storage, enabling higher resource elasticity and utilization. EMS also serves as foundational memory infrastructure that improves the efficiency of LLM training and inference on Huawei Cloud.
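Conceptually, this disaggregation adds a remote, elastic memory tier behind each compute node’s local memory. Below is a minimal, self-contained Python sketch of that idea as it applies to LLM serving (spilling cold KV-cache blocks to a remote pool); all class and method names are hypothetical illustrations, not the EMS API.

```python
# Toy two-tier cache (hypothetical names, not the EMS API): a hot tier in
# local memory backed by a remote elastic memory pool, so capacity scales
# independently of compute.
from collections import OrderedDict

class DisaggregatedKVCache:
    def __init__(self, local_capacity: int, remote_pool: dict):
        self.local = OrderedDict()        # hot tier, kept in LRU order
        self.local_capacity = local_capacity
        self.remote = remote_pool         # stand-in for the remote memory pool

    def put(self, block_id: str, kv_block: bytes) -> None:
        # Insert (or refresh) a block in the hot tier, spilling the coldest
        # blocks to the remote pool when local capacity is exceeded.
        self.local[block_id] = kv_block
        self.local.move_to_end(block_id)
        while len(self.local) > self.local_capacity:
            evicted_id, evicted_block = self.local.popitem(last=False)
            self.remote[evicted_id] = evicted_block

    def get(self, block_id: str):
        # Hot hit: serve locally with no network round trip.
        if block_id in self.local:
            self.local.move_to_end(block_id)
            return self.local[block_id]
        # Cold hit: fetch from the remote pool and re-admit locally.
        if block_id in self.remote:
            self.put(block_id, self.remote.pop(block_id))
            return self.local[block_id]
        return None  # miss: the caller must recompute the KV block

# Usage: a 2-block hot tier backed by a plain dict playing the remote pool.
cache = DisaggregatedKVCache(local_capacity=2, remote_pool={})
for i in range(3):
    cache.put(f"conv0/block{i}", b"kv")
assert "conv0/block0" not in cache.local   # coldest block was spilled remotely
assert cache.get("conv0/block0") == b"kv"  # and is fetched back on demand
```

The point of the design is that the hot tier stays small and fast while capacity grows elastically in the remote pool, which is what allows memory to be provisioned independently of compute.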
I received my Ph.D. degree (advised by Prof. Yu Hua) in Computer Science from Huazhong University of Science and Technology (HUST) in 2019. I was a visiting Ph.D. student (advised by Prof. Yuan Xie) at the University of California, Santa Barbara (UCSB) from 2018 to 2019. I received a B.E. degree in Computer Science from HUST in 2014.
My research focuses on AI and cloud infrastructure, with an emphasis on machine learning systems and memory systems. I have published 50+ refereed papers in top-tier computer systems and architecture conferences and journals, including SOSP, OSDI, MICRO, ASPLOS, FAST, USENIX ATC, VLDB, and DAC. I received the 2020 ACM China Doctoral Dissertation Award (two awardees per year across all computing disciplines in China) and the Best Paper Award at FAST 2023. The open-source code for our research on AI systems and disaggregated data centers is available at ASISys and DMemSys, respectively.
I am seeking motivated interns and postdocs in AI systems. If you’re passionate about tackling key industry challenges and publishing impactful research, feel free to reach out!
Research
AI Systems and Algorithms
- LLM Serving Systems: CachedAttention (USENIX ATC’24), Adrenaline (arXiv’25), TaiChi (arXiv’25)
- AI Algorithms: AdaSkip (AAAI’25), Progressive Sparse Attention (arXiv’25)
- AI Hardware Architectures: DeepSniffer (ASPLOS’20), SEAL (DAC’21), Memory Trojaning (TCAD’21), CloudMatrix384 (Technical Report’25)
Memory Systems and Architectures
- Disaggregated Memory Systems: FORD (FAST’22), ROLEX (FAST’23), Ditto (SOSP’23), Aceso (SOSP’24)
- Disaggregated Memory Indexes: RACE (USENIX ATC’21), SMART (OSDI’23), CHIME (SOSP’24)
- Persistent Memory Systems: Path Hashing (MSST’17), Level Hashing (OSDI’18), CLevel (USENIX ATC’20)
- Persistent Memory Architectures: DPEC (DATE’18), DeWrite (MICRO’18), SecPM (HotStorage’18), SuperMem (MICRO’19), SimCom (DAC’20)
Storage Systems
- Indexes: MinCounter (MSST’15), SmartCuckoo (USENIX ATC’17), DLSH (SoCC’17), FINEDex (VLDB’22)
- Deduplication Systems: SMR (MSST’17), BEES (ICDCS’17), RRCS (IPDPS’18)
News
Jun 16, 2025: Our technical report “Serving Large Language Models on Huawei CloudMatrix384” is now available on arXiv!
Dec 09, 2024: Our paper “AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference” was accepted by AAAI 2025. Congratulations to Zhuomin and Yizhen!
May 05, 2024: Our two papers “Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores” and “CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory” were accepted by SOSP 2024. Congratulations to Zhisheng and Xuchuan!
May 05, 2024: Our paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention” was accepted by USENIX ATC 2024. Congratulations to Bin Gao!
Jul 15, 2023: Our paper “Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System” was accepted by SOSP 2023. Congratulations to Jiacheng!
Selected Publications
- ROLEX: A Scalable RDMA-oriented Learned Key-Value Store for Disaggregated Memory Systems. In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST), 2023