Welcome to Pengfei's Blog! 👏

2025 Year-End Summary: Four Trends and Five Representative Innovations in LLM Inference Systems

Introduction: The First Year of the Inference Explosion and the Hundred-Billion Cost Battle

Looking back at 2025, we not only saw rapid technological iteration but also witnessed a dramatic shift in the industrial landscape.

If the past few years were an arms race in "large-scale model training," then 2025 was undoubtedly the first year of the "inference business explosion." As model capabilities matured and Agent applications reached production, the balance of cloud computing power fundamentally shifted: today, inference workloads occupy the vast majority of GPU/NPU resources in the cloud.

January 06, 2026 · 21 min read

Does NVIDIA Dynamo's PD Disaggregation Have Issues? Our Proposed "Adrenaline" Is The Remedy!

March 15, 2025 · 1 min read

DeepSeek Has NSA (Native Sparse Attention), While We Have PSA (Progressive Sparse Attention)

March 01, 2025 · 1 min read