Welcome to Pengfei's Blog! 👏

2025 Year-End Summary: Four Trends and Five Representative Works of Innovation in LLM Inference Systems

Introduction: The First Year of Inference Explosion and the Hundred-Billion Cost Battle

Looking back at 2025, we not only experienced technological iterations but also witnessed a dramatic shift in the industrial landscape.

If the past few years were an arms race of “large-scale model training,” then 2025 was undoubtedly the first year of “inference business explosion.” As model capabilities matured and Agent applications reached deployment, the balance of cloud computing power shifted fundamentally: the vast majority of GPU/NPU resources in the cloud are now occupied by inference workloads.

Does NVIDIA Dynamo's PD Disaggregation Have Issues? Our Proposed "Adrenaline" Is The Remedy!


DeepSeek Has NSA (Native Sparse Attention), While We Have PSA (Progressive Sparse Attention)
