Announcement_11
Our paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention” was accepted by USENIX ATC’2024. Congratulations to Bin Gao!
Our paper “Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention” was accepted by USENIX ATC’2024. Congratulations to Bin Gao!