UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs Paper • 2512.03383 • Published 7 days ago • 3
Performance Prediction for Large Systems via Text-to-Text Regression Paper • 2506.21718 • Published Jun 26, 2025 • 6
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval Paper • 2502.20969 • Published Feb 28, 2025 • 11
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 11