DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion Paper • 2406.06567 • Published Jun 3, 2024
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time Paper • 2408.03675 • Published Aug 7, 2024
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer Paper • 2012.15688 • Published Dec 31, 2020 • 1
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Paper • 2107.02137 • Published Jul 5, 2021
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation Paper • 2112.12731 • Published Dec 23, 2021 • 1
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking Paper • 2502.13842 • Published Feb 19, 2025
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction Paper • 2601.05966 • Published Jan 9, 2026 • 23