Junyuan Shang

sjy1203

AI & ML interests

NLP

Organizations

None yet

authored 3 papers 12 days ago

DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Paper • 2406.06567 • Published Jun 3, 2024

NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

Paper • 2408.03675 • Published Aug 7, 2024

ERNIE 5.0 Technical Report

Paper • 2602.04705 • Published 13 days ago • 251

upvoted a paper 12 days ago

ERNIE 5.0 Technical Report

Paper • 2602.04705 • Published 13 days ago • 251

authored 6 papers about 1 month ago

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

Paper • 2012.15688 • Published Dec 31, 2020 • 1

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Paper • 2107.02137 • Published Jul 5, 2021

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Paper • 2112.12731 • Published Dec 23, 2021 • 1

Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking

Paper • 2502.13842 • Published Feb 19, 2025

Mixture of Hidden-Dimensions Transformer

Paper • 2412.05644 • Published Dec 7, 2024

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Paper • 2601.05966 • Published Jan 9 • 23

upvoted a paper about 1 month ago

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Paper • 2601.05966 • Published Jan 9 • 23

liked a model 5 months ago

baidu/ERNIE-4.5-21B-A3B-Thinking

Text Generation • 22B • Updated Nov 26, 2025 • 662 • 772