Fast, lossless LLM inference via dual-view diffusion decoding.
- chiennv/Orthrus-Qwen3-4B
  Text Generation • 5B • Updated • 168 • 5
- chiennv/Orthrus-Qwen3-8B
  Text Generation • 10B • Updated • 1.37k • 11
- chiennv/Orthrus-Qwen3-1.7B
  Text Generation • 2B • Updated • 338 • 6
- Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion
  Paper • 2605.12825 • Published • 10