nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 Text Generation • 124B • Updated 17 days ago • 729k • 359
Article Building Tensors from Scratch in Rust (Part 1.2): View Operations KeighBee • Jun 18, 2025 • 4
Running • Scaling test-time compute 📈 • 596 • Run advanced search strategies to boost LLM problem solving
Search-R1 Collection Preliminary checkpoints with outcome-only RL. • 15 items • Updated Aug 12, 2025 • 18
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16, 2025 • 169