Intelligence per Watt: Measuring Intelligence Efficiency of Local AI Paper • 2511.07885 • Published Nov 11, 2025 • 10
Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models Paper • 2510.10964 • Published Oct 13, 2025 • 3
Devstral 2 Collection A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 3 items • Updated Dec 9, 2025 • 44
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 8 items • Updated about 14 hours ago • 65
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples Paper • 2510.07192 • Published Oct 8, 2025 • 5
Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation Paper • 2408.13586 • Published Aug 24, 2024 • 3
Lingshu MLLMs Collection Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning • 5 items • Updated about 7 hours ago • 21
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, excelling at agentic, long-context, and thinking tasks • 7 items • Updated 28 days ago • 78
Gemma 3 Collection Collection Some fun things I've made on Gemma 3 • 6 items • Updated Apr 18, 2025 • 2
RpR Models Collection RpR (RolePlay with Reasoning) models which are built on RPMax datasets with properly trained multi-turn reasoning. • 8 items • Updated Jun 25, 2025 • 18
GPT-OSS General (4.2B to 20B) Collection Collection of pruned GPT-OSS models spanning 1-32 experts, maintaining general capabilities across domains while reducing computational requirements. • 29 items • Updated Aug 13, 2025 • 10