For full details, see the DR Tulu paper (linked under Related Links below).
DR Tulu 8B - MLX 6-bit Quantized
This is DR Tulu 8B converted to MLX 6-bit quantized format for inference on Apple Silicon hardware. This variant offers a balanced performance-quality trade-off while maintaining DR-Tulu's signature reasoning capabilities.
MLX Model Variants - Complete Collection
Choose the best variant for your hardware and performance needs:
| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Repository |
|---|---|---|---|---|---|---|
| DR-Tulu-8B-MLX-4bit | 4-bit quantized | 4.3GB | 4.500 | 4.9GB | 78.2 tok/s | Plurigrid/DR-Tulu-8B-MLX-4bit |
| DR-Tulu-8B-MLX-6bit | 6-bit quantized | 6.2GB | 6.500 | 6.9GB | 60.7 tok/s | Plurigrid/DR-Tulu-8B-MLX-6bit |
| DR-Tulu-8B-MLX-8bit | 8-bit quantized | 8.1GB | 8.500 | 8.8GB | 59.8 tok/s | Plurigrid/DR-Tulu-8B-MLX-8bit |
| DR-Tulu-8B-MLX-bf16 | bfloat16 (full) | 15.3GB | ~16.000 | 16.4GB | 35.0 tok/s | Plurigrid/DR-Tulu-8B-MLX-bf16 |
Why Choose 6-bit?
- Balanced performance-quality trade-off: 60.7 tokens/sec
- Moderate Memory: 6.9GB RAM usage (2.4x less than bf16)
- Quality Focused: Enhanced precision over 4-bit with 6.500 bits/weight
- Enhanced Reasoning: Superior quality preservation vs 4-bit quantization
- Versatile: Suitable for production applications on Apple Silicon devices (16GB+)
Quick Start
Command Line Interface
```bash
# Interactive chat (recommended)
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-6bit

# Generate text
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-6bit --prompt "What is category theory?" --max-tokens 500
```
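For finer control over sampling, recent mlx-lm releases also accept temperature, nucleus-sampling, and seed flags on `mlx_lm.generate` (flag names below follow those releases; run with `--help` to confirm on your installed version):

```bash
# Lower temperature for more deterministic, research-style answers
uvx --from mlx-lm mlx_lm.generate \
  --model Plurigrid/DR-Tulu-8B-MLX-6bit \
  --prompt "Summarize the key ideas of information theory." \
  --max-tokens 800 \
  --temp 0.6 \
  --top-p 0.9 \
  --seed 0
```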
Python API
```python
from mlx_lm import load, generate

# Load the 6-bit quantized model
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")

prompt = "Explain quantum computing step by step."

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response with DR-Tulu reasoning
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```
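For interactive applications you can stream tokens as they are produced. A minimal sketch using `stream_generate` from mlx-lm (recent versions yield response chunks with a `.text` field; older versions yield plain strings, so check your installed version):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")

messages = [{"role": "user", "content": "Explain quantum computing step by step."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens as they arrive instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```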
Installation
```bash
# Install MLX-LM
pip install mlx-lm

# or with uv
uv add mlx-lm
```
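To confirm the install can see the Metal backend, a quick check (assumes your mlx-lm build exposes a `__version__` attribute; skip that line otherwise):

```bash
# Check that the Metal backend is available to MLX
python -c "import mlx.core as mx; print('Metal available:', mx.metal.is_available())"
# Check the installed mlx-lm version
python -c "import mlx_lm; print('mlx-lm version:', mlx_lm.__version__)"
```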
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| Platform | Apple Silicon (M1/M2/M3/M4/M5) | M1 Pro/Max or newer |
| Memory | 16GB unified memory | 24GB+ unified memory |
| Storage | 7GB free space | 15GB+ free space |
| OS | macOS 12+ | macOS 14+ (Sonoma) |
Tested Configuration: Mac Studio M1 Ultra (20-core CPU, 128GB unified memory), macOS Sequoia 15.2
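To check whether a machine meets these requirements, macOS exposes the chip name and unified memory size via `sysctl`:

```bash
# Report the chip (should be an Apple Silicon part, e.g. "Apple M1 Ultra")
sysctl -n machdep.cpu.brand_string
# Report total unified memory in GiB (16+ recommended for the 6-bit variant)
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GiB"
```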
Technical Specifications
6-bit Quantization Details:
- Quantization Method: MLX native affine quantization
- Effective Bits: 6.500 bits per weight
- Group Size: 64 (mlx_lm default; the per-group float16 scale and bias account for the extra 0.5 bits in the 6.500 effective bits/weight)
- Conversion Command: `mlx_lm.convert --quantize --q-bits 6`
- Quality Preservation: Excellent (improved over 4-bit)
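To make the 6.500 bits/weight figure concrete: group-wise affine quantization stores 6-bit integer codes plus a float16 scale and bias per group, so the per-weight overhead is 32 bits divided by the group size. A minimal NumPy sketch of the idea (illustrative only, not MLX's actual kernel; a group size of 64 reproduces the 6.500 figure):

```python
import numpy as np

def affine_quantize(w, bits=6, group_size=64):
    """Group-wise affine quantization: low-bit codes plus a scale and bias per group."""
    levels = 2 ** bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / levels, 1e-8)               # stored as float16
    codes = np.round((groups - lo) / scale).astype(np.uint8)   # stored as 6-bit ints
    return codes, scale.astype(np.float16), lo.astype(np.float16)

def dequantize(codes, scale, bias):
    return codes * scale + bias

# Storage cost: 6 bits per weight + two 16-bit group parameters per group
bits, group_size = 6, 64
print(bits + 32 / group_size)   # 6.5, matching the reported 6.500 bits/weight

# Round-trip error on random weights
w = np.random.randn(1024).astype(np.float32)
codes, scale, bias = affine_quantize(w, bits, group_size)
print(np.abs(dequantize(codes, scale, bias) - w.reshape(-1, group_size)).max())
```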
Performance Metrics:
- Inference Speed: 60.7 tokens/second
- Memory Efficiency: 6.9GB peak usage
- Model Loading: ~3-5 seconds
- Quality: Higher precision, with `<think>` reasoning preserved
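These numbers come from the tested Mac Studio configuration above; throughput varies with chip and prompt length. A rough way to measure tokens/second on your own machine (a sketch, not a rigorous benchmark: it counts generated tokens with the tokenizer and includes prompt processing in the timing):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")
prompt = "Explain the foundations of category theory."

start = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Count generated tokens with the model's own tokenizer
n_tokens = len(tokenizer.encode(response))
print(f"{n_tokens / elapsed:.1f} tokens/sec over {n_tokens} generated tokens")
```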
About DR Tulu
The underlying model is the RL checkpoint of DR Tulu, an open deep research agent trained on top of rl-research/DR-Tulu-SFT-8B.
Key Capabilities:
- Step-by-step reasoning with visible `<think>` tags
- Research-grade analysis and problem-solving
- Tool-use and multi-turn conversations
- Mathematical and scientific reasoning
For more details on DR Tulu, see the paper linked under Related Links below.
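Because DR-Tulu emits its chain of thought inside `<think>...</think>` tags, downstream code often wants to separate the reasoning from the final answer. A simple regex-based split (a sketch that assumes the standard open/close tag format; adjust if your outputs differ):

```python
import re

def split_think(text: str):
    """Separate <think>...</think> reasoning from the visible answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

sample = "<think>Recall the definition of a functor...</think>A functor maps objects and morphisms."
thoughts, answer = split_think(sample)
print(thoughts)   # ['Recall the definition of a functor...']
print(answer)     # 'A functor maps objects and morphisms.'
```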
Evaluation Results
Results from the original full-precision DR-Tulu-8B model (the 6-bit quantization is intended to preserve this quality):
| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|---|---|---|---|---|---|---|---|---|
| DR-Tulu-8B | 86.7 | 43.7 | 71.1 | 41.8 | 80.1 | 68.0 | 39.1 | 61.5 |
Advanced Usage
Multi-turn Conversation
```python
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"}
]

# Format the conversation; fall back to the last user turn if no chat template is set
if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
else:
    formatted_prompt = messages[-1]["content"]

response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=1000)
```
Research-Style Analysis
```python
research_prompt = """
Analyze the relationship between quantum mechanics and information theory.
Think step by step and provide a comprehensive analysis.
"""

response = generate(model, tokenizer, prompt=research_prompt, max_tokens=1500, verbose=True)
```
Long-Form Generation (1069+ tokens)
```python
# 6-bit excels at longer, high-quality responses
complex_prompt = "Explain the foundations of category theory and its applications across mathematics."
response = generate(model, tokenizer, prompt=complex_prompt, max_tokens=1069, verbose=True)
```
6-bit Advantages
Quality vs Performance Analysis:
- Higher precision (6.500 vs 4.500 bits/weight)
- 1.7x faster than bf16 (60.7 vs 35.0 tokens/sec)
- 2.4x less memory than bf16 (6.9GB vs 16.4GB)
- Suitable for production applications requiring quality + speed
Use Cases:
- Research applications requiring nuanced reasoning
- Production deployments with quality constraints
- Interactive applications on mid-range Apple Silicon
- Balanced workflows needing speed + precision
Related Links
- DR Tulu Paper
- DR Tulu Demo
- DR Tulu Code
- DR Tulu Collection
- Original Model
- MLX Framework
- Complete MLX Options
License & Usage
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
6-bit Specific Considerations:
- Optimized for Apple Silicon hardware only
- Enhanced quality preservation with 6.500 bits per weight
- Balanced performance in the MLX model series
- Ideal for applications requiring quality + efficiency
Citation
```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```
Conversion Details
- Conversion Date: November 22, 2025
- Converter: MLX community via Plurigrid
- Command: `uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-6bit --quantize --q-bits 6` (see the reproduction sketch after this list)
- Framework Version: mlx-lm latest (November 2025)
- Validation: Tested with a 1069-token generation; output quality was preserved better than in the 4-bit variant
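To reproduce the conversion (or target a different trade-off), `mlx_lm.convert` also accepts an explicit group size; the command below assumes the default group size of 64, which matches the 6.500 effective bits/weight reported above:

```bash
uvx --from mlx-lm mlx_lm.convert \
  --hf-path rl-research/DR-Tulu-8B \
  --mlx-path ./DR-Tulu-8B-6bit \
  --quantize --q-bits 6 --q-group-size 64
```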
Model Tree
- Base model: Qwen/Qwen3-8B-Base