For full details, see the DR Tulu paper (linked under Related Links below).
DR Tulu 8B - MLX 6-bit Quantized
This is DR Tulu 8B converted to MLX 6-bit quantized format for inference on Apple Silicon hardware. This variant offers a balanced performance-quality trade-off while maintaining DR-Tulu's signature reasoning capabilities.
MLX Model Variants - Complete Collection
Choose the best variant for your hardware and performance needs:
| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Repository |
|---|---|---|---|---|---|---|
| DR-Tulu-8B-MLX-4bit | 4-bit quantized | 4.3GB | 4.500 | 4.9GB | 78.2 tok/s | Plurigrid/DR-Tulu-8B-MLX-4bit |
| DR-Tulu-8B-MLX-6bit | 6-bit quantized | 6.2GB | 6.500 | 6.9GB | 60.7 tok/s | Plurigrid/DR-Tulu-8B-MLX-6bit |
| DR-Tulu-8B-MLX-8bit | 8-bit quantized | 8.1GB | 8.500 | 8.8GB | 59.8 tok/s | Plurigrid/DR-Tulu-8B-MLX-8bit |
| DR-Tulu-8B-MLX-bf16 | bfloat16 (full) | 15.3GB | ~16.000 | 16.4GB | 35.0 tok/s | Plurigrid/DR-Tulu-8B-MLX-bf16 |
Why Choose 6-bit?
- Balanced performance-quality trade-off: 60.7 tokens/sec
- Moderate Memory: 6.9GB RAM usage (2.4x less than bf16)
- Quality Focused: Enhanced precision over 4-bit with 6.500 bits/weight
- Enhanced Reasoning: Superior quality preservation vs 4-bit quantization
- Versatile: Suitable for production applications on Apple Silicon devices (16GB+)
Quick Start
Command Line Interface
```bash
# Interactive chat (recommended)
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-6bit

# Generate text
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-6bit --prompt "What is category theory?" --max-tokens 500
```
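For finer control over sampling, recent mlx-lm releases also accept temperature, nucleus-sampling, and seed flags on `mlx_lm.generate` (flag names below follow those releases; run with `--help` to confirm on your installed version):

```bash
# Lower temperature for more deterministic, research-style answers
uvx --from mlx-lm mlx_lm.generate \
  --model Plurigrid/DR-Tulu-8B-MLX-6bit \
  --prompt "Summarize the key ideas of information theory." \
  --max-tokens 800 \
  --temp 0.6 \
  --top-p 0.9 \
  --seed 0
```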
Python API
```python
from mlx_lm import load, generate

# Load the 6-bit quantized model
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")

prompt = "Explain quantum computing step by step."

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response with DR-Tulu reasoning
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```
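For interactive applications you can stream tokens as they are produced. A minimal sketch using `stream_generate` from mlx-lm (recent versions yield response chunks with a `.text` field; older versions yield plain strings, so check your installed version):

```python
from mlx_lm import load, stream_generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")

messages = [{"role": "user", "content": "Explain quantum computing step by step."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens as they arrive instead of waiting for the full response
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```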
Installation
```bash
# Install MLX-LM
pip install mlx-lm

# or with uv
uv add mlx-lm
```
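To confirm the install can see the Metal backend, a quick check (assumes your mlx-lm build exposes a `__version__` attribute; skip that line otherwise):

```bash
# Check that the Metal backend is available to MLX
python -c "import mlx.core as mx; print('Metal available:', mx.metal.is_available())"
# Check the installed mlx-lm version
python -c "import mlx_lm; print('mlx-lm version:', mlx_lm.__version__)"
```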
Hardware Requirements
| Component | Minimum | Recommended |
|---|---|---|
| Platform | Apple Silicon (M1/M2/M3/M4/M5) | M1 Pro/Max or newer |
| Memory | 16GB unified memory | 24GB+ unified memory |
| Storage | 7GB free space | 15GB+ free space |
| OS | macOS 12+ | macOS 14+ (Sonoma) |
Tested Configuration: Mac Studio M1 Ultra (20-core CPU, 128GB unified memory), macOS Sequoia 15.2
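To check whether a machine meets these requirements, macOS exposes the chip name and unified memory size via `sysctl`:

```bash
# Report the chip (should be an Apple Silicon part, e.g. "Apple M1 Ultra")
sysctl -n machdep.cpu.brand_string
# Report total unified memory in GiB (16+ recommended for the 6-bit variant)
echo "$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 )) GiB"
```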
Technical Specifications
6-bit Quantization Details:
- Quantization Method: MLX native affine quantization
- Effective Bits: 6.500 bits per weight
- Group Size: 64 (mlx_lm default; the per-group float16 scale and bias account for the extra 0.5 bits in the 6.500 effective bits/weight)
- Conversion Command: `mlx_lm.convert --quantize --q-bits 6`
- Quality Preservation: Excellent (improved over 4-bit)
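To make the 6.500 bits/weight figure concrete: group-wise affine quantization stores 6-bit integer codes plus a float16 scale and bias per group, so the per-weight overhead is 32 bits divided by the group size. A minimal NumPy sketch of the idea (illustrative only, not MLX's actual kernel; a group size of 64 reproduces the 6.500 figure):

```python
import numpy as np

def affine_quantize(w, bits=6, group_size=64):
    """Group-wise affine quantization: low-bit codes plus a scale and bias per group."""
    levels = 2 ** bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / levels, 1e-8)               # stored as float16
    codes = np.round((groups - lo) / scale).astype(np.uint8)   # stored as 6-bit ints
    return codes, scale.astype(np.float16), lo.astype(np.float16)

def dequantize(codes, scale, bias):
    return codes * scale + bias

# Storage cost: 6 bits per weight + two 16-bit group parameters per group
bits, group_size = 6, 64
print(bits + 32 / group_size)   # 6.5, matching the reported 6.500 bits/weight

# Round-trip error on random weights
w = np.random.randn(1024).astype(np.float32)
codes, scale, bias = affine_quantize(w, bits, group_size)
print(np.abs(dequantize(codes, scale, bias) - w.reshape(-1, group_size)).max())
```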
Performance Metrics:
- Inference Speed: 60.7 tokens/second
- Memory Efficiency: 6.9GB peak usage
- Model Loading: ~3-5 seconds
- Quality: Higher precision, with `<think>` reasoning preserved
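These numbers come from the tested Mac Studio configuration above; throughput varies with chip and prompt length. A rough way to measure tokens/second on your own machine (a sketch, not a rigorous benchmark: it counts generated tokens with the tokenizer and includes prompt processing in the timing):

```python
import time
from mlx_lm import load, generate

model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-6bit")
prompt = "Explain the foundations of category theory."

start = time.perf_counter()
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Count generated tokens with the model's own tokenizer
n_tokens = len(tokenizer.encode(response))
print(f"{n_tokens / elapsed:.1f} tokens/sec over {n_tokens} generated tokens")
```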
About DR Tulu
The underlying model is the RL checkpoint of DR Tulu, an open deep research agent trained on top of rl-research/DR-Tulu-SFT-8B.
Key Capabilities:
- Step-by-step reasoning with visible `<think>` tags
- Research-grade analysis and problem-solving
- Tool-use and multi-turn conversations
- Mathematical and scientific reasoning
For more details on DR Tulu, see the paper linked under Related Links below.
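Because DR-Tulu emits its chain of thought inside `<think>...</think>` tags, downstream code often wants to separate the reasoning from the final answer. A simple regex-based split (a sketch that assumes the standard open/close tag format; adjust if your outputs differ):

```python
import re

def split_think(text: str):
    """Separate <think>...</think> reasoning from the visible answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

sample = "<think>Recall the definition of a functor...</think>A functor maps objects and morphisms."
thoughts, answer = split_think(sample)
print(thoughts)   # ['Recall the definition of a functor...']
print(answer)     # 'A functor maps objects and morphisms.'
```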
Evaluation Results
Results from the original full-precision DR-Tulu-8B model (the 6-bit quantization is intended to preserve this quality):
| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|---|---|---|---|---|---|---|---|---|
| DR-Tulu-8B | 86.7 | 43.7 | 71.1 | 41.8 | 80.1 | 68.0 | 39.1 | 61.5 |
Advanced Usage
Multi-turn Conversation
```python
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"}
]

# Format the conversation; fall back to the last user turn if no chat template is set
if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
else:
    formatted_prompt = messages[-1]["content"]

response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=1000)
```
Research-Style Analysis
```python
research_prompt = """
Analyze the relationship between quantum mechanics and information theory.
Think step by step and provide a comprehensive analysis.
"""

response = generate(model, tokenizer, prompt=research_prompt, max_tokens=1500, verbose=True)
```
Long-Form Generation (1069+ tokens)
```python
# 6-bit excels at longer, high-quality responses
complex_prompt = "Explain the foundations of category theory and its applications across mathematics."
response = generate(model, tokenizer, prompt=complex_prompt, max_tokens=1069, verbose=True)
```
6-bit Advantages
Quality vs Performance Analysis:
- Higher precision (6.500 vs 4.500 bits/weight)
- 1.7x faster than bf16 (60.7 vs 35.0 tokens/sec)
- 2.4x less memory than bf16 (6.9GB vs 16.4GB)
- Suitable for production applications requiring quality + speed
Use Cases:
- Research applications requiring nuanced reasoning
- Production deployments with quality constraints
- Interactive applications on mid-range Apple Silicon
- Balanced workflows needing speed + precision
Related Links
- DR Tulu Paper
- DR Tulu Demo
- DR Tulu Code
- DR Tulu Collection
- Original Model
- MLX Framework
- Complete MLX Options
License & Usage
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.
6-bit Specific Considerations:
- Optimized for Apple Silicon hardware only
- Enhanced quality preservation with 6.500 bits per weight
- Balanced performance in the MLX model series
- Ideal for applications requiring quality + efficiency
Citation
```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```
Conversion Details
- Conversion Date: November 22, 2025
- Converter: MLX community via Plurigrid
- Command: `uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-6bit --quantize --q-bits 6` (see the reproduction sketch after this list)
- Framework Version: mlx-lm latest (November 2025)
- Validation: Tested with a 1069-token generation; output quality was preserved better than in the 4-bit variant
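To reproduce the conversion (or target a different trade-off), `mlx_lm.convert` also accepts an explicit group size; the command below assumes the default group size of 64, which matches the 6.500 effective bits/weight reported above:

```bash
uvx --from mlx-lm mlx_lm.convert \
  --hf-path rl-research/DR-Tulu-8B \
  --mlx-path ./DR-Tulu-8B-6bit \
  --quantize --q-bits 6 --q-group-size 64
```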
Model Tree
- Base model: Qwen/Qwen3-8B-Base