Improve model card: Add pipeline tag, library name, paper details, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +146 -4
README.md CHANGED
@@ -1,9 +1,151 @@
1
  ---
2
- license: mit
 
3
  datasets:
4
  - VanishD/DualDistill
5
  language:
6
  - en
7
- base_model:
8
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
9
- ---
1
  ---
2
+ base_model:
3
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
4
  datasets:
5
  - VanishD/DualDistill
6
  language:
7
  - en
8
+ license: mit
9
+ pipeline_tag: text-generation
10
+ library_name: transformers
11
+ ---
12
+
13
+ # Agentic-R1: Distilled Dual-Strategy Reasoning
14
+
15
+ This repository hosts the **Agentic-R1** model, an implementation of the paper [**Agentic-R1: Distilled Dual-Strategy Reasoning**](https://huggingface.co/papers/2507.05707).
16
+
17
+ **Code**: https://github.com/StigLidu/DualDistill
18
+
19
+ ## Abstract
20
+
21
+ Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning.
22
+
23
+ ## Key Features
24
+
25
+ - **Efficient Training**: Integrates tool use into long-chain-of-thought (CoT) reasoning using only 4 × A6000 GPUs
26
+ - **Unified Reasoning**: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
27
+
28
+ <div align="center">
29
+ <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/overview.png" alt="Overview of DualDistill" width="500">
30
+ <p><em>Overview of DualDistill methodology</em></p>
31
+ </div>
32
+
33
+ ## Datasets
34
+
35
+ | Dataset | Description | Link |
36
+ | :------------ | :-------------------------------------------- | :--------------------------------------------------- |
37
+ | **Training Set** | Complete training dataset with teacher trajectories | [🤗 HuggingFace](https://huggingface.co/datasets/VanishD/DualDistill) |
38
+ | **Test Set** | Evaluation benchmarks | `dataset/test/` |
39
+
40
+ ## Results
41
+
42
+ <div align="center">
43
+ <img src="https://github.com/StigLidu/DualDistill/raw/main/fig/result.png" alt="Performance comparison of Agentic-R1 models" width="700">
44
+ </div>
45
+
46
+ - **Agentic-R1** demonstrates significant performance gains on **DeepMath-L** and **Combinatorics300**, where both complex reasoning and tool use are crucial for success.
47
+ - **Agentic-R1-SD** (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.
48
+
49
+ ## Quick Start
50
+
51
+ ### Installation
52
+
53
+ 1. **Clone the repository**:
54
+ ```bash
55
+ git clone https://github.com/StigLidu/DualDistill.git
56
+ cd DualDistill
57
+ ```
58
+
59
+ 2. **Create environment** (optional but recommended):
60
+ ```bash
61
+ conda create -n dualdistill python=3.11
62
+ conda activate dualdistill
63
+ ```
64
+
65
+ 3. **Install dependencies**:
66
+ ```bash
67
+ pip install -r requirements.txt
68
+ pip install flash-attn --no-build-isolation
69
+ ```
70
+
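After installing, it can help to confirm that the key packages are importable before loading the model. The check below is a minimal sketch and is not part of the repository. The package list (`torch`, `transformers`, `flash_attn`) is an assumption based on the steps above, and `find_missing` is a hypothetical helper name; `requirements.txt` remains the authoritative list.

```python
import importlib.util

# Assumed package list for illustration; requirements.txt is authoritative.
ASSUMED_PACKAGES = ("torch", "transformers", "flash_attn")


def find_missing(packages=ASSUMED_PACKAGES):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are importable.")
```

An empty result only shows that the packages can be found; it does not verify versions or GPU support.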
71
+ ### Sample Usage
72
+
73
+ Here's how to perform inference with the `Agentic-R1` model using the Hugging Face `transformers` library:
74
+
75
+ ```python
76
+ import torch
77
+ from transformers import AutoTokenizer, AutoModelForCausalLM
78
+
79
+ model_id = "VanishD/Agentic-R1" # Or "VanishD/Agentic-R1-SD" for the self-distilled version
80
+
81
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
82
+ model = AutoModelForCausalLM.from_pretrained(
83
+ model_id,
84
+ torch_dtype=torch.bfloat16, # Use bfloat16 for better performance and memory if supported
85
+ device_map="auto",
86
+ trust_remote_code=True
87
+ ).eval() # Set model to evaluation mode
88
+
89
+ # Prepare a simple user message
90
+ messages = [{"role": "user", "content": "What is 123 + 456?"}]
91
+
92
+ # Apply the chat template to format the prompt correctly for the model
93
+ # The `add_generation_prompt=True` adds the Assistant token to prompt the model for its response.
94
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
95
+
96
+ # Encode the prompt
97
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
98
+
99
+ # Generate response
100
+ output_ids = model.generate(
101
+ input_ids,
102
+ max_new_tokens=256,
103
+ do_sample=True,
104
+ temperature=0.7,
105
+ top_p=0.95,
106
+ eos_token_id=tokenizer.eos_token_id,
107
+ pad_token_id=tokenizer.pad_token_id, # Often EOS token is used as PAD token for LLMs
108
+ )
109
+
110
+ # Decode and print the generated text, excluding the input prompt
111
+ generated_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
112
+ print("Generated Text:")
113
+ print(generated_text)
114
+ ```
115
+
116
+ ## ⚠️ Important Notes
117
+
118
+ - **Code Execution Safety**: The evaluation scripts execute model-generated code locally. Run them only with models you trust.
119
+ - **Inference Config**: If you are using a recent version of vLLM and encounter an error regarding the maximum context length, you may need to modify the `model_max_length` in `tokenizer_config.json`.
120
+ - **Self-Distillation Warning**: The self-distillation step requires sampling many trajectories and can be time-consuming.
121
+
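The `model_max_length` change mentioned in the Inference Config note can be made by hand or with a few lines of Python. The sketch below assumes the model files have been downloaded to a local directory; `set_model_max_length` is a hypothetical helper, not part of the repository, and the right value depends on your vLLM setup.

```python
import json
from pathlib import Path


def set_model_max_length(config_path, new_length):
    """Overwrite `model_max_length` in a tokenizer_config.json.

    Returns the previous value (None if the key was absent).
    All other keys in the file are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_value = config.get("model_max_length")
    config["model_max_length"] = int(new_length)
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return old_value
```

For example, `set_model_max_length("path/to/model/tokenizer_config.json", 32768)` would set the limit to 32768; both the path and the number here are placeholders, not recommended values.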
122
+ ## License
123
+
124
+ This project is licensed under the MIT License - see the [LICENSE](https://github.com/StigLidu/DualDistill/blob/main/LICENSE) file for details.
125
+
126
+ ## Acknowledgments
127
+
128
+ We thank the following open-source projects for their foundational contributions:
129
+
130
+ - [OpenHands](https://github.com/All-Hands-AI/OpenHands) - Agent framework
131
+ - [DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K) - Mathematical reasoning dataset
132
+ - [vLLM](https://github.com/vllm-project/vllm) - High-performance inference engine
133
+
134
+ ## Contact
135
+
136
+ For questions or support, please contact:
137
+
138
+ - **Weihua Du**: [weihuad@cs.cmu.edu](mailto:weihuad@cs.cmu.edu)
139
+
140
+ ## Citation
141
+
142
+ If you find our work useful, please consider citing:
143
+
144
+ ```bibtex
145
+ @article{du2025agentic,
146
+ title={Agentic-R1: Distilled Dual-Strategy Reasoning},
147
+ author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
148
+ journal={arXiv preprint arXiv:2507.05707},
149
+ year={2025}
150
+ }
151
+ ```