<div align="center">

<h1>Qwen3-8B-FreeLM-LoRA</h1>

</div>

<div align="center">

[![Paper](https://img.shields.io/badge/Paper-ArXiv-b31b1b.svg)](https://arxiv.org/abs/your_paper_link)
[![Hugging Face Collections](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-blue)](https://huggingface.co/collections/your_collection)
[![GitHub stars](https://img.shields.io/github/stars/TemporaryLoRA/FreeLM.svg?colorA=orange&colorB=orange&logo=github)](https://github.com/TemporaryLoRA/FreeLM)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](./LICENSE)

</div>

Implementation of the paper [Free(): Learning to Forget in Malloc-Only Reasoning Models]()

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as **"malloc-only" engines**, accumulating valid and redundant steps alike with no mechanism to prune obsolete information. To break this cycle, we propose **Free()LM**, a model that introduces an intrinsic self-forgetting capability via the **Free-Module**, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact, noise-free state.

Extensive experiments show that Free()LM delivers consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new **SOTA** on IMOanswerBench with DeepSeek V3.2-Speciale.
Most notably, on long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to **~50%**. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
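
As a toy illustration only (the function names here are hypothetical stand-ins, not the released API — the actual mechanism lives in the Free-Module LoRA adapter), the alternating reasoning/cleaning loop can be sketched as:

```python
# Toy sketch of the alternating reasoning/cleaning loop described above.
# `reason_step` and `find_obsolete` stand in for the model's two modes;
# all names are hypothetical illustrations, not the released API.
def free_loop(problem, reason_step, find_obsolete, max_rounds=8):
    context = [problem]  # running chain of reasoning chunks
    for _ in range(max_rounds):
        # Reasoning mode: extend the context with one new step.
        step, done = reason_step(context)
        context.append(step)
        if done:
            break
        # Cleaning mode: free() the chunks flagged as obsolete
        # (index 0, the problem statement, is never pruned).
        obsolete = find_obsolete(context)
        context = [c for i, c in enumerate(context)
                   if i == 0 or i not in obsolete]
    return context
```

The point of the sketch is the interleaving: pruning happens between reasoning steps, so later steps condition on a compact context rather than the full accumulated trace.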

![Figure1_new](https://cdn-uploads.huggingface.co/production/uploads/6734a0fe3ed65dd196e40cfa/hhMi4OjxiSTTvZr7yVfnh.png)

## Resources

| Base Model | Method | Checkpoint |
| :--- | :---: | :--- |
| **Qwen3-8B** | Free()LM | [🤗 ldsjmdy/Qwen3-8B-FreeLM-LoRA](https://huggingface.co/ldsjmdy/Qwen3-8B-FreeLM-LoRA) |
| **Qwen3-30B-A3B-Thinking-2507** | Free()LM | [🤗 ldsjmdy/Qwen3-30B-A3B-Thinking-2507-FreeLM-LoRA](https://huggingface.co/ldsjmdy/Qwen3-30B-A3B-Thinking-2507-FreeLM-LoRA) |
| **Qwen3-235B-A3B-Thinking-2507** | Free()LM | [🤗 ldsjmdy/Qwen3-235B-A3B-Thinking-2507-FreeLM-LoRA](https://huggingface.co/ldsjmdy/Qwen3-235B-A3B-Thinking-2507-FreeLM-LoRA) |

- Train/Eval Data: [🤗 ldsjmdy/FreeLM](https://huggingface.co/datasets/ldsjmdy/FreeLM)

## Performance

![image](https://cdn-uploads.huggingface.co/production/uploads/6734a0fe3ed65dd196e40cfa/v8ynVFM5Wg00fZ7MVeu0I.png)
> Performance of Qwen3 models. We report pass@1 (p@1) computed over 8 rollouts, along with the average number of response tokens (#Token). In the Average columns, brackets show the absolute change for p@1 and the relative change for #Token (blue indicates improvement, red indicates regression).

## Usage

For detailed usage instructions and source code, please see the [FreeLM repository](https://github.com/TemporaryLoRA/FreeLM/tree/main).

## Citation

If you find `FreeLM` useful for your research, please cite our paper:

```bibtex
```