---
license: apache-2.0
language: en
tags:
- text-generation
- auto-completion
- long-context
- smollm2
- fine-tuned
- transformers
base_model: Parveshiiii/Auto-Completer-0.1
pipeline_tag: text-generation
library_name: transformers
---

# 🧠 Auto-Completer-0.2

**Auto-Completer-0.2** is a fine-tuned successor to [Auto-Completer-0.1](https://huggingface.co/Parveshiiii/Auto-Completer-0.1), trained on an additional **4 million tokens** focused on **sentence-level coherence**, **semantic chaining**, and **completion fidelity**. This version introduces a distinctive behavior: each generated sentence is wrapped in quotation marks (`""`), making it well suited to structured auto-completion tasks where sentence boundaries matter.
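Because completions arrive as a run of quoted sentences, downstream code typically needs to split them back apart. Below is a minimal post-processing sketch; the helper name `extract_sentences` and the regex are illustrative, not part of the model's API, and it assumes the documented `"..."` wrapping behavior:

```python
import re

def extract_sentences(generated_text: str) -> list[str]:
    """Pull the individual quote-wrapped sentences out of a raw completion.

    Assumes the model's documented behavior of emitting sentences as
    "..." "..." "..." until the token cap is hit; a final unterminated
    quote fragment is simply dropped.
    """
    # Non-greedy match of anything between a pair of double quotes.
    return re.findall(r'"(.*?)"', generated_text)

# Raw text adapted from the example output shown later in this card.
raw = '"I am a model trained to complete sentences." "My purpose is to assist'
print(extract_sentences(raw))  # ['I am a model trained to complete sentences.']
```

For autocomplete UIs, taking only the first element usually gives a clean single-sentence suggestion.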
---

## 🚀 Highlights

- 🔁 **Built On**: Auto-Completer-0.1 (SmolLM2-360M lineage)
- 📈 **Extra Tokens**: +4M curated completions with sentence-level tagging
- 🧠 **Behavioral Shift**: Each sentence is encapsulated in `""` until the maximum sequence length is reached
- 🧪 **Improved Coherence**: Fewer hallucinations, tighter semantic retention
- 🧰 **Context Length**: Up to 6144 tokens with packing

---

## 📦 Intended Use

| ✅ Appropriate Uses       | 🚫 Out-of-Scope Uses         |
|---------------------------|------------------------------|
| Auto-completion in IDEs   | Real-time dialogue agents    |
| Sentence-level drafting   | Sensitive medical inference  |
| Math and logic reasoning  | Open-ended chat generation   |
| Code continuation         | Offensive or biased content  |

---

## 🧑‍🔬 Training Details

- **Base**: Auto-Completer-0.1
- **Additional Tokens**: 4M curated completions with sentence encapsulation
- **Trainer**: `SFTTrainer` via TRL with Unsloth backend
- **Batch Size**: 8 (packed)
- **Max Seq Length**: 6144
- **Optimizer**: `adamw_8bit`
- **Steps**: ~1.2k (warmup: 60)
- **Learning Rate**: 2e-5

---

## 📊 Evaluation

| Metric                   | Score     |
|--------------------------|-----------|
| Completion Accuracy      | 96.1%     |
| Sentence Coherence       | 94.7%     |
| Math Reasoning F1        | 89.4      |
| Code Continuation BLEU   | 89.1      |
| Quotation Fidelity       | 98.3%     |

> Benchmarked on internal test sets derived from MathX, HumanEval-lite, and structured sentence completion tasks.

---

## 🧪 Example Usage

> This model is not designed for chat. It wraps each sentence in `""` and keeps generating until `max_new_tokens` is reached, so use a low cap for autocomplete.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.2"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Who are you", return_tensors="pt").to(device)
outputs = model.generate(
    inputs,
    max_new_tokens=10,       # keep this low: as an autocomplete model, it generates until the token cap is hit
    do_sample=True,          # diversity in completions
    temperature=0.7,         # controlled randomness
    top_p=0.9,               # nucleus sampling
    repetition_penalty=1.2,  # raise this if generation gets stuck in loops after a sentence completes
    eos_token_id=tokenizer.eos_token_id,  # optional: stop at end-of-text
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

> Example Output: `"?" "I am a model trained to complete sentences." "My purpose is to assist with structured reasoning." ...`
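Since generation only halts at the token cap, another option is to stop as soon as a sentence's closing quote appears. Here is a sketch using the `StoppingCriteria` hook from `transformers`, continuing from the snippet above (it reuses `tokenizer`, `model`, and `inputs`); the class name `StopOnClosingQuote` and the quote-counting heuristic are illustrative assumptions, not part of the model:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnClosingQuote(StoppingCriteria):
    """Stop once the newly generated text contains a complete "..." sentence.

    `prompt_len` restricts the check to tokens generated after the prompt;
    the heuristic relies on the model's documented quote-wrapping behavior.
    """
    def __init__(self, tokenizer, prompt_len: int):
        self.tokenizer = tokenizer
        self.prompt_len = prompt_len

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        new_text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        # An even, nonzero count of double quotes means at least one
        # fully closed "sentence" has been emitted.
        n_quotes = new_text.count('"')
        return n_quotes >= 2 and n_quotes % 2 == 0

stop = StoppingCriteriaList([StopOnClosingQuote(tokenizer, inputs.shape[1])])
outputs = model.generate(
    inputs,
    max_new_tokens=64,  # generous cap; the stopping criterion should fire first
    do_sample=True,
    temperature=0.7,
    stopping_criteria=stop,
)
```

Trimming the decoded output at the last closing quote then yields a single clean suggestion without trailing noise.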
---

## ⚠️ Limitations

- Not suitable for multi-turn chat or open-ended dialogue
- May continue generating `"..."`-style sentences until the token cap is reached
- Requires careful `max_new_tokens` tuning to avoid trailing noise

---

## 📚 Citation

```bibtex
@misc{rawal2025autocompleter2,
  title={Auto-Completer-0.2: Sentence-Aware Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.2}
}
```

---

## 🛠 Maintainer

**Parvesh Rawal**
Founder, XenArcAI
Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems.

---