🧠 Auto-Completer-0.2

Auto-Completer-0.2 is a fine-tuned successor to Auto-Completer-0.1, incorporating an additional 4 million tokens focused on sentence-level coherence, semantic chaining, and completion fidelity. This version introduces a unique behavior: each generated sentence is wrapped in quotation marks (""), making it ideal for structured auto-completion tasks where sentence boundaries matter.


πŸš€ Highlights

  • πŸ” Built On: Auto-Completer-0.1 (SmolLM2-360M lineage)
  • πŸ“ˆ Extra Tokens: +4M curated completions with sentence-level tagging
  • 🧠 Behavioral Shift: Each sentence is wrapped in "" until the maximum sequence length is reached
  • πŸ§ͺ Improved Coherence: Fewer hallucinations, tighter semantic retention
  • 🧰 Context Length: Up to 6144 tokens with packing

πŸ“¦ Intended Use

| βœ… Appropriate Uses | 🚫 Out-of-Scope Uses |
| --- | --- |
| Auto-completion in IDEs | Real-time dialogue agents |
| Sentence-level drafting | Sensitive medical inference |
| Math and logic reasoning | Open-ended chat generation |
| Code continuation | Offensive or biased content |

πŸ§‘β€πŸ”¬ Training Details

  • Base: Auto-Completer-0.1
  • Additional Tokens: 4M curated completions with sentence encapsulation
  • Trainer: SFTTrainer via TRL with Unsloth backend (see the configuration sketch below)
  • Batch Size: 8 (packed)
  • Max Seq Length: 6144
  • Optimizer: adamw_8bit
  • Steps: ~1.2k (warmup: 60)
  • Learning Rate: 2e-5
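
The hyperparameters above map onto a short TRL + Unsloth script. The sketch below is a minimal illustration, assuming a JSONL file of ""-wrapped completions with a "text" field; the dataset path and output directory are placeholders, not the exact training setup.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel

# Start from the 0.1 checkpoint and allow 6144-token sequences.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Parveshiiii/Auto-Completer-0.1",
    max_seq_length=6144,
)

# Hypothetical JSONL of ""-wrapped completions with a "text" field.
dataset = load_dataset("json", data_files="completions.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_seq_length=6144,
        packing=True,                     # pack short samples into full sequences
        per_device_train_batch_size=8,
        optim="adamw_8bit",
        learning_rate=2e-5,
        max_steps=1200,                   # ~1.2k steps
        warmup_steps=60,
        output_dir="auto-completer-0.2",  # placeholder output directory
    ),
)
trainer.train()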

πŸ“Š Evaluation

| Metric | Score |
| --- | --- |
| Completion Accuracy | 96.1% |
| Sentence Coherence | 94.7% |
| Math Reasoning F1 | 89.4 |
| Code Continuation BLEU | 89.1 |
| Quotation Fidelity | 98.3% |

Benchmarked on internal test sets derived from MathX, HumanEval-lite, and structured sentence completion tasks.


πŸ§ͺ Example Usage

This model is not designed for chat. It wraps each generated sentence in "" and keeps generating until max_new_tokens is reached, so use a small max_new_tokens cap for autocomplete.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Parveshiiii/Auto-Completer-0.2"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Who are you", return_tensors="pt").to(device)

outputs = model.generate(
    inputs,
    max_new_tokens=10,                      # keep this cap low for autocomplete; the model generates until it hits the limit
    do_sample=True,                         # Diversity in completions
    temperature=0.7,                        # Controlled randomness
    top_p=0.9,                              # Nucleus sampling
    repetition_penalty=1.2,                 # increase if the model gets stuck in loops after completing a sentence
    eos_token_id=tokenizer.eos_token_id     # Optional: stop at end-of-text
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example Output: "?" "I am a model trained to complete sentences." "My purpose is to assist with structured reasoning." ...


⚠️ Limitations

  • Not suitable for multi-turn chat or open-ended dialogue
  • May continue generating ""-wrapped sentences until the token cap is reached
  • Requires careful max_new_tokens tuning to avoid trailing noise (see the post-processing sketch below)
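
Because decoding only stops at the token cap, a simple post-processing step can keep just the ""-wrapped sentences and drop trailing noise. The helper below is an illustrative sketch; the function name and sample string are assumptions, not part of the model's API.

import re

def quoted_sentences(text):
    # Collect every ""-wrapped span emitted by the model, in order.
    return re.findall(r'"([^"]*)"', text)

raw = '"?" "I am a model trained to complete sentences." "My purpose is to assist with structured reasoning."'
print(quoted_sentences(raw))
# ['?', 'I am a model trained to complete sentences.', 'My purpose is to assist with structured reasoning.']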

πŸ“š Citation

@misc{rawal2025autocompleter2,
  title={Auto-Completer-0.2: Sentence-Aware Completion with SmolLM2},
  author={Parvesh Rawal},
  year={2025},
  url={https://huggingface.co/Parveshiiii/Auto-Completer-0.2}
}

πŸ›  Maintainer

Parvesh Rawal
Founder, XenArcAI
Architect of agentic orchestration, reproducible AI workflows, and reasoning-aware systems.

