Model Card: SS-350M-SQL-Strict

Model Summary

SS-350M-SQL-Strict is a specialized, lightweight LLM fine-tuned for the singular task of Text-to-SQL translation. Built upon the LiquidAI LFM2.5-350M architecture, this model has been engineered to follow a "Strict" output protocol: it generates only raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models.

Fine-tuned with 4-bit QLoRA and Unsloth optimizations, the 350M-parameter model is small enough to provide low-latency SQL generation for edge deployment and resource-constrained environments.

Model Details

Developed by: Saad Salman
Architecture: Liquid Foundation Model (LFM) 2.5
Parameters: 350 Million
Quantization: 4-bit (bitsandbytes)
Fine-tuning Method: QLoRA
Primary Task: Natural Language to SQL (Strict)

Training Logic & Parameters

The model was trained using a custom pipeline to enforce strict code generation. The key differentiator is Completion-Only Loss masking: the prompt tokens (system and user turns) are excluded from the loss, so gradient updates come only from the assistant's SQL tokens and the model is not trained to reproduce the prompt structure.
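
The masking idea can be illustrated with a minimal sketch. This is not the author's training pipeline; it only shows the common convention of setting prompt positions in the label sequence to -100, the value that PyTorch's cross-entropy loss ignores by default. The helper name `mask_prompt_labels` and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels in which the first `prompt_len` positions are excluded from the loss."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie within the sequence")
    # Prompt positions get IGNORE_INDEX; completion positions keep their token ids.
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy example with made-up token ids: 5 prompt tokens followed by 3 SQL tokens.
input_ids = [11, 12, 13, 14, 15, 901, 902, 903]
labels = mask_prompt_labels(input_ids, prompt_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 901, 902, 903]
```

Only the last three positions contribute to the loss here, which is the effect described above.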

Hyperparameters

Parameter	Value	Description
Max Steps	800	Optimal convergence point for 350M params
Learning Rate	2e-4	High enough for rapid logic acquisition
Batch Size	16	Effective size: 4 per device × 4 gradient accumulation steps
Rank (r)	32	High rank to capture complex SQL logic
Alpha	32	Scaling factor for LoRA weights
Optimizer	AdamW 8-bit	Memory-efficient optimization
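
As a quick sanity check on the table, the arithmetic below derives the effective batch size, the number of training sequences processed, and the LoRA scaling factor. It assumes a single device (the card does not state the device count) and standard LoRA scaling of alpha / r rather than a rank-stabilized variant.

```python
per_device_batch = 4
grad_accum_steps = 4
max_steps = 800
lora_rank = 32
lora_alpha = 32
num_devices = 1  # assumption: single GPU; not stated in the card

# Sequences contributing to each optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_devices
# Total sequences processed over the whole run.
sequences_seen = effective_batch * max_steps
# Standard LoRA multiplies the low-rank update by alpha / r (assumed here).
lora_scale = lora_alpha / lora_rank

print(effective_batch)  # 16
print(sequences_seen)   # 12800
print(lora_scale)       # 1.0
```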

Training Curve Analysis

The training loss followed a classic "L-shaped" curve: it started at ~38.1, dropped quickly, and then plateaued between 8.0 and 11.0. The plateau suggests that, by the end of training, the model had converged on the strict ChatML response format and the SQL schema-mapping logic.

Prompting Specification (ChatML)

To ensure the "Strict" behavior, you must use the following ChatML format. Failure to use this format may result in hallucinated text.

Template

<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
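
The template above can be filled in programmatically. The helper below is a small sketch: the function name `build_prompt` is hypothetical, and the newline after the final `<|im_start|>assistant` tag is an assumption based on the usual ChatML layout.

```python
SYSTEM_TEMPLATE = "You are a SQL translation engine. Return ONLY raw SQL. Schema: {schema}"


def build_prompt(schema: str, question: str) -> str:
    """Fill the ChatML template and leave the assistant turn open for generation."""
    system = SYSTEM_TEMPLATE.format(schema=schema)
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"  # trailing newline is an assumption
    )


prompt = build_prompt(
    schema="Table 'orders' (id, price, status, created_at)",
    question="Find the average price of all 'completed' orders.",
)
print(prompt)
```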

Example Input

<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant

Example Output

SELECT AVG(price) FROM orders WHERE status = 'completed';
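
Because a wrong prompt format may produce extra text, callers may want a cheap guard before executing the output. The check below is a heuristic sketch, not part of the model or its tooling: the function name and the list of accepted leading keywords are assumptions, and it does not validate SQL syntax.

```python
FENCE = "`" * 3  # Markdown code-fence marker
SQL_STARTERS = ("SELECT", "WITH", "INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")


def looks_like_raw_sql(text: str) -> bool:
    """Heuristically check that the output is bare SQL with no wrapper text."""
    stripped = text.strip()
    if not stripped:
        return False
    # Reject Markdown fences and leaked ChatML control tokens.
    if FENCE in stripped or "<|im_" in stripped:
        return False
    # Require the text to begin with a common SQL statement keyword.
    return stripped.upper().startswith(SQL_STARTERS)


print(looks_like_raw_sql("SELECT AVG(price) FROM orders WHERE status = 'completed';"))  # True
print(looks_like_raw_sql("Here is your query: SELECT 1;"))  # False
```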

Training Dataset

The model was trained on the Gretel Synthetic SQL dataset. This dataset is designed to cover:

Complex joins and subqueries.
Diverse industry domains (Finance, Retail, Tech).
Correct handling of GROUP BY, ORDER BY, and HAVING clauses.

Technical Limitations

Schema Size: Best suited for schemas with < 20 tables.
Dialect: Defaults to standard SQL.
Reasoning: The model does not "explain" its code; it is a direct translation engine.
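
To stay within the schema-size guidance, a caller could build the schema string from a table mapping and flag oversized schemas before prompting. This is a hypothetical helper: the single-table format follows the example input in this card, while the "; " separator between tables is an assumption, since the card only shows a one-table schema.

```python
MAX_RECOMMENDED_TABLES = 20  # from the "Schema Size" limitation above


def format_schema(tables: dict) -> tuple:
    """Return (schema_string, within_limit) for a {table_name: [columns]} mapping."""
    parts = [f"Table '{name}' ({', '.join(cols)})" for name, cols in tables.items()]
    within_limit = len(tables) < MAX_RECOMMENDED_TABLES
    # Joining tables with "; " is an assumption, not a documented format.
    return "; ".join(parts), within_limit


schema, ok = format_schema({"orders": ["id", "price", "status", "created_at"]})
print(schema)  # Table 'orders' (id, price, status, created_at)
print(ok)      # True
```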

How to Use with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

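# Build the prompt in the ChatML format specified above, ending with the open assistant turn.
prompt = ("<|im_start|>system\nYou are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>\n"
          "<|im_start|>user\nFind the average price of all 'completed' orders.<|im_end|>\n<|im_start|>assistant\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))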

Model size: 0.4B params (Safetensors)
Tensor type: BF16

Model tree for saadxsalman/SS-350M-SQL-Strict

Base model: LiquidAI/LFM2.5-350M-Base
Finetuned: LiquidAI/LFM2.5-350M