Qwen2.5-3B Text-to-SQL (PostgreSQL) β€” Fine-Tuned

Overview

This repository contains a fine-tuned Qwen/Qwen2.5-3B-Instruct model specialized for Text-to-SQL generation in PostgreSQL for a realistic e-commerce + subscriptions analytics schema.

Artifacts are organized under a single Hub repo using subfolders:

  • fp16/ β€” merged FP16 model (recommended)
  • int8/ β€” quantized INT8 checkpoint (smaller footprint)
  • lora_adapter/ β€” LoRA adapter only (for further tuning / research)

Intended use

Use cases

  • Convert natural language questions into PostgreSQL queries.
  • Analytical queries over common e-commerce tables (customers, orders, products, subscriptions) plus ML prediction tables (churn/forecast).

Not for

  • Direct execution on sensitive or production databases without validation (schema checks, allow-lists, sandbox execution).
  • Security-critical contexts (SQL injection prevention and access control must be handled outside the model).
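As a minimal illustration of the kind of pre-execution validation mentioned above, a read-only guard over the schema's table names might look like the sketch below. This is a hypothetical helper (not part of this repo) and no substitute for sandboxed execution and real access control.

```python
import re

# Hypothetical allow-list of schema tables (assumed names from the card's
# schema description). A crude keyword check is NOT injection-proof; it is
# only a first line of defense before sandboxed execution.
ALLOWED_TABLES = {"customers", "orders", "products", "subscriptions"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().rstrip(";")
    # Only read-only statements: plain SELECTs or CTEs.
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    if FORBIDDEN.search(stripped):
        return False
    # Crude table extraction: identifiers following FROM/JOIN.
    tables = re.findall(r"(?i)\b(?:from|join)\s+([a-z_][a-z0-9_]*)", stripped)
    return all(t.lower() in ALLOWED_TABLES for t in tables)
```

A production validator would parse the query into an AST (e.g. with sqlglot, as used in the evaluation) rather than pattern-match strings.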

Training summary

Item                Value
------------------  ------------------------------
Base model          Qwen/Qwen2.5-3B-Instruct
Fine-tuning method  QLoRA (4-bit)
Optimizer           paged_adamw_8bit
Epochs              4
Training time       ~4 minutes (A100)
Trainable params    29.9M (1.73% of 3B total)
Decoding            Greedy
Tracking            MLflow (DagsHub)
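A QLoRA setup matching the table above might look roughly like this. The quantization and LoRA hyperparameters shown (rank, alpha, target modules) are illustrative assumptions; the actual values are in the repo's training notebook.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for QLoRA training (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter configuration (illustrative values, not the repo's exact ones).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```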

Evaluation summary (100 test examples)

Primary metric: parseable PostgreSQL SQL (validated with sqlglot).
Secondary metric: exact match (strict string match vs. reference SQL).

Model                       Parseable SQL  Exact match  Mean latency (s)  P50 (s)  P95 (s)
--------------------------  -------------  -----------  ----------------  -------  -------
qwen_finetuned_fp16_strict  1.00           0.15         0.433             0.427    0.736
qwen_finetuned_int8_strict  0.99           0.20         2.152             2.541    3.610
qwen_baseline_fp16          1.00           0.09         0.405             0.422    0.624
qwen_finetuned_fp16         0.93           0.13         0.527             0.711    0.739
qwen_finetuned_int8         0.93           0.13         2.672             3.454    3.623
gpt-4o-mini                 1.00           0.04         1.616             1.551    2.820
claude-3.5-haiku            0.99           0.07         1.735             1.541    2.697
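For reference, mean/P50/P95 figures like those in the table can be computed from per-example generation timings with the standard library (sketch):

```python
import statistics

def latency_summary(timings):
    # timings: per-example generation times in seconds.
    # quantiles() with n=100 returns 99 cut points; index 94 is the 95th.
    qs = statistics.quantiles(timings, n=100, method="inclusive")
    return {
        "mean": statistics.mean(timings),
        "p50": statistics.median(timings),
        "p95": qs[94],
    }
```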

Key Findings:

  • Strict prompting is critical: adding "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, or commentary" to the prompt raised the parseable rate from 93% to 100%.
  • Fine-tuning improves accuracy: exact match rose from 9% (baseline) to 15% (fine-tuned), a 67% relative improvement.
  • Quantization trade-offs: INT8 preserves accuracy (20% exact match, the best across all models) at roughly half the memory footprint, but with about 5x higher latency.
  • Competitive with APIs: the fine-tuned model achieves nearly 4x the exact match of GPT-4o-mini (15% vs. 4%) at comparable or better speed.

Results Visualization

Model Comparison

Parseable SQL rate and exact match accuracy comparison across all 7 models.

How to load

Load the merged FP16 model (recommended)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id, 
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

Load the INT8 model

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="int8")
model = AutoModelForCausalLM.from_pretrained(
    repo_id, 
    subfolder="int8",
    device_map="auto"
)

Load base model + LoRA adapter

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "Qwen/Qwen2.5-3B-Instruct"
repo_id = "aravula7/qwen-sql-finetuning"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

model = PeftModel.from_pretrained(base, repo_id, subfolder="lora_adapter")

Example inference

Below is a minimal example that encourages SQL-only output (critical for 100% parseability).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aravula7/qwen-sql-finetuning"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="fp16")
model = AutoModelForCausalLM.from_pretrained(
    repo_id, 
    subfolder="fp16",
    torch_dtype=torch.float16,
    device_map="auto"
)

system = "Return ONLY the PostgreSQL query. Do NOT include explanations, markdown, code fences, or commentary."
schema = "Table: customers (customer_id, email, state)\nTable: orders (order_id, customer_id, order_timestamp)"
request = "Show the number of orders per customer in 2025."

prompt = f"""{system}

Schema:
{schema}

Request:
{request}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs, 
        max_new_tokens=256, 
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, excluding the prompt
gen_tokens = out[0][inputs["input_ids"].shape[1]:]
sql = tokenizer.decode(gen_tokens, skip_special_tokens=True).strip()
print(sql)
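Even with the strict system prompt, it can be worth defensively stripping markdown code fences from the decoded output before parsing. A hypothetical helper:

```python
import re

def clean_sql(text: str) -> str:
    # Remove markdown code fences the model may emit despite the strict
    # prompt, then trim surrounding whitespace.
    text = re.sub(r"^```(?:sql)?\s*", "", text.strip())
    text = re.sub(r"\s*```$", "", text)
    return text.strip()
```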

License

This project is licensed under the MIT License. The fine-tuned model is a derivative of Qwen2.5-3B-Instruct and inherits its license terms.

Full documentation and code: GitHub Repository

Reproducibility

Training and evaluation were tracked with MLflow on DagsHub. The GitHub repository contains:

  • Complete Colab notebook with training and evaluation code
  • Dataset (500 examples: 350 train, 50 val, 100 test)
  • Visualization scripts for 3D performance analysis
  • Production-ready inference code with error handling


Citation

@misc{qwen-sql-finetuning-2026,
  author = {Anirudh Reddy Ravula},
  title = {Qwen2.5-3B Text-to-SQL Fine-Tuning for PostgreSQL},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/aravula7/qwen-sql-finetuning}},
  note = {Fine-tuned with QLoRA for e-commerce SQL generation}
}