CustomGPT

Model Summary

CustomGPT is a GPT-2-style LLM whose architecture was implemented from scratch, initialized with OpenAI's pretrained GPT-2 weights, instruction-finetuned on a small instruction dataset, and evaluated with the LLM-as-a-judge method. The project documents my learning about building a custom LLM architecture and deploying it on Hugging Face. The model is a demo of LLM engineering concepts and is not intended for production use.

This model can be loaded with the Hugging Face transformers library via AutoModel.from_pretrained (with trust_remote_code=True, because the architecture is defined in the repository's custom code).

How to Get Started with the Model

Inference Example (Transformers + tiktoken)

from transformers import AutoModel
import tiktoken

# Load tokenizer
tokenizer = tiktoken.get_encoding("gpt2")

# Load model
model_id = "FarhanAK128/CustomGPT"
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Example prediction: a dict with an 'instruction' and an optional 'input'
sample = {
    'instruction': 'Rewrite the sentence using a simile.',
    'input': 'The car is very fast.',
}

response = model.generate_response(sample, tokenizer)
print(response)  # Example output: The car is as fast as a cheetah.

Note: This model uses a custom generate_response() method defined in the repository's remote code, so trust_remote_code=True is required for it to work.

Model Details

📝 Model Description

Developed by: Farhan Ali Khan
Model type: GPT-2–based text generation model
Base architecture: GPT-2 (OpenAI)
Framework: PyTorch
Task: Text Generation
Language: English
License: MIT

Training Details

Training Data

The model was fine-tuned on a small instruction dataset of 1,100 instruction-response pairs.
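Before fine-tuning, each instruction-response pair has to be turned into a single text sequence. The sketch below assumes an Alpaca-style prompt template and an "output" key for the ground-truth response; both are assumptions for illustration (the card only shows the "instruction" and "input" keys), and the helper names format_input and format_training_example are hypothetical.

```python
# Hypothetical Alpaca-style template; the exact template used by CustomGPT
# is not documented in this card.
def format_input(entry: dict) -> str:
    """Build the prompt part (instruction plus optional input) of one example."""
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    # Omit the Input section entirely when the entry has no input text
    input_text = f"\n\n### Input:\n{entry['input']}" if entry.get("input") else ""
    return instruction_text + input_text


def format_training_example(entry: dict) -> str:
    """Append the ground-truth response (assumed 'output' key) for training."""
    return format_input(entry) + f"\n\n### Response:\n{entry['output']}"


example = {
    "instruction": "Rewrite the sentence using a simile.",
    "input": "The car is very fast.",
    "output": "The car is as fast as a cheetah.",
}
print(format_training_example(example))
```

At inference time only format_input would be used, and the model's continuation after "### Response:" would be taken as the answer.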

Training Procedure

Base weights: OpenAI GPT-2 (355 million parameters)
Fine-tuning strategy: Full fine-tuning
Optimizer: AdamW
Learning rate: 0.00005
Weight decay: 0.1
Epochs: 3
Random seed: 123
Loss function: Cross-Entropy Loss
Training strategy: Mixed precision
Total training time: ~4.82 minutes
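The hyperparameters above map onto a standard PyTorch training step roughly as follows. This is a minimal sketch, not the actual training script: a tiny hypothetical TinyLM module stands in for the 355M GPT-2 model, the batch is random dummy data, and mixed precision is shown with bfloat16 autocast on CPU (on a GPU one would typically use device_type="cuda").

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(123)  # documented random seed


# Hypothetical tiny stand-in for the 355M GPT-2 model (illustration only)
class TinyLM(nn.Module):
    def __init__(self, vocab_size: int = 50257, emb_dim: int = 64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return self.out_head(self.tok_emb(idx))  # (batch, seq, vocab)


model = TinyLM()
# Full fine-tuning: every parameter is handed to AdamW
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)


def train_step(input_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    # Mixed precision: the forward pass runs under bfloat16 autocast
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        logits = model(input_ids)
    # Cross-entropy over next-token targets, computed in float32;
    # target positions set to -100 (the default ignore_index) are skipped
    loss = F.cross_entropy(logits.float().flatten(0, 1), target_ids.flatten())
    loss.backward()
    optimizer.step()
    return loss.item()


# Dummy batch: targets are the inputs shifted left by one token
tokens = torch.randint(0, 50257, (2, 9))
loss = train_step(tokens[:, :-1], tokens[:, 1:])
print(f"loss: {loss:.3f}")
```

Running three epochs would simply repeat train_step over every batch of the formatted dataset three times.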

📈 Training Progress

[Figure: Training and validation loss]

📊 Model Performance

The 355M-parameter model is evaluated with Llama-3.1-8b-instant as an automated judge. For each test input, the model's response is compared with the ground-truth output, and the judge assigns a correctness score from 0 to 100. The scores are parsed as integers and averaged over the test set, giving an average score of 44 out of 100.
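The scoring step can be sketched as below. The judge prompt wording and the helper names (build_judge_prompt, extract_score, average_score) are hypothetical, and the actual call to Llama-3.1-8b-instant is left out because it depends on the API client used; the sketch only shows how integer scores might be parsed from the judge's replies and averaged.

```python
import re
from statistics import mean


def build_judge_prompt(prompt: str, reference: str, model_response: str) -> str:
    # Hypothetical wording; the actual judge prompt is not documented here
    return (
        f"Given the input `{prompt}` and correct output `{reference}`, "
        f"score the model response `{model_response}` on a scale from 0 to 100, "
        "where 100 is the best score. Respond with the integer number only."
    )


def extract_score(judge_reply: str):
    """Return the first integer in the judge's reply, or None if there is none."""
    match = re.search(r"\d+", judge_reply)
    return int(match.group()) if match else None


def average_score(judge_replies) -> float:
    """Average all parseable scores; replies without an integer are skipped."""
    scores = [s for s in (extract_score(r) for r in judge_replies) if s is not None]
    return mean(scores) if scores else 0.0


print(average_score(["85", "Score: 20", "no idea"]))  # 52.5
```

Skipping unparseable replies is one design choice; counting them as 0 would give a stricter average.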

Model Card Authors

Farhan Ali Khan

Model Card Contact

For questions or feedback, please reach out via my Hugging Face profile: FarhanAK128


Weights format: Safetensors
Model size: 0.4B params
Tensor type: F32