Model Card for GPT-2 Large Finetuned on IMDB Reviews

Model Details

Model Description

This model is a GPT-2 Large variant finetuned to generate English-language movie reviews conditioned on a numeric rating. Users provide a rating (1–8) in the prompt, and the model generates a corresponding review. There is no strict limit on the length of the generated text.

  • Developed by: Lev Ossadtchi
  • Model type: GPT-2 Large (transformer-based language model)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: openai-community/gpt2-large

Uses

Direct Use

The model is intended to generate movie reviews from a given rating. Users input a numeric score and obtain a coherent, stylistically appropriate review in English. It is suitable for content generation, data augmentation for NLP tasks, and demonstration purposes.

Downstream Use

The model can be used in applications requiring synthetic review generation for movies, such as testing recommendation systems, creating sample datasets, or educational tools to demonstrate natural language generation.

Out-of-Scope Use

The model is not intended for generating reviews of products, services, or non-movie content. It may produce unrealistic or biased outputs outside the domain of movie reviews.

Bias, Risks, and Limitations

  • The model is specialized to movie reviews and may not generate meaningful text for other domains.
  • The model may produce stereotypical or exaggerated opinions common in movie reviews.
  • Generated content should not be considered factual or reliable for real-world assessments.

Recommendations

Users should verify outputs before relying on them for research or demonstration purposes, and should not use the model for commercial applications or for real-world evaluation of movies or other products.

How to Get Started with the Model

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("levos06/gpt2-large-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("levos06/gpt2-large-finetuned")

# Prompt format: the rating comes first; the review follows "Text:".
prompt = "Rate: 8, Text:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=150,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
review = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the generated review, stripping the prompt prefix.
print(review.split("Text:")[-1].strip())
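
The snippet above decodes greedily, which can yield repetitive text. The decoding settings used by the author are not documented; as one common configuration, sampling can be enabled through standard generate parameters:

outputs = model.generate(
    **inputs,
    max_length=150,
    do_sample=True,    # sample rather than decode greedily
    top_p=0.95,        # nucleus sampling
    temperature=0.8,   # soften the output distribution
    pad_token_id=tokenizer.eos_token_id,
)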

Training Details

Training Data

  • Dataset: Stanford NLP IMDB dataset (stanfordnlp/imdb on the Hugging Face Hub)
  • Content: Movie reviews with ratings

Training Procedure

  • Precision: FP16
  • Gradient Accumulation: Yes
  • Epochs: 3
  • Base Model: GPT-2 Large
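
For reproducibility, the following is a minimal sketch of a Hugging Face Trainer setup matching the settings above. The batch size, accumulation steps, learning rate, and prompt-formatting step are assumptions, not documented values.

from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2-large")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2-large")

# NOTE: stanfordnlp/imdb exposes binary sentiment labels, not raw 1-10
# ratings; how the card's 1-8 ratings were derived is not documented,
# so the prompt formatting below is illustrative only.
dataset = load_dataset("stanfordnlp/imdb", split="train")

def tokenize(batch):
    texts = [f"Rate: {label}, Text: {text}"
             for label, text in zip(batch["label"], batch["text"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="gpt2-large-imdb",
    num_train_epochs=3,             # documented above
    fp16=True,                      # documented above
    gradient_accumulation_steps=8,  # "enabled" per the card; step count assumed
    per_device_train_batch_size=2,  # assumed
    learning_rate=5e-5,             # assumed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()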

Evaluation

Metrics

Generated reviews were evaluated with an auxiliary model that predicts a rating from review text. The mean absolute error (MAE) between the input rating and the rating predicted from the generated review serves as the quality metric.
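
The auxiliary rating model is not named in the card, so the sketch below substitutes a generic sentiment classifier as a stand-in rater (an assumption, not the author's evaluator) and reuses model and tokenizer from the quick-start snippet; only the MAE computation itself follows directly from the description above.

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # stand-in rater, see note above

def predict_rating(review):
    # Map classifier confidence onto the 1-8 rating scale (assumption).
    out = sentiment(review[:512])[0]
    score = out["score"]
    return 1 + 7 * score if out["label"] == "POSITIVE" else 8 - 7 * score

errors = []
for r in range(1, 9):  # the card states ratings 1-8
    prompt = f"Rate: {r}, Text:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_length=150,
                         pad_token_id=tokenizer.eos_token_id)
    review = tokenizer.decode(out[0], skip_special_tokens=True)
    review = review.split("Text:")[-1].strip()
    errors.append(abs(predict_rating(review) - r))

print(f"MAE: {sum(errors) / len(errors):.2f}")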

Limitations

  • The model is tailored to the IMDB movie dataset and does not generalize well to other domains.
  • Generated reviews may not always match the intended sentiment perfectly.

Environmental Impact

Training was conducted using GPU acceleration with FP16 precision to reduce energy consumption.

  • Hardware Type: GPU
  • Training Epochs: 3
  • Precision: FP16
  • Gradient Accumulation: Enabled

Citation

If you use this model, please cite it as:

@misc{levos06_gpt2_imdb,
  author = {Lev Ossadtchi},
  title = {GPT-2 Large Finetuned on IMDB Reviews},
  year = {2025},
  howpublished = {Hugging Face Model Hub},
  url = {https://huggingface.co/levos06/gpt2-large-finetuned}
}