Model Card for GPT-2 Large Finetuned on IMDB Reviews
Model Details
Model Description
This model is a GPT-2 Large variant finetuned to generate English text reviews of movies based on a numeric rating input. Users provide a rating (1–8) as a prompt, and the model outputs a corresponding movie review. There are no strict limitations on the length of the generated text.
- Developed by: Lev Ossadtchi
- Model type: GPT-2 Large (transformer-based language model)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: openai-community/gpt2-large
Model Sources
- Repository: https://huggingface.co/levos06/gpt2-large-finetuned
- Dataset used for finetuning: Stanford NLP IMDB dataset
Uses
Direct Use
The model is intended to generate movie reviews based on a given rating. Users can input a numeric score and obtain a coherent, stylistically appropriate review in English. Ideal for content generation, data augmentation for NLP tasks, or demonstration purposes.
Downstream Use
The model can be used in applications requiring synthetic review generation for movies, such as testing recommendation systems, creating sample datasets, or educational tools to demonstrate natural language generation.
Out-of-Scope Use
The model is not intended for generating reviews of products, services, or non-movie content. It may produce unrealistic or biased outputs outside the domain of movie reviews.
Bias, Risks, and Limitations
- The model is biased towards movie reviews and may not generate meaningful text for other domains.
- The model may produce stereotypical or exaggerated opinions common in movie reviews.
- Generated content should not be considered factual or reliable for real-world assessments.
Recommendations
Users should verify outputs when using them for research or demonstration purposes. Avoid relying on the model for commercial decisions or for real-world evaluation of movies or other products.
How to Get Started with the Model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("levos06/gpt2-large-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("levos06/gpt2-large-finetuned")

# Example usage: prompt with a rating, then generate and trim the review text.
prompt = "Rate: 8, Text:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=150,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; this silences a warning
)
review = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(review.split("Text:")[-1].strip())  # keep only the text after the "Text:" marker
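By default, generate decodes greedily, so repeated calls with the same prompt return the same review. For more varied output, sampling can be enabled at generation time; the temperature and top-p values below are illustrative assumptions, not settings recommended by the author.

# Optional: sample instead of greedy decoding for more varied reviews.
# The temperature / top_p values are illustrative assumptions.
outputs = model.generate(
    **inputs,
    max_length=150,
    do_sample=True,
    temperature=0.9,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True).split("Text:")[-1].strip())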
Training Details
Training Data
- Dataset: Stanford NLP IMDB dataset
- Content: Movie reviews with ratings
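The exact preprocessing pipeline is not documented in this card. The sketch below shows one plausible way to build "Rate: N, Text: ..." training strings from the raw Stanford aclImdb distribution, where each review file name encodes its rating (e.g. 123_8.txt); the directory layout and helper name are assumptions.

from pathlib import Path

def build_training_strings(aclimdb_dir="aclImdb/train"):
    """Turn raw IMDB review files into 'Rate: N, Text: ...' training strings.

    Assumes the original Stanford layout, where each file is named
    <id>_<rating>.txt and lives under pos/ or neg/.
    """
    examples = []
    for label_dir in ("pos", "neg"):
        for path in Path(aclimdb_dir, label_dir).glob("*.txt"):
            rating = int(path.stem.split("_")[1])  # rating is encoded in the file name
            review = path.read_text(encoding="utf-8")
            examples.append(f"Rate: {rating}, Text: {review}")
    return examples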
Training Procedure
- Precision: FP16
- Gradient Accumulation: Yes
- Epochs: 3
- Base Model: GPT-2 Large
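The card lists FP16, gradient accumulation, and 3 epochs but not the remaining hyperparameters. A minimal Trainer configuration consistent with those settings might look like the sketch below; batch size, accumulation steps, learning rate, and logging/saving choices are placeholders, not the values actually used.

from transformers import TrainingArguments

# Only fp16, gradient accumulation, and 3 epochs come from the card;
# the other values are placeholder assumptions.
training_args = TrainingArguments(
    output_dir="gpt2-large-imdb",
    num_train_epochs=3,
    fp16=True,
    gradient_accumulation_steps=8,
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    logging_steps=100,
    save_strategy="epoch",
)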
Evaluation
Metrics
The generated reviews were evaluated with an auxiliary model that predicts a rating from review text. For each generated review, the predicted rating was compared with the rating given in the prompt, and the mean absolute error (MAE) between the two was used as the quality metric.
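The auxiliary rating-prediction model is not published with this card. The sketch below shows how the MAE metric can be computed, assuming a hypothetical predict_rating(text) helper that wraps such a model and returns a numeric rating.

def mae(input_ratings, generated_reviews, predict_rating):
    """Mean absolute error between the ratings used as prompts and the
    ratings an auxiliary model predicts from the generated reviews.

    predict_rating is a hypothetical helper wrapping the auxiliary model.
    """
    errors = [
        abs(r - predict_rating(text))
        for r, text in zip(input_ratings, generated_reviews)
    ]
    return sum(errors) / len(errors)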
Limitations
- The model is tailored to the IMDB movie dataset and does not generalize well to other domains.
- Generated reviews may not always match the intended sentiment perfectly.
Environmental Impact
Training was conducted using GPU acceleration with FP16 precision to reduce energy consumption.
- Hardware Type: GPU
- Training Epochs: 3
- Precision: FP16
- Gradient Accumulation: Enabled
Citation
If you use this model, please cite it as:
@misc{levos06_gpt2_imdb,
author = {Lev Ossadtchi},
title = {GPT-2 Large Finetuned on IMDB Reviews},
year = {2025},
howpublished = {Hugging Face Model Hub},
url = {https://huggingface.co/levos06/gpt2-large-finetuned}
}