Model Card for GPT-2 Large Finetuned on IMDB Reviews

Model Details

Model Description

This model is a GPT-2 Large variant finetuned to generate English-language movie reviews conditioned on a numeric rating. Users provide a rating (1–8) in the prompt, and the model generates a corresponding review. There is no strict limit on the length of the generated text.

  • Developed by: Lev Ossadtchi
  • Model type: GPT-2 Large (transformer-based language model)
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: openai-community/gpt2-large

Uses

Direct Use

The model is intended to generate movie reviews from a given rating. Users input a numeric score and obtain a coherent, stylistically appropriate review in English. It is suitable for content generation, data augmentation for NLP tasks, and demonstration purposes.

Downstream Use

The model can be used in applications requiring synthetic review generation for movies, such as testing recommendation systems, creating sample datasets, or educational tools to demonstrate natural language generation.

Out-of-Scope Use

The model is not intended for generating reviews of products, services, or non-movie content. It may produce unrealistic or biased outputs outside the domain of movie reviews.

Bias, Risks, and Limitations

  • The model is specialized to movie reviews and may not generate meaningful text for other domains.
  • The model may produce stereotypical or exaggerated opinions common in movie reviews.
  • Generated content should not be considered factual or reliable for real-world assessments.

Recommendations

Users should verify outputs before relying on them for research or demonstration purposes, and should not use the model for commercial applications or for real-world evaluation of movies or other products.

How to Get Started with the Model

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("levos06/gpt2-large-finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("levos06/gpt2-large-finetuned")

# Prompt format: the rating comes first; the review follows "Text:".
prompt = "Rate: 8, Text:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=150,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
review = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the generated review, stripping the prompt prefix.
print(review.split("Text:")[-1].strip())
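
The snippet above decodes greedily, which can yield repetitive text. The decoding settings used by the author are not documented; as one common configuration, sampling can be enabled through standard generate parameters:

outputs = model.generate(
    **inputs,
    max_length=150,
    do_sample=True,    # sample rather than decode greedily
    top_p=0.95,        # nucleus sampling
    temperature=0.8,   # soften the output distribution
    pad_token_id=tokenizer.eos_token_id,
)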

Training Details

Training Data

  • Dataset: Stanford NLP IMDB dataset (stanfordnlp/imdb on the Hugging Face Hub)
  • Content: Movie reviews with ratings

Training Procedure

  • Precision: FP16
  • Gradient Accumulation: Yes
  • Epochs: 3
  • Base Model: GPT-2 Large
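
For reproducibility, the following is a minimal sketch of a Hugging Face Trainer setup matching the settings above. The batch size, accumulation steps, learning rate, and prompt-formatting step are assumptions, not documented values.

from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2-large")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2-large")

# NOTE: stanfordnlp/imdb exposes binary sentiment labels, not raw 1-10
# ratings; how the card's 1-8 ratings were derived is not documented,
# so the prompt formatting below is illustrative only.
dataset = load_dataset("stanfordnlp/imdb", split="train")

def tokenize(batch):
    texts = [f"Rate: {label}, Text: {text}"
             for label, text in zip(batch["label"], batch["text"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True,
                        remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="gpt2-large-imdb",
    num_train_epochs=3,             # documented above
    fp16=True,                      # documented above
    gradient_accumulation_steps=8,  # "enabled" per the card; step count assumed
    per_device_train_batch_size=2,  # assumed
    learning_rate=5e-5,             # assumed
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()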

Evaluation

Metrics

Generated reviews were evaluated with an auxiliary model that predicts a rating from review text. The mean absolute error (MAE) between the input rating and the rating predicted from the generated review serves as the quality metric.
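
The auxiliary rating model is not named in the card, so the sketch below substitutes a generic sentiment classifier as a stand-in rater (an assumption, not the author's evaluator) and reuses model and tokenizer from the quick-start snippet; only the MAE computation itself follows directly from the description above.

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # stand-in rater, see note above

def predict_rating(review):
    # Map classifier confidence onto the 1-8 rating scale (assumption).
    out = sentiment(review[:512])[0]
    score = out["score"]
    return 1 + 7 * score if out["label"] == "POSITIVE" else 8 - 7 * score

errors = []
for r in range(1, 9):  # the card states ratings 1-8
    prompt = f"Rate: {r}, Text:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_length=150,
                         pad_token_id=tokenizer.eos_token_id)
    review = tokenizer.decode(out[0], skip_special_tokens=True)
    review = review.split("Text:")[-1].strip()
    errors.append(abs(predict_rating(review) - r))

print(f"MAE: {sum(errors) / len(errors):.2f}")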

Limitations

  • The model is tailored to the IMDB movie dataset and does not generalize well to other domains.
  • Generated reviews may not always match the intended sentiment perfectly.

Environmental Impact

Training was conducted using GPU acceleration with FP16 precision to reduce energy consumption.

  • Hardware Type: GPU
  • Training Epochs: 3
  • Precision: FP16
  • Gradient Accumulation: Enabled

Citation

If you use this model, please cite it as:

@misc{levos06_gpt2_imdb,
  author = {Lev Ossadtchi},
  title = {GPT-2 Large Finetuned on IMDB Reviews},
  year = {2025},
  howpublished = {Hugging Face Model Hub},
  url = {https://huggingface.co/levos06/gpt2-large-finetuned}
}