gpt-oss / README.md
piyushdev's picture
Update README.md
7ba47b9 verified

A newer version of the Gradio SDK is available: 6.1.0

Upgrade
metadata
title: Business Category Description Generator
emoji: 🏒
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

Features

  • πŸ“€ Upload Multiple CSV Files: Process one or more CSV files at once
  • πŸ”„ Batch Processing: Automatically processes all unique categories from your files
  • πŸ€– AI-Powered: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
  • πŸ” Automatic Retry Logic: 3 attempts per category with intelligent error recovery
  • βœ… Validation: JSON validation and quality checks for every description
  • πŸ“Š Progress Tracking: Real-time progress updates with success/failure reporting
  • πŸ’Ύ Automatic Saving: Output files with Status column showing results
  • πŸ“₯ Easy Download: Download all processed files directly from the interface
  • ⚑ Zero GPU Support: Use Zero GPU for faster, free GPU acceleration

How to Use

1. Deploy to Hugging Face Spaces

  1. Go to Hugging Face Spaces
  2. Click "Create new Space"
  3. Choose "Gradio" as the SDK
  4. Upload app.py, requirements.txt, and README.md
  5. Add Your HF Token as a Secret (Required):
    • Go to your Space's Settings (gear icon)
    • Find the "Repository secrets" or "Secrets" section
    • Click "Add a secret" or "New secret"
    • Enter:
    • Click "Save"
  6. Optional: Enable Zero GPU for Faster Processing:
    • Zero GPU provides free GPU acceleration
    • No Pro subscription required
    • Space will automatically use GPU when available
    • Significantly speeds up processing for large batches
  7. Your app will be deployed and restart automatically!

2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords. For example:

category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data

3. Use the Application

  1. Upload Files: Upload one or more CSV files
  2. Specify Column: Enter the name of the column containing categories (default: "category")
  3. Adjust Settings (optional):
    • Max Tokens: 64-512 (default: 256)
    • Temperature: 0.1-1.0 (default: 0.7)
    • Top-p: 0.1-1.0 (default: 0.9)
  4. Process: Click "Process Files" and wait for completion
  5. Download: Download the output CSV files with descriptions

Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.

Output Format

Each output CSV file contains:

Column Description
Category The original category keyword
Description The generated CLIP-ready visual description (validated)
Raw_Response The complete model response (for debugging)
Status "Success" or "Failed" with error details

Example Output

Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success

Model Settings

  • Max Tokens: Controls the maximum length of generated descriptions (default: 256)
  • Temperature: Controls output consistency (default: 0.3)
    • 0.2-0.4: Consistent, focused descriptions (recommended)
    • 0.5-0.7: Balanced creativity and consistency
    • 0.8-1.0: More creative variations
  • Top-p: Nucleus sampling parameter, controls diversity (default: 0.9)

Technical Details

  • Model: openai/gpt-oss-20b
  • Framework: Gradio (latest stable version)
  • Retry Logic: 3 attempts per category with 1-second delay between retries
  • Validation: JSON parsing, structure validation, and minimum length checks
  • Processing: Categories are deduplicated automatically
  • Rate Limiting: 0.5-second delay between categories to avoid API throttling
  • Output Files: Named as output_{original_name}_{timestamp}.csv
  • Zero GPU Support: Free GPU acceleration available for Spaces

Troubleshooting

"HF_TOKEN not found" error

  • Make sure you've added HF_TOKEN as a Secret in your Space settings
  • Go to Space Settings β†’ Secrets β†’ Add a secret
  • Name must be exactly: HF_TOKEN (case-sensitive)
  • Value: your token from https://huggingface.co/settings/tokens
  • Restart your Space after adding the secret (or it will restart automatically)

"Column not found" error

  • Check that the column name matches exactly (case-sensitive)
  • View the error message to see available columns

Authentication errors

  • Ensure your HF token has proper permissions (Read access minimum)
  • Check that your account has access to the Inference API
  • Verify the token hasn't expired
  • Make sure you're using a valid token from https://huggingface.co/settings/tokens

Inconsistent or incomplete output

  • Lower the Temperature to 0.2-0.4 for more consistent results
  • Check the Status column in output CSV to identify failed categories
  • Failed categories can be extracted and reprocessed separately
  • Zero GPU will provide more reliable processing with better resources

Slow processing

  • The model processes each unique category individually (includes retries)
  • Large files with many unique categories will take longer
  • Consider splitting very large files into smaller batches
  • Zero GPU acceleration is automatically available for your Space
  • Each category has a 0.5s delay to prevent rate limiting

Local Development

To run locally:

# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"

# Linux/Mac:
export HF_TOKEN="your_hf_token_here"

# Run the app
python app.py

Get your token from: https://huggingface.co/settings/tokens

License

This project uses the GPT-OSS-20B model via Hugging Face Inference API.