---
title: Business Category Description Generator
emoji: 🏒
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---
# Business Category Description Generator
A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.
## Features
- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔁 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
- ✅ **Validation**: JSON validation and quality checks for every description
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Output files with Status column showing results
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration
## How to Use
### 1. Deploy to Hugging Face Spaces
1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable Zero GPU for Faster Processing**:
   - Zero GPU provides free GPU acceleration
   - No Pro subscription required
   - Space will automatically use GPU when available
   - Significantly speeds up processing for large batches
7. Your app will then be deployed, and the Space will restart automatically.
### 2. Prepare Your CSV Files
Your CSV files should contain a column with business category keywords. For example:
```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```
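As a rough illustration of how the category column might be read and deduplicated, here is a minimal sketch using only the standard library. The function name `unique_categories` is hypothetical and is not taken from `app.py`; the sketch assumes the column name is matched case-sensitively and that blank cells are skipped.

```python
import csv
import io


def unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column` in first-seen order.

    Hypothetical helper for illustration; the real app may read files differently.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if column not in fieldnames:
        # Column names are compared exactly, so "Category" != "category".
        raise KeyError(f"Column {column!r} not found; available columns: {fieldnames}")

    seen: dict[str, None] = {}  # dict keeps insertion order, so it doubles as an ordered set
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)
```

Calling `unique_categories(text)` on the sample file above would yield the four category strings once each, even if some of them appeared on several rows.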
### 3. Use the Application
1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions
*Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.*
## Output Format
Each output CSV file contains:
| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |
## Example Output
```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```
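The relationship between `Raw_Response` and `Description` can be sketched as a small validation step: parse the raw JSON, check its structure, and enforce a minimum length. This is an assumption-laden sketch, not the app's actual code; the name `extract_description` and the `MIN_DESCRIPTION_CHARS` threshold are hypothetical, since this README does not state the real minimum.

```python
import json

# Assumed threshold for illustration only; the README does not give the real value.
MIN_DESCRIPTION_CHARS = 20


def extract_description(raw_response: str) -> str:
    """Parse a raw model response and return its validated description.

    Raises ValueError if the response is not JSON, lacks a "Description"
    key, or the description is too short.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict) or "Description" not in payload:
        raise ValueError("response JSON has no 'Description' key")

    description = payload["Description"]
    if not isinstance(description, str):
        raise ValueError("'Description' is not a string")

    description = description.strip()
    if len(description) < MIN_DESCRIPTION_CHARS:
        raise ValueError("description is shorter than the minimum length")
    return description
```

A response that fails any of these checks would be a natural candidate for a retry and, eventually, a "Failed" status.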
## Model Settings
- **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
- **Temperature**: Controls output consistency (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)
## Technical Details
- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **Zero GPU Support**: Free GPU acceleration available for Spaces
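The retry behaviour listed above (3 attempts, 1-second delay) could look roughly like the following. This is a sketch under assumptions: `generate_with_retry`, `generate`, and `validate` are hypothetical names, and the `"Failed: ..."` status format is a guess at what "Failed with error details" looks like in practice.

```python
import time
from typing import Callable


def generate_with_retry(
    category: str,
    generate: Callable[[str], str],   # hypothetical: calls the model, returns raw text
    validate: Callable[[str], str],   # hypothetical: returns a description or raises
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try to produce a validated description, retrying on any error.

    Returns (description, status), where status is "Success" or
    "Failed: <last error>".
    """
    last_error = ""
    for attempt in range(1, attempts + 1):
        try:
            return validate(generate(category)), "Success"
        except Exception as exc:  # any failure counts as a failed attempt
            last_error = f"{type(exc).__name__}: {exc}"
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt, not after the last one
    return "", f"Failed: {last_error}"
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; a caller would normally leave it at the default.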
## Troubleshooting
### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- The Space normally restarts automatically after you add the secret; restart it manually if it does not
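Secrets configured in a Space are exposed to the app as environment variables, so a check for the token can be sketched as below. The function name `require_hf_token` and the error wording are hypothetical and may differ from what `app.py` actually does.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value, or raise a descriptive error if it is missing.

    `env` defaults to the process environment; passing a mapping makes the
    check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it as a Secret in your Space settings "
            "(the name must be exactly HF_TOKEN)."
        )
    return token
```

Running this once at startup would surface a missing or misnamed secret before any file is processed.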
### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- View the error message to see available columns
### Authentication errors
- Ensure your HF token has proper permissions (Read access minimum)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens
### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- Zero GPU, when enabled, may make processing more reliable thanks to the additional resources
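Extracting failed categories for reprocessing can be done with a few lines of standard-library Python. The sketch below assumes the output columns documented in this README (`Category`, `Status`) and that successful rows carry exactly the status `Success`; the function name `retry_csv` is hypothetical.

```python
import csv
import io


def retry_csv(output_csv_text: str, column: str = "category") -> str:
    """Build a new input CSV containing only the categories that did not succeed.

    `column` is the header to use in the new file, so it can be uploaded
    again with the matching column name.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column])
    for row in csv.DictReader(io.StringIO(output_csv_text)):
        status = (row.get("Status") or "").strip()
        if status != "Success":  # anything else is treated as a failure
            writer.writerow([row["Category"]])
    return out.getvalue()
```

The returned text can be saved to a file and uploaded like any other input CSV.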
### Slow processing
- The model processes each unique category individually (includes retries)
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- Zero GPU acceleration is automatically available for your Space
- Each category has a 0.5s delay to prevent rate limiting
## Local Development
To run locally:
```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable.
# Linux/macOS:
export HF_TOKEN="your_hf_token_here"
# Windows (PowerShell), use this instead of the export line:
#   $env:HF_TOKEN="your_hf_token_here"

# Run the app
python app.py
```
Get your token from: https://huggingface.co/settings/tokens
## License
This project uses the GPT-OSS-20B model (`openai/gpt-oss-20b`), which OpenAI publishes under the Apache 2.0 license, via the Hugging Face Inference API.