---
title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

## Features

- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔁 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
- ✅ **Validation**: JSON validation and quality checks for every description
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Output files with Status column showing results
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration

## How to Use

### 1. Deploy to Hugging Face Spaces

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable Zero GPU for Faster Processing**:
   - Zero GPU provides free GPU acceleration
   - No Pro subscription required
   - Space will automatically use GPU when available
   - Significantly speeds up processing for large batches
7. Your app will be deployed and restart automatically!

### 2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords. For example:

```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```

### 3. Use the Application

1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions

*Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.*

## Output Format

Each output CSV file contains:

| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |

## Example Output

```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```

## Model Settings

- **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
- **Temperature**: Controls output consistency (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)

## Technical Details

- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **Zero GPU Support**: Free GPU acceleration available for Spaces

## Troubleshooting

### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- Restart your Space after adding the secret (or it will restart automatically)

### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- View the error message to see available columns

### Authentication errors
- Ensure your HF token has proper permissions (Read access minimum)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens

### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- Zero GPU will provide more reliable processing with better resources

### Slow processing
- The model processes each unique category individually (includes retries)
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- Zero GPU acceleration is automatically available for your Space
- Each category has a 0.5s delay to prevent rate limiting

## Local Development

To run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"

# Linux/Mac:
export HF_TOKEN="your_hf_token_here"

# Run the app
python app.py
```

Get your token from: https://huggingface.co/settings/tokens

## License

This project uses the GPT-OSS-20B model via Hugging Face Inference API.