---
title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords read from CSV files.

## Features

- **Upload Multiple CSV Files**: Process one or more CSV files at once
- **Batch Processing**: Automatically processes every unique category in your files
- **AI-Powered**: Uses OpenAI's GPT-OSS-20B model to generate the descriptions
- **Automatic Retry Logic**: Up to 3 attempts per category, with a 1-second delay between retries
- **Validation**: JSON validation and quality checks for every description
- **Progress Tracking**: Real-time progress updates with success/failure reporting
- **Automatic Saving**: Output files include a `Status` column showing the result for each category
- **Easy Download**: Download all processed files directly from the interface
- **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration

## How to Use

### 1. Deploy to Hugging Face Spaces

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get one from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable Zero GPU for Faster Processing**:
   - Zero GPU provides free GPU acceleration
   - No Pro subscription required
   - The Space uses the GPU automatically when one is available
   - Significantly speeds up processing for large batches
7. Your app will build and start automatically.

### 2. Prepare Your CSV Files

Your CSV files should contain a column of business category keywords. For example:

```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```
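
As a rough illustration of how the category column could be read and deduplicated, here is a minimal sketch using only the standard library. The function name `unique_categories` and the exact error text are hypothetical and are not taken from `app.py`; the sketch only assumes that the column name is matched exactly and that repeated keywords are processed once.

```python
import csv
import io


def unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column`, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    if column not in columns:
        # Column names are compared exactly, including case.
        raise KeyError(f"Column '{column}' not found. Available columns: {columns}")
    seen: dict[str, None] = {}  # a dict keeps insertion order
    for row in reader:
        value = (row[column] or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)


sample = """category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Mehandi,additional_data
"""
print(unique_categories(sample))
```

Keeping first-seen order means the output rows follow the order of the input file, which can make the results easier to compare against the original CSV.
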

### 3. Use the Application

1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions

*Note: Authentication is handled automatically via the `HF_TOKEN` secret you configured in the Space settings.*

## Output Format

Each output CSV file contains:

| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success" or "Failed" with error details |

## Example Output

```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```
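
The doubled quotes in `Raw_Response` are standard CSV escaping for a JSON string stored inside a single field. The sketch below shows how Python's `csv` module produces that escaping with the four documented columns. The helper name `write_results` is hypothetical, and the row content is a placeholder rather than real model output.

```python
import csv
import io
import json


def write_results(rows: list[dict]) -> str:
    """Serialise result rows to CSV text using the four documented columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Category", "Description", "Raw_Response", "Status"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row: the "..." stands in for a generated description.
raw = json.dumps({"Category": "Mehandi", "Description": "..."})
csv_text = write_results(
    [{"Category": "Mehandi", "Description": "...", "Raw_Response": raw, "Status": "Success"}]
)
print(csv_text)
```

Reading the file back with `csv.DictReader` restores the original JSON string, so `Raw_Response` can be passed straight to `json.loads` when debugging.
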

## Model Settings

- **Max Tokens**: Sets the maximum length of a generated description (default: 256)
- **Temperature**: Controls how random the output is (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter that controls diversity (default: 0.9)
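
For illustration, the three settings could be bundled into keyword arguments before each request, clamped to the slider ranges listed under "Adjust Settings" (64-512, 0.1-1.0, 0.1-1.0). This is a sketch under assumptions: the helper names `clamp` and `sampling_kwargs` are hypothetical, and whether `app.py` clamps values at all is not stated in this README. The key names match the `max_tokens`, `temperature`, and `top_p` parameters accepted by `huggingface_hub.InferenceClient.chat_completion`.

```python
def clamp(value, low, high):
    """Restrict `value` to the closed interval [low, high]."""
    return max(low, min(high, value))


def sampling_kwargs(max_tokens: int, temperature: float, top_p: float) -> dict:
    """Build generation keyword arguments, clamped to the documented UI ranges."""
    return {
        "max_tokens": int(clamp(max_tokens, 64, 512)),
        "temperature": float(clamp(temperature, 0.1, 1.0)),
        "top_p": float(clamp(top_p, 0.1, 1.0)),
    }


# An out-of-range Max Tokens value is pulled back to the upper bound.
print(sampling_kwargs(max_tokens=1000, temperature=0.3, top_p=0.9))
```
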

## Technical Details

- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: 3 attempts per category with a 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **Zero GPU Support**: Free GPU acceleration available for Spaces
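
The retry and validation behaviour listed above could look roughly like the following sketch. It is not the code from `app.py`: the function names, the `MIN_DESCRIPTION_LENGTH` threshold, and the `"Failed: ..."` status text are assumptions made for illustration, and `call_model` stands in for whatever function sends the request to the model.

```python
import json
import time

MIN_DESCRIPTION_LENGTH = 20  # hypothetical threshold; the README does not state one


def validate(raw: str) -> str:
    """Parse the model reply and return its description, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict) or "Description" not in data:
        raise ValueError("missing 'Description' field")
    description = str(data["Description"]).strip()
    if len(description) < MIN_DESCRIPTION_LENGTH:
        raise ValueError("description too short")
    return description


def describe_with_retry(call_model, category, attempts=3, delay=1.0, sleep=time.sleep):
    """Try `call_model(category)` up to `attempts` times.

    Returns a (description, raw_response, status) tuple.
    """
    raw, error = "", "no attempts made"
    for attempt in range(1, attempts + 1):
        try:
            raw = call_model(category)
            return validate(raw), raw, "Success"
        except Exception as exc:  # request errors and validation errors alike
            error = str(exc)
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    return "", raw, f"Failed: {error}"


# Fake model for demonstration: the first reply is malformed, the second is valid.
replies = iter([
    "not json",
    json.dumps({"Category": "Mehandi", "Description": "placeholder text long enough to pass"}),
])
print(describe_with_retry(lambda category: next(replies), "Mehandi", sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the delay out of tests and demos while leaving the default behaviour at one second between attempts.
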

## Troubleshooting

### "HF_TOKEN not found" error
- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- Name must be exactly: `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- The Space should restart automatically after you add the secret; restart it manually if it does not
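
Space secrets are exposed to the running app as environment variables, so a check along these lines would produce this kind of error. This is a minimal sketch: the function name `load_hf_token` and the message text are hypothetical, not copied from `app.py`.

```python
import os


def load_hf_token(env=os.environ) -> str:
    """Return the Hugging Face token from the environment or raise a clear error."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it as a Space secret, or export it before running locally."
        )
    return token


# Demonstration with an explicit mapping instead of the real environment.
print(load_hf_token({"HF_TOKEN": "hf_example_token"}))
```

With a plain mapping like the one above, a key spelled `hf_token` would not be found, which mirrors the exact-name requirement.
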

### "Column not found" error
- Check that the column name matches exactly (case-sensitive)
- Read the error message to see which columns are available

### Authentication errors
- Ensure your HF token has the required permissions (Read access at minimum)
- Check that your account has access to the Inference API
- Verify the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens

### Inconsistent or incomplete output
- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the `Status` column in the output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately
- Zero GPU will provide more reliable processing with better resources
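
One way to extract failed categories for a second pass is to filter the output CSV on its `Status` column and write the keywords to a new input file. The sketch below assumes that successful rows have a status starting with "Success"; the function names and the sample failure message are hypothetical.

```python
import csv
import io


def failed_categories(output_csv_text: str) -> list[str]:
    """Return the categories whose Status does not start with 'Success'."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [row["Category"] for row in reader if not row["Status"].startswith("Success")]


def retry_csv(categories: list[str], column: str = "category") -> str:
    """Build a one-column CSV that can be uploaded again for reprocessing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([name] for name in categories)
    return buffer.getvalue()


# Illustrative output file content; the failure text is made up.
output_text = """Category,Description,Raw_Response,Status
Photographer,...,...,Success
Equipment,,,Failed: invalid JSON
"""
print(retry_csv(failed_categories(output_text)))
```
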

### Slow processing
- The model processes each unique category individually, and retries add to the total time
- Large files with many unique categories will take longer
- Consider splitting very large files into smaller batches
- Zero GPU acceleration is automatically available for your Space
- Each category adds a 0.5-second delay to prevent rate limiting

## Local Development

To run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable.
# Linux/macOS:
export HF_TOKEN="your_hf_token_here"
# Windows (PowerShell): run this line instead of the export above
#   $env:HF_TOKEN = "your_hf_token_here"

# Run the app
python app.py
```

Get your token from: https://huggingface.co/settings/tokens

## License

This project uses the GPT-OSS-20B model via the Hugging Face Inference API.