Spaces:

piyushdev
/

gpt-oss

Sleeping

App Files Files Community

gpt-oss / README.md

piyushdev

Update README.md

7ba47b9 verified about 1 month ago

preview code

raw

history blame contribute delete

6.27 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

metadata

title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

Features

📤 Upload Multiple CSV Files: Process one or more CSV files at once
🔄 Batch Processing: Automatically processes all unique categories from your files
🤖 AI-Powered: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
🔁 Automatic Retry Logic: 3 attempts per category with intelligent error recovery
✅ Validation: JSON validation and quality checks for every description
📊 Progress Tracking: Real-time progress updates with success/failure reporting
💾 Automatic Saving: Output files with Status column showing results
📥 Easy Download: Download all processed files directly from the interface
⚡ Zero GPU Support: Use Zero GPU for faster, free GPU acceleration

How to Use

1. Deploy to Hugging Face Spaces

Go to Hugging Face Spaces
Click "Create new Space"
Choose "Gradio" as the SDK
Upload app.py, requirements.txt, and README.md
Add Your HF Token as a Secret (Required):
- Go to your Space's Settings (gear icon)
- Find the "Repository secrets" or "Secrets" section
- Click "Add a secret" or "New secret"
- Enter:
  - Name: HF_TOKEN
  - Value: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
- Click "Save"
Optional: Enable Zero GPU for Faster Processing:
- Zero GPU provides free GPU acceleration
- No Pro subscription required
- Space will automatically use GPU when available
- Significantly speeds up processing for large batches
Your app will be deployed and restart automatically!

2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords. For example:

category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data

3. Use the Application

Upload Files: Upload one or more CSV files
Specify Column: Enter the name of the column containing categories (default: "category")
Adjust Settings (optional):
- Max Tokens: 64-512 (default: 256)
- Temperature: 0.1-1.0 (default: 0.7)
- Top-p: 0.1-1.0 (default: 0.9)
Process: Click "Process Files" and wait for completion
Download: Download the output CSV files with descriptions

Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.

Output Format

Each output CSV file contains:

Column	Description
`Category`	The original category keyword
`Description`	The generated CLIP-ready visual description (validated)
`Raw_Response`	The complete model response (for debugging)
`Status`	"Success" or "Failed" with error details

Example Output

Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success

Model Settings

Max Tokens: Controls the maximum length of generated descriptions (default: 256)
Temperature: Controls output consistency (default: 0.3)
- 0.2-0.4: Consistent, focused descriptions (recommended)
- 0.5-0.7: Balanced creativity and consistency
- 0.8-1.0: More creative variations
Top-p: Nucleus sampling parameter, controls diversity (default: 0.9)

Technical Details

Model: openai/gpt-oss-20b
Framework: Gradio (latest stable version)
Retry Logic: 3 attempts per category with 1-second delay between retries
Validation: JSON parsing, structure validation, and minimum length checks
Processing: Categories are deduplicated automatically
Rate Limiting: 0.5-second delay between categories to avoid API throttling
Output Files: Named as output_{original_name}_{timestamp}.csv
Zero GPU Support: Free GPU acceleration available for Spaces

Troubleshooting

"HF_TOKEN not found" error

Make sure you've added HF_TOKEN as a Secret in your Space settings
Go to Space Settings → Secrets → Add a secret
Name must be exactly: HF_TOKEN (case-sensitive)
Value: your token from https://huggingface.co/settings/tokens
Restart your Space after adding the secret (or it will restart automatically)

"Column not found" error

Check that the column name matches exactly (case-sensitive)
View the error message to see available columns

Authentication errors

Ensure your HF token has proper permissions (Read access minimum)
Check that your account has access to the Inference API
Verify the token hasn't expired
Make sure you're using a valid token from https://huggingface.co/settings/tokens

Inconsistent or incomplete output

Lower the Temperature to 0.2-0.4 for more consistent results
Check the Status column in output CSV to identify failed categories
Failed categories can be extracted and reprocessed separately
Zero GPU will provide more reliable processing with better resources

Slow processing

The model processes each unique category individually (includes retries)
Large files with many unique categories will take longer
Consider splitting very large files into smaller batches
Zero GPU acceleration is automatically available for your Space
Each category has a 0.5s delay to prevent rate limiting

Local Development

To run locally:

# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"

# Linux/Mac:
export HF_TOKEN="your_hf_token_here"

# Run the app
python app.py

Get your token from: https://huggingface.co/settings/tokens

License

This project uses the GPT-OSS-20B model via Hugging Face Inference API.