---
title: Business Category Description Generator
emoji: 🏢
colorFrom: blue
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
---

# Business Category Description Generator

A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords read from CSV files.

## Features

- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
- 🔄 **Batch Processing**: Automatically processes every unique category across your files
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
- 🔁 **Automatic Retry Logic**: Up to 3 attempts per category when generation or validation fails
- ✅ **Validation**: JSON validation and quality checks for every description
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
- 💾 **Automatic Saving**: Output files include a Status column with the result for each category
- 📥 **Easy Download**: Download all processed files directly from the interface
- ⚡ **ZeroGPU Support**: Optionally run the Space on ZeroGPU hardware for GPU acceleration

## How to Use

### 1. Deploy to Hugging Face Spaces

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click "Create new Space"
3. Choose "Gradio" as the SDK
4. Upload `app.py`, `requirements.txt`, and `README.md`
5. **Add Your HF Token as a Secret (Required)**:
   - Go to your Space's Settings (gear icon)
   - Find the "Repository secrets" or "Secrets" section
   - Click "Add a secret" or "New secret"
   - Enter:
     - **Name**: `HF_TOKEN`
     - **Value**: Your Hugging Face token (get one from https://huggingface.co/settings/tokens)
   - Click "Save"
6. **Optional: Enable ZeroGPU for Faster Processing**:
   - ZeroGPU provides on-demand GPU acceleration for Spaces
   - Hosting a ZeroGPU Space on a personal account requires a PRO subscription
   - The GPU is used only while functions that `app.py` decorates with `@spaces.GPU` are running; requests the app sends to the Inference API run on Hugging Face's servers and are not accelerated by ZeroGPU
   - Where it applies, ZeroGPU can significantly speed up processing for large batches
7. Your Space builds and restarts automatically once the files and secret are in place.

### 2. Prepare Your CSV Files

Your CSV files should contain a column with business category keywords.
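To illustrate what happens to that column, here is a minimal sketch of reading it from several uploaded files and deduplicating the values, mirroring the automatic deduplication and the case-sensitive "Column not found" check described in this README. It uses only Python's standard `csv` module. The function name `load_unique_categories` and its error message are hypothetical and not taken from `app.py`; trimming whitespace and skipping empty cells are also assumptions that may differ from the real app.

```python
import csv
from pathlib import Path


def load_unique_categories(csv_paths, column="category"):
    """Return the unique, non-empty values of `column` across all CSV files.

    Hypothetical helper for illustration; not taken from app.py.
    Values keep the order in which they first appear. The column lookup is
    case-sensitive, like the "Column not found" check described here.
    """
    unique = {}  # dict keys preserve insertion order and drop duplicates
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            columns = reader.fieldnames or []  # header row; [] for an empty file
            if column not in columns:
                raise ValueError(
                    f"Column {column!r} not found in {Path(path).name}. "
                    f"Available columns: {columns}"
                )
            for row in reader:
                # Assumption: trim whitespace and skip empty cells.
                value = (row.get(column) or "").strip()
                if value:
                    unique.setdefault(value, None)
    return list(unique)
```

For example, `load_unique_categories(["vendors.csv"], column="category")` would return each distinct category once, in the order it first appears (`vendors.csv` is a placeholder filename). A dict is used instead of a set so that the output order stays predictable across runs.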
A minimal input file with a `category` column looks like this:

```csv
category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Equipment,additional_data
```

### 3. Use the Application

1. **Upload Files**: Upload one or more CSV files
2. **Specify Column**: Enter the name of the column containing categories (default: "category")
3. **Adjust Settings** (optional):
   - Max Tokens: 64-512 (default: 256)
   - Temperature: 0.1-1.0 (default: 0.7)
   - Top-p: 0.1-1.0 (default: 0.9)
4. **Process**: Click "Process Files" and wait for completion
5. **Download**: Download the output CSV files with descriptions

*Note: Authentication is handled automatically via the `HF_TOKEN` secret you configured in the Space settings.*

## Output Format

Each output CSV file contains:

| Column | Description |
|--------|-------------|
| `Category` | The original category keyword |
| `Description` | The generated CLIP-ready visual description (validated) |
| `Raw_Response` | The complete model response (for debugging) |
| `Status` | "Success", or "Failed" with error details |

## Example Output

```csv
Category,Description,Raw_Response,Status
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
```

## Model Settings

- **Max Tokens**: Maximum length of each generated response, in tokens (default: 256)
- **Temperature**: Controls output consistency (default: 0.3)
  - 0.2-0.4: Consistent, focused descriptions (recommended)
  - 0.5-0.7: Balanced creativity and consistency
  - 0.8-1.0: More creative variations
- **Top-p**: Nucleus sampling parameter that controls diversity (default: 0.9)

## Technical Details

- **Model**: openai/gpt-oss-20b
- **Framework**: Gradio (latest stable version)
- **Retry Logic**: Up to 3 attempts per category, with a 1-second delay between retries
- **Validation**: JSON parsing, structure validation, and minimum length checks
- **Processing**: Categories are deduplicated automatically
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
- **ZeroGPU Support**: Optional GPU hardware for Spaces (see the deployment steps above)

## Troubleshooting

### "HF_TOKEN not found" error

- Make sure you've added `HF_TOKEN` as a secret in your Space settings
- Go to Space Settings → Secrets → Add a secret
- The name must be exactly `HF_TOKEN` (case-sensitive)
- Value: your token from https://huggingface.co/settings/tokens
- The Space normally restarts automatically after you add a secret; if it does not, restart it manually

### "Column not found" error

- Check that the column name matches exactly (case-sensitive)
- The error message lists the available columns

### Authentication errors

- Ensure your HF token has the required permissions (Read access at minimum)
- Check that your account has access to the Inference API
- Verify that the token hasn't expired
- Make sure you're using a valid token from https://huggingface.co/settings/tokens

### Inconsistent or incomplete output

- Lower the Temperature to 0.2-0.4 for more consistent results
- Check the Status column in the output CSV to identify failed categories
- Failed categories can be extracted and reprocessed separately

### Slow processing

- Each unique category is processed individually, including any retries
- Large files with many unique categories take longer
- Consider splitting very large files into smaller batches
- ZeroGPU helps only if the model runs inside the Space; it does not speed up Inference API requests
- A 0.5-second delay between categories helps prevent rate limiting

## Local Development

To run locally:

```bash
# Install dependencies
pip install -r requirements.txt

# Set your Hugging Face token as an environment variable
# Windows (PowerShell):
$env:HF_TOKEN="your_hf_token_here"
# Linux/Mac:
export HF_TOKEN="your_hf_token_here"

# Run the app
python app.py
```

Get your token from: https://huggingface.co/settings/tokens

## License

This project uses the GPT-OSS-20B model via the Hugging Face Inference API.