	---
	title: Business Category Description Generator
	emoji: 🏢
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	app_file: app.py
	pinned: false
	---

	# Business Category Description Generator

	A Hugging Face Gradio application that generates CLIP-ready visual descriptions for business category keywords from CSV files.

	## Features

	- 📤 Upload Multiple CSV Files: Process one or more CSV files at once
	- 🔄 Batch Processing: Automatically processes all unique categories from your files
	- 🤖 AI-Powered: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
	- 🔁 Automatic Retry Logic: 3 attempts per category with intelligent error recovery
	- ✅ Validation: JSON validation and quality checks for every description
	- 📊 Progress Tracking: Real-time progress updates with success/failure reporting
	- 💾 Automatic Saving: Output files with Status column showing results
	- 📥 Easy Download: Download all processed files directly from the interface
	- ⚡ Zero GPU Support: Use Zero GPU for faster, free GPU acceleration

	## How to Use

	### 1. Deploy to Hugging Face Spaces

	1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
	2. Click "Create new Space"
	3. Choose "Gradio" as the SDK
	4. Upload `app.py`, `requirements.txt`, and `README.md`
	5. Add your HF token as a secret (required):
	   - Go to your Space's Settings (gear icon)
	   - Find the "Repository secrets" or "Secrets" section
	   - Click "Add a secret" or "New secret"
	   - Enter:
	     - Name: `HF_TOKEN`
	     - Value: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
	   - Click "Save"
	6. Optional: enable Zero GPU for faster processing:
	   - Zero GPU provides free GPU acceleration
	   - No Pro subscription required
	   - Space will automatically use GPU when available
	   - Significantly speeds up processing for large batches
	7. Your app will be deployed and restart automatically!

	### 2. Prepare Your CSV Files

	Your CSV files should contain a column with business category keywords. For example:

	```csv
	category,other_column
	Car Rental For Self Driven,additional_data
	Mehandi,additional_data
	Photographer,additional_data
	Equipment,additional_data
	```
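
The app processes each unique category once. As a rough illustration of that step (not the code in `app.py`), the sketch below reads CSV text with the standard `csv` module and returns the distinct values of one column. The function name `unique_categories`, the whitespace trimming, and the first-seen ordering are assumptions made for this example.

```python
import csv
import io


def unique_categories(csv_text: str, column: str = "category") -> list[str]:
    """Return the distinct, non-empty values of `column` in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: dict[str, None] = {}  # dicts keep insertion order, so this doubles as an ordered set
    for row in reader:
        value = (row.get(column) or "").strip()  # assumption: blank cells are skipped
        if value:
            seen.setdefault(value, None)
    return list(seen)


sample = """category,other_column
Car Rental For Self Driven,additional_data
Mehandi,additional_data
Photographer,additional_data
Mehandi,additional_data
"""

print(unique_categories(sample))
```

The repeated `Mehandi` row appears only once in the result, so it would be sent to the model a single time.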

	### 3. Use the Application

	1. Upload Files: Upload one or more CSV files
	2. Specify Column: Enter the name of the column containing categories (default: "category")
	3. Adjust Settings (optional):
	   - Max Tokens: 64-512 (default: 256)
	   - Temperature: 0.1-1.0 (default: 0.7)
	   - Top-p: 0.1-1.0 (default: 0.9)
	4. Process: Click "Process Files" and wait for completion
	5. Download: Download the output CSV files with descriptions

	Note: Authentication is handled automatically via the HF_TOKEN secret you configured in Space settings.

	## Output Format

	Each output CSV file contains:

	| Column | Description |
	|--------|-------------|
	| `Category` | The original category keyword |
	| `Description` | The generated CLIP-ready visual description (validated) |
	| `Raw_Response` | The complete model response (for debugging) |
	| `Status` | "Success" or "Failed" with error details |

	## Example Output

	```csv
	Category,Description,Raw_Response,Status
	Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
	```
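
The doubled quotes in `Raw_Response` are standard CSV escaping for a field that itself contains quotes and commas. The sketch below, which is an illustration rather than the app's own writer, shows that Python's `csv` module produces this escaping when writing and undoes it when reading. The helper name `rows_to_csv` is hypothetical, and the `"..."` description is a placeholder as in the example above.

```python
import csv
import io
import json

FIELDS = ["Category", "Description", "Raw_Response", "Status"]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize result rows using the four documented output columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The raw model response is stored as a JSON string inside one CSV field.
raw = json.dumps({"Category": "Car Rental For Self Driven", "Description": "..."})
text = rows_to_csv([
    {
        "Category": "Car Rental For Self Driven",
        "Description": "...",
        "Raw_Response": raw,
        "Status": "Success",
    }
])
print(text)

# Reading the row back restores the original JSON string.
parsed = next(csv.DictReader(io.StringIO(text)))
print(json.loads(parsed["Raw_Response"])["Category"])
```

Any CSV reader that follows the same quoting rules (spreadsheet tools, pandas) recovers the JSON unchanged.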

	## Model Settings

	- Max Tokens: Controls the maximum length of generated descriptions (default: 256)
	- Temperature: Controls output consistency (default: 0.3)
	  - 0.2-0.4: Consistent, focused descriptions (recommended)
	  - 0.5-0.7: Balanced creativity and consistency
	  - 0.8-1.0: More creative variations
	- Top-p: Nucleus sampling parameter, controls diversity (default: 0.9)
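
One way to keep these three settings inside the slider ranges listed under "Use the Application" is a small validated container. This is a sketch only: the class name `GenerationSettings` and the helper `_check_range` are hypothetical, and the default temperature follows the value given in this section.

```python
from dataclasses import asdict, dataclass


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when a setting falls outside its allowed range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class GenerationSettings:
    max_tokens: int = 256
    temperature: float = 0.3
    top_p: float = 0.9

    def __post_init__(self) -> None:
        # Ranges taken from the sliders described in the usage steps.
        _check_range("max_tokens", self.max_tokens, 64, 512)
        _check_range("temperature", self.temperature, 0.1, 1.0)
        _check_range("top_p", self.top_p, 0.1, 1.0)


# Valid settings can be turned into keyword arguments for a generation call.
print(asdict(GenerationSettings(temperature=0.2)))

# Out-of-range values are rejected before any request is made.
try:
    GenerationSettings(temperature=1.5)
except ValueError as exc:
    print(exc)
```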

	## Technical Details

	- Model: openai/gpt-oss-20b
	- Framework: Gradio (latest stable version)
	- Retry Logic: 3 attempts per category with 1-second delay between retries
	- Validation: JSON parsing, structure validation, and minimum length checks
	- Processing: Categories are deduplicated automatically
	- Rate Limiting: 0.5-second delay between categories to avoid API throttling
	- Output Files: Named as `output_{original_name}_{timestamp}.csv`
	- Zero GPU Support: Free GPU acceleration available for Spaces
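
The retry and validation behavior listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `app.py`: the function names, the `generate` callable, the minimum length of 20 characters, and the exact `Failed: ...` status text are all hypothetical. Only the three attempts, the 1-second delay, and the kinds of checks come from the list above.

```python
import json
import time
from typing import Callable

MAX_ATTEMPTS = 3
RETRY_DELAY_SECONDS = 1.0
MIN_DESCRIPTION_LENGTH = 20  # assumption: the real threshold is not documented


def validate_response(raw: str) -> str:
    """Parse the model output and return the description, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or "Description" not in data:
        raise ValueError("response must be a JSON object with a 'Description' key")
    description = str(data["Description"]).strip()
    if len(description) < MIN_DESCRIPTION_LENGTH:
        raise ValueError("description is too short")
    return description


def describe_with_retry(
    category: str,
    generate: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str, str]:
    """Return (description, raw_response, status) after at most three attempts."""
    raw, error = "", "no attempts made"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            raw = generate(category)
            return validate_response(raw), raw, "Success"
        except Exception as exc:  # request errors and validation errors both trigger a retry
            error = f"attempt {attempt}: {exc}"
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_SECONDS)
    return "", raw, f"Failed: {error}"


# Stand-in generator: invalid JSON, then a too-short description, then a valid reply.
responses = iter([
    "not json",
    '{"Description": "short"}',
    json.dumps({
        "Category": "Photographer",
        "Description": "a photographer holding a camera on a tripod in a studio",
    }),
])
result = describe_with_retry("Photographer", lambda _c: next(responses), sleep=lambda _s: None)
print(result[2])
```

Passing `sleep` as a parameter keeps the example fast to run and makes the delay easy to test. A real caller would leave the default `time.sleep`.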

	## Troubleshooting

	### "HF_TOKEN not found" error
	- Make sure you've added `HF_TOKEN` as a Secret in your Space settings
	- Go to Space Settings → Secrets → Add a secret
	- Name must be exactly: `HF_TOKEN` (case-sensitive)
	- Value: your token from https://huggingface.co/settings/tokens
	- The Space normally restarts automatically after a secret is added; restart it manually if it does not
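
Space secrets are exposed to the running app as environment variables, so a check like the one below is a plausible source of this error. It is a sketch, not the app's actual code: the function name `require_hf_token` and the error wording are assumptions.

```python
import os
from typing import Mapping, Optional


def require_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN not found. Add it as a secret in the Space settings "
            "or export it in your shell before running locally."
        )
    return token


# The lookup uses the exact name, so a differently cased key is not found here.
try:
    require_hf_token({"hf_token": "placeholder-value"})
except RuntimeError as exc:
    print(exc)
```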

	### "Column not found" error
	- Check that the column name matches exactly (case-sensitive)
	- View the error message to see available columns
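
To check a file before uploading it, the header row can be compared against the column name you plan to enter. The sketch below is one way to do that with the standard library. The function name `check_column` and the message format are hypothetical and may differ from what the app prints.

```python
import csv
import io


def check_column(csv_text: str, column: str) -> None:
    """Raise ValueError listing the available columns when `column` is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    available = list(reader.fieldnames or [])
    if column not in available:  # exact, case-sensitive comparison
        raise ValueError(
            f"Column '{column}' not found. Available columns: {', '.join(available)}"
        )


sample = "category,other_column\nMehandi,additional_data\n"

check_column(sample, "category")  # passes silently

try:
    check_column(sample, "Category")  # wrong case
except ValueError as exc:
    print(exc)
```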

	### Authentication errors
	- Ensure your HF token has proper permissions (Read access minimum)
	- Check that your account has access to the Inference API
	- Verify the token hasn't expired
	- Make sure you're using a valid token from https://huggingface.co/settings/tokens

	### Inconsistent or incomplete output
	- Lower the Temperature to 0.2-0.4 for more consistent results
	- Check the Status column in output CSV to identify failed categories
	- Failed categories can be extracted and reprocessed separately
	- Zero GPU will provide more reliable processing with better resources
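
Failed categories can be pulled out of an output file with a few lines of Python and saved as a new input CSV. This is a sketch under assumptions: the function names are hypothetical, any `Status` other than `Success` is treated as a failure, and the `Failed: invalid JSON` text in the sample is made up for illustration.

```python
import csv
import io


def failed_categories(output_csv_text: str) -> list[str]:
    """Return the categories whose Status is anything other than 'Success'."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [row["Category"] for row in reader if row["Status"].strip() != "Success"]


def to_input_csv(categories: list[str], column: str = "category") -> str:
    """Build a single-column CSV that can be uploaded for reprocessing."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([name] for name in categories)
    return buffer.getvalue()


sample_output = """Category,Description,Raw_Response,Status
Photographer,"...","{}",Success
Equipment,,,Failed: invalid JSON
"""

retry_list = failed_categories(sample_output)
print(retry_list)
print(to_input_csv(retry_list))
```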

	### Slow processing
	- The model processes each unique category individually (includes retries)
	- Large files with many unique categories will take longer
	- Consider splitting very large files into smaller batches
	- Zero GPU acceleration is automatically available for your Space
	- Each category has a 0.5s delay to prevent rate limiting
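
The pacing delay alone sets a floor on run time, which helps when deciding how to split a large file. The sketch below splits a category list into batches and computes that floor. Both function names are hypothetical, and the estimate ignores model latency and retries, which the README does not quantify.

```python
def split_batches(items: list[str], batch_size: int) -> list[list[str]]:
    """Split `items` into consecutive batches of at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def pacing_delay_seconds(category_count: int, delay: float = 0.5) -> float:
    """Lower bound on run time from the per-category delay alone."""
    return category_count * delay


categories = [f"category {n}" for n in range(1, 8)]  # 7 placeholder categories
batches = split_batches(categories, batch_size=3)

print([len(batch) for batch in batches])
print(pacing_delay_seconds(len(categories)))
```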

	## Local Development

	To run locally:

	```bash
	# Install dependencies
	pip install -r requirements.txt

	# Set your Hugging Face token as an environment variable.
	# Linux/macOS (bash/zsh):
	export HF_TOKEN="your_hf_token_here"

	# Windows (PowerShell): run this instead of the export line above.
	# $env:HF_TOKEN="your_hf_token_here"

	# Run the app
	python app.py
	```

	Get your token from: https://huggingface.co/settings/tokens

	## License

	This project uses the GPT-OSS-20B model, which OpenAI released under the Apache 2.0 license, via the Hugging Face Inference API.