Update README.md
README.md
CHANGED
|
@@ -17,9 +17,12 @@ A Hugging Face Gradio application that generates CLIP-ready visual descriptions
|
|
| 17 |
- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
|
| 18 |
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
|
| 19 |
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
|
| 20 |
-
-
|
| 21 |
-
-
|
| 22 |
- 📥 **Easy Download**: Download all processed files directly from the interface
|
| 23 |
|
| 24 |
## How to Use
|
| 25 |
|
|
@@ -37,7 +40,12 @@ A Hugging Face Gradio application that generates CLIP-ready visual descriptions
|
|
| 37 |
- **Name**: `HF_TOKEN`
|
| 38 |
- **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
|
| 39 |
- Click "Save"
|
| 40 |
-
6.
|
| 41 |
|
| 42 |
### 2. Prepare Your CSV Files
|
| 43 |
|
|
@@ -71,28 +79,36 @@ Each output CSV file contains:
|
|
| 71 |
| Column | Description |
|
| 72 |
|--------|-------------|
|
| 73 |
| `Category` | The original category keyword |
|
| 74 |
-
| `Description` | The generated CLIP-ready visual description |
|
| 75 |
-
| `Raw_Response` | The complete model response (
|
| 76 |
|
| 77 |
## Example Output
|
| 78 |
|
| 79 |
```csv
|
| 80 |
-
Category,Description,Raw_Response
|
| 81 |
-
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}"
|
| 82 |
```
|
| 83 |
|
| 84 |
## Model Settings
|
| 85 |
|
| 86 |
-
- **Max Tokens**: Controls the maximum length of generated descriptions
|
| 87 |
-
- **Temperature**:
|
| 88 |
-
-
|
| 89 |
|
| 90 |
## Technical Details
|
| 91 |
|
| 92 |
- **Model**: openai/gpt-oss-20b
|
| 93 |
- **Framework**: Gradio (latest stable version)
|
| 94 |
- **Processing**: Categories are deduplicated automatically
|
| 95 |
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
|
| 96 |
|
| 97 |
## Troubleshooting
|
| 98 |
|
|
@@ -113,10 +129,18 @@ Car Rental For Self Driven,"a car available for self-drive rental, parked at a p
|
|
| 113 |
- Verify the token hasn't expired
|
| 114 |
- Make sure you're using a valid token from https://huggingface.co/settings/tokens
|
| 115 |
|
| 116 |
### Slow processing
|
| 117 |
-
- The model processes each unique category individually
|
| 118 |
- Large files with many unique categories will take longer
|
| 119 |
- Consider splitting very large files into smaller batches
|
| 120 |
|
| 121 |
## Local Development
|
| 122 |
|
|
|
|
| 17 |
- 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
|
| 18 |
- 🔄 **Batch Processing**: Automatically processes all unique categories from your files
|
| 19 |
- 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
|
| 20 |
+
- 🔁 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
|
| 21 |
+
- ✅ **Validation**: JSON validation and quality checks for every description
|
| 22 |
+
- 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
|
| 23 |
+
- 💾 **Automatic Saving**: Output files with Status column showing results
|
| 24 |
- 📥 **Easy Download**: Download all processed files directly from the interface
|
| 25 |
+
- ⚡ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration
|
| 26 |
|
| 27 |
## How to Use
|
| 28 |
|
|
|
|
| 40 |
- **Name**: `HF_TOKEN`
|
| 41 |
- **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
|
| 42 |
- Click "Save"
|
| 43 |
+
6. **Optional: Enable Zero GPU for Faster Processing**:
|
| 44 |
+
- Zero GPU provides free GPU acceleration
|
| 45 |
+
- No Pro subscription required
|
| 46 |
+
- Space will automatically use GPU when available
|
| 47 |
+
- Significantly speeds up processing for large batches
|
| 48 |
+
7. Your app will be deployed and restart automatically!
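The `HF_TOKEN` secret configured above is normally exposed to the Space as an environment variable. A minimal sketch of reading it defensively follows; the helper name `get_hf_token` and the error message are illustrative assumptions, not code taken from the app.

```python
import os


def get_hf_token(env=None):
    """Return the token stored under HF_TOKEN, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test. The function name is hypothetical.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings"
        )
    return token
```

Failing early with an explicit message tends to be easier to diagnose than an authentication error raised later during generation.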
|
| 49 |
|
| 50 |
### 2. Prepare Your CSV Files
|
| 51 |
|
|
|
|
| 79 |
| Column | Description |
|
| 80 |
|--------|-------------|
|
| 81 |
| `Category` | The original category keyword |
|
| 82 |
+
| `Description` | The generated CLIP-ready visual description (validated) |
|
| 83 |
+
| `Raw_Response` | The complete model response (for debugging) |
|
| 84 |
+
| `Status` | "Success" or "Failed" with error details |
|
| 85 |
|
| 86 |
## Example Output
|
| 87 |
|
| 88 |
```csv
|
| 89 |
+
Category,Description,Raw_Response,Status
|
| 90 |
+
Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
|
| 91 |
```
|
| 92 |
|
| 93 |
## Model Settings
|
| 94 |
|
| 95 |
+
- **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
|
| 96 |
+
- **Temperature**: Controls output consistency (default: 0.3)
|
| 97 |
+
- 0.2-0.4: Consistent, focused descriptions (recommended)
|
| 98 |
+
- 0.5-0.7: Balanced creativity and consistency
|
| 99 |
+
- 0.8-1.0: More creative variations
|
| 100 |
+
- **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)
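The defaults listed above can be captured in a small settings object. This is only a sketch: the class `GenerationSettings` and the helper `temperature_band` are hypothetical names, and the cut-offs between bands (0.4 and 0.7) are an assumption that fills the gaps between the documented ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Documented defaults for the three model settings."""
    max_tokens: int = 256      # maximum length of a generated description
    temperature: float = 0.3   # lower values give more consistent output
    top_p: float = 0.9         # nucleus sampling cut-off


def temperature_band(temperature):
    """Map a temperature to the band described in the list above.

    Assumption: values falling between documented ranges are assigned
    to the next higher band.
    """
    if temperature <= 0.4:
        return "consistent"
    if temperature <= 0.7:
        return "balanced"
    return "creative"
```

For example, `temperature_band(GenerationSettings().temperature)` places the default in the recommended "consistent" band.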
|
| 101 |
|
| 102 |
## Technical Details
|
| 103 |
|
| 104 |
- **Model**: openai/gpt-oss-20b
|
| 105 |
- **Framework**: Gradio (latest stable version)
|
| 106 |
+
- **Retry Logic**: 3 attempts per category with 1-second delay between retries
|
| 107 |
+
- **Validation**: JSON parsing, structure validation, and minimum length checks
|
| 108 |
- **Processing**: Categories are deduplicated automatically
|
| 109 |
+
- **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
|
| 110 |
- **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
|
| 111 |
+
- **Zero GPU Support**: Free GPU acceleration available for Spaces
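The retry and validation behaviour described above could look roughly like the sketch below. It is not the app's actual code: the function names, the `generate` callable, the minimum length of 20 characters, and the `Failed: <reason>` status format are all assumptions made for illustration. Only the 3 attempts, the 1-second delay, and the three kinds of check come from the list above.

```python
import json
import time

MAX_ATTEMPTS = 3              # documented: 3 attempts per category
RETRY_DELAY_SECONDS = 1.0     # documented: 1-second delay between retries
MIN_DESCRIPTION_LENGTH = 20   # assumption: the real threshold is not documented


def validate_response(raw):
    """Return the description if `raw` passes all checks, else raise ValueError."""
    try:
        data = json.loads(raw)                       # JSON parsing
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict) or "Description" not in data:
        raise ValueError("missing 'Description' field")  # structure validation
    description = str(data["Description"]).strip()
    if len(description) < MIN_DESCRIPTION_LENGTH:    # minimum length check
        raise ValueError("description too short")
    return description


def describe_with_retry(category, generate, sleep=time.sleep):
    """Call `generate(category)` up to MAX_ATTEMPTS times.

    Returns (description, raw_response, status), mirroring the
    Description, Raw_Response, and Status output columns.
    """
    raw, last_error = "", "no attempts made"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            raw = generate(category)
            return validate_response(raw), raw, "Success"
        except Exception as exc:  # model errors and validation errors alike
            last_error = str(exc)
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_SECONDS)
    return "", raw, f"Failed: {last_error}"
```

Passing `sleep` as a parameter keeps the delay out of unit tests, and returning the last raw response even on failure preserves it for debugging.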
|
| 112 |
|
| 113 |
## Troubleshooting
|
| 114 |
|
|
|
|
| 129 |
- Verify the token hasn't expired
|
| 130 |
- Make sure you're using a valid token from https://huggingface.co/settings/tokens
|
| 131 |
|
| 132 |
+
### Inconsistent or incomplete output
|
| 133 |
+
- Lower the Temperature to 0.2-0.4 for more consistent results
|
| 134 |
+
- Check the Status column in output CSV to identify failed categories
|
| 135 |
+
- Failed categories can be extracted and reprocessed separately
|
| 136 |
+
- Zero GPU will provide more reliable processing with better resources
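Extracting failed categories from an output file can be done with the standard `csv` module. The sketch below assumes the column names from the output table (`Category`, `Status`) and treats any status that does not start with `Success` as a failure; the helper name `failed_categories` is hypothetical.

```python
import csv
import io


def failed_categories(csv_text):
    """Return the categories whose Status is anything other than Success."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failed = []
    for row in reader:
        status = (row.get("Status") or "").strip()
        if not status.startswith("Success"):
            failed.append(row["Category"])
    return failed
```

The resulting list can be written to a new CSV and uploaded again, so that only the failed categories are reprocessed.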
|
| 137 |
+
|
| 138 |
### Slow processing
|
| 139 |
+
- The model processes each unique category individually (includes retries)
|
| 140 |
- Large files with many unique categories will take longer
|
| 141 |
- Consider splitting very large files into smaller batches
|
| 142 |
+
- Zero GPU acceleration is automatically available for your Space
|
| 143 |
+
- Each category has a 0.5s delay to prevent rate limiting
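The fixed delays alone give a rough floor and ceiling on waiting time, before any model inference time is counted. The sketch below is an estimate under stated assumptions: deduplication is by exact string match, every category incurs one 0.5-second delay, and a worst-case category waits 1 second after each of its first two failed attempts. The function name `delay_budget` is hypothetical.

```python
CATEGORY_DELAY_SECONDS = 0.5   # documented delay between categories
RETRY_DELAY_SECONDS = 1.0      # documented delay between retries
MAX_ATTEMPTS = 3               # documented attempts per category


def delay_budget(categories):
    """Return (unique_count, min_delay_seconds, max_delay_seconds).

    Only the fixed sleeps are counted; time spent in the model is excluded.
    """
    unique = list(dict.fromkeys(categories))  # dedupe, keep first-seen order
    n = len(unique)
    min_delay = n * CATEGORY_DELAY_SECONDS
    max_delay = min_delay + n * (MAX_ATTEMPTS - 1) * RETRY_DELAY_SECONDS
    return n, min_delay, max_delay
```

Under these assumptions, a file with 1,000 unique categories spends at least 500 seconds in rate-limit delays alone, which is one reason splitting very large files can help.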
|
| 144 |
|
| 145 |
## Local Development
|
| 146 |
|