piyushdev committed
Commit 7ba47b9 · verified · 1 Parent(s): 20be5b8

Update README.md

Files changed (1)
  1. README.md +35 -11
README.md CHANGED
@@ -17,9 +17,12 @@ A Hugging Face Gradio application that generates CLIP-ready visual descriptions
17
  - 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
18
  - 🔄 **Batch Processing**: Automatically processes all unique categories from your files
19
  - 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
20
- - 📊 **Progress Tracking**: Real-time progress updates during processing
21
- - 💾 **Automatic Saving**: Output files are automatically generated with timestamps
 
 
22
  - 📥 **Easy Download**: Download all processed files directly from the interface
 
23
 
24
  ## How to Use
25
 
@@ -37,7 +40,12 @@ A Hugging Face Gradio application that generates CLIP-ready visual descriptions
37
  - **Name**: `HF_TOKEN`
38
  - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
39
  - Click "Save"
40
- 6. Your app will be deployed and restart automatically!
 
 
 
 
 
41
 
42
  ### 2. Prepare Your CSV Files
43
 
@@ -71,28 +79,36 @@ Each output CSV file contains:
71
  | Column | Description |
72
  |--------|-------------|
73
  | `Category` | The original category keyword |
74
- | `Description` | The generated CLIP-ready visual description |
75
- | `Raw_Response` | The complete model response (JSON format) |
 
76
 
77
  ## Example Output
78
 
79
  ```csv
80
- Category,Description,Raw_Response
81
- Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}"
82
  ```
83
 
84
  ## Model Settings
85
 
86
- - **Max Tokens**: Controls the maximum length of generated descriptions
87
- - **Temperature**: Higher values (0.8-1.0) make output more creative, lower values (0.3-0.5) make it more focused
88
- - **Top-p**: Nucleus sampling parameter, controls diversity
 
 
 
89
 
90
  ## Technical Details
91
 
92
  - **Model**: openai/gpt-oss-20b
93
  - **Framework**: Gradio (latest stable version)
 
 
94
  - **Processing**: Categories are deduplicated automatically
 
95
  - **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
 
96
 
97
  ## Troubleshooting
98
 
@@ -113,10 +129,18 @@ Car Rental For Self Driven,"a car available for self-drive rental, parked at a p
113
  - Verify the token hasn't expired
114
  - Make sure you're using a valid token from https://huggingface.co/settings/tokens
115
 
 
 
 
 
 
 
116
  ### Slow processing
117
- - The model processes each unique category individually
118
  - Large files with many unique categories will take longer
119
  - Consider splitting very large files into smaller batches
 
 
120
 
121
  ## Local Development
122
 
 
17
  - 📤 **Upload Multiple CSV Files**: Process one or more CSV files at once
18
  - 🔄 **Batch Processing**: Automatically processes all unique categories from your files
19
  - 🤖 **AI-Powered**: Uses OpenAI's GPT-OSS-20B model for high-quality descriptions
20
+ - 🔁 **Automatic Retry Logic**: 3 attempts per category with intelligent error recovery
21
+ - ✅ **Validation**: JSON validation and quality checks for every description
22
+ - 📊 **Progress Tracking**: Real-time progress updates with success/failure reporting
23
+ - 💾 **Automatic Saving**: Output files with Status column showing results
24
  - 📥 **Easy Download**: Download all processed files directly from the interface
25
+ - ⚑ **Zero GPU Support**: Use Zero GPU for faster, free GPU acceleration
26
 
27
  ## How to Use
28
 
 
40
  - **Name**: `HF_TOKEN`
41
  - **Value**: Your Hugging Face token (get from https://huggingface.co/settings/tokens)
42
  - Click "Save"
43
+ 6. **Optional: Enable Zero GPU for Faster Processing**:
44
+ - Zero GPU provides free GPU acceleration
45
+ - No Pro subscription required
46
+ - Space will automatically use GPU when available
47
+ - Significantly speeds up processing for large batches
48
+ 7. Your app will be deployed and restart automatically!
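
Secrets configured on a Space are exposed to the running app as environment variables. A minimal sketch of reading `HF_TOKEN` and failing early when it is missing could look like this. The helper name `load_hf_token` is hypothetical and not taken from the app.

```python
import os


def load_hf_token(environ=os.environ):
    """Return the Hugging Face token from the environment, or raise if it is absent."""
    token = environ.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret or export it locally"
        )
    return token


# Demonstration with an explicit mapping and a placeholder value (not a real token)
token = load_hf_token({"HF_TOKEN": "hf_example_token"})
print(token)
```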
49
 
50
  ### 2. Prepare Your CSV Files
51
 
 
79
  | Column | Description |
80
  |--------|-------------|
81
  | `Category` | The original category keyword |
82
+ | `Description` | The generated CLIP-ready visual description (validated) |
83
+ | `Raw_Response` | The complete model response (for debugging) |
84
+ | `Status` | "Success" or "Failed" with error details |
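
The `Description` and `Status` columns depend on validating each raw model response. The sketch below shows one way such a check could work: parse the JSON, confirm a `Description` field exists, and apply a minimum length. The threshold of 20 characters, the function name `validate_response`, and the exact `Failed: ...` message format are assumptions, not values taken from the app.

```python
import json

MIN_DESCRIPTION_LENGTH = 20  # assumed threshold; the README does not state the real one


def validate_response(raw_response):
    """Return (description, status) for one raw model response string."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return "", f"Failed: invalid JSON ({exc.msg})"
    # Structure check: the response must be an object with a Description field
    if not isinstance(data, dict) or "Description" not in data:
        return "", "Failed: missing Description field"
    description = str(data["Description"]).strip()
    # Quality check: reject empty or very short descriptions
    if len(description) < MIN_DESCRIPTION_LENGTH:
        return "", "Failed: description too short"
    return description, "Success"


good = validate_response(
    '{"Category": "Yoga Classes", '
    '"Description": "a bright studio with people on mats holding yoga poses"}'
)
bad = validate_response("not json at all")
print(good)
print(bad)
```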
85
 
86
  ## Example Output
87
 
88
  ```csv
89
+ Category,Description,Raw_Response,Status
90
+ Car Rental For Self Driven,"a car available for self-drive rental, parked at a pickup spot without a chauffeur; looks travel-ready, clean, well-maintained, keys handed over to customer","{""Category"": ""Car Rental For Self Driven"", ""Description"": ""...""}",Success
91
  ```
92
 
93
  ## Model Settings
94
 
95
+ - **Max Tokens**: Controls the maximum length of generated descriptions (default: 256)
96
+ - **Temperature**: Controls output consistency (default: 0.3)
97
+ - 0.2-0.4: Consistent, focused descriptions (recommended)
98
+ - 0.5-0.7: Balanced creativity and consistency
99
+ - 0.8-1.0: More creative variations
100
+ - **Top-p**: Nucleus sampling parameter, controls diversity (default: 0.9)
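
The documented defaults can be captured in a small settings object and passed on to whichever generation call the app uses. This is only a sketch: the class name `GenerationSettings` and the range checks are assumptions, while the default values come from the list above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Sampling settings with the defaults listed in the README."""

    max_tokens: int = 256
    temperature: float = 0.3
    top_p: float = 0.9

    def __post_init__(self):
        # The bounds below are assumed sanity checks, not limits stated by the app
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in the range (0, 1]")


defaults = asdict(GenerationSettings())
creative = asdict(GenerationSettings(temperature=0.9))
print(defaults)
print(creative)
```

Keeping the settings in one immutable object makes it easy to log exactly which values produced a given output file.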
101
 
102
  ## Technical Details
103
 
104
  - **Model**: openai/gpt-oss-20b
105
  - **Framework**: Gradio (latest stable version)
106
+ - **Retry Logic**: 3 attempts per category with 1-second delay between retries
107
+ - **Validation**: JSON parsing, structure validation, and minimum length checks
108
  - **Processing**: Categories are deduplicated automatically
109
+ - **Rate Limiting**: 0.5-second delay between categories to avoid API throttling
110
  - **Output Files**: Named as `output_{original_name}_{timestamp}.csv`
111
+ - **Zero GPU Support**: Free GPU acceleration available for Spaces
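
The retry and rate-limiting behaviour described above can be sketched as follows. The delays and attempt count come from the list; the function names, the `generate` callable, and the `Failed: ...` status format are assumptions for illustration. The `sleep` parameter is injectable so the example runs instantly.

```python
import time

MAX_ATTEMPTS = 3              # attempts per category
RETRY_DELAY_SECONDS = 1.0     # pause between retries of the same category
CATEGORY_DELAY_SECONDS = 0.5  # pause between different categories


def describe_with_retry(category, generate, sleep=time.sleep):
    """Call generate(category) up to MAX_ATTEMPTS times; return (description, status)."""
    last_error = "no attempts made"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return generate(category), "Success"
        except Exception as exc:  # a sketch: treat any error as retryable
            last_error = str(exc)
            if attempt < MAX_ATTEMPTS:
                sleep(RETRY_DELAY_SECONDS)
    return "", f"Failed: {last_error}"


def process_categories(categories, generate, sleep=time.sleep):
    """Process categories one by one, pausing between them to limit request rate."""
    results = []
    for index, category in enumerate(categories):
        description, status = describe_with_retry(category, generate, sleep)
        results.append(
            {"Category": category, "Description": description, "Status": status}
        )
        if index < len(categories) - 1:
            sleep(CATEGORY_DELAY_SECONDS)
    return results


# Stand-in for the model call: fails twice for one category, always for another
calls = {}


def flaky_generate(category):
    calls[category] = calls.get(category, 0) + 1
    if category == "Broken":
        raise RuntimeError("always fails")
    if calls[category] < 3:
        raise RuntimeError("temporary error")
    return f"a photo depicting {category.lower()}"


delays = []  # record requested sleeps instead of actually waiting
results = process_categories(["Yoga Classes", "Broken"], flaky_generate, sleep=delays.append)
print(results)
print(delays)
```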
112
 
113
  ## Troubleshooting
114
 
 
129
  - Verify the token hasn't expired
130
  - Make sure you're using a valid token from https://huggingface.co/settings/tokens
131
 
132
+ ### Inconsistent or incomplete output
133
+ - Lower the Temperature to 0.2-0.4 for more consistent results
134
+ - Check the Status column in output CSV to identify failed categories
135
+ - Failed categories can be extracted and reprocessed separately
136
+ - Zero GPU will provide more reliable processing with better resources
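
To reprocess failed categories separately, they first need to be pulled out of an output file. A minimal sketch, assuming the output columns documented above and assuming that every non-successful row has a `Status` that does not start with `Success`:

```python
import csv
import io


def failed_categories(output_csv_text):
    """Return the categories whose Status is anything other than Success."""
    reader = csv.DictReader(io.StringIO(output_csv_text))
    return [
        row["Category"]
        for row in reader
        if not (row.get("Status") or "").startswith("Success")
    ]


# Illustrative output file content; the failure message is a made-up example
sample_output = (
    "Category,Description,Raw_Response,Status\n"
    'Car Rental For Self Driven,"a car parked at a pickup spot",'
    '"{""Description"": ""...""}",Success\n'
    "Yoga Classes,,,Failed: invalid JSON\n"
)
to_retry = failed_categories(sample_output)
print(to_retry)
```

The resulting list can be written to a new single-column CSV and uploaded again.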
137
+
138
  ### Slow processing
139
+ - The model processes each unique category individually (includes retries)
140
  - Large files with many unique categories will take longer
141
  - Consider splitting very large files into smaller batches
142
+ - Zero GPU acceleration is automatically available for your Space
143
+ - Each category has a 0.5s delay to prevent rate limiting
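
Splitting a very large file into smaller batches can be done before uploading. The sketch below keeps the header row in every batch; the function name `split_csv` and the batch size are illustrative choices, not part of the app.

```python
import csv
import io


def split_csv(csv_text, rows_per_batch):
    """Split CSV text into smaller CSV texts, repeating the header in each one."""
    if rows_per_batch <= 0:
        raise ValueError("rows_per_batch must be positive")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    rows = [row for row in reader if row]  # skip blank lines
    batches = []
    for start in range(0, len(rows), rows_per_batch):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_batch])
        batches.append(buffer.getvalue())
    return batches


big_file = "Category\n" + "".join(f"Category {i}\n" for i in range(5))
batches = split_csv(big_file, rows_per_batch=2)
print(len(batches))
print(batches[0])
```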
144
 
145
  ## Local Development
146