---
library_name: transformers
license: mit
base_model: salimalsazu/smart-category-detector-v1
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: smart-category-detector-v1
  results: []
---

# smart-category-detector-v1

**Lightweight Transformer-based classifier for short text categorization (60 categories)**

`smart-category-detector-v1` is a multi-class text classification model designed to predict one of **60 categories** from short text inputs such as:

- product titles
- marketplace listings
- event announcements
- restaurant menu items with prices

The model is optimized for **fast inference and practical categorization tasks**.

------------------------------------------------------------------------

# Model Details

  Field              Value
  ------------------ ---------------------------------
  Developer          Salim Al Sazu
  Hugging Face       `salimalsazu`
  Model Type         DistilBERT Fine-tuned
  Base Model         `distilbert-base-uncased`
  Task               Multi-class Text Classification
  Categories         60
  Training Samples   \~600,000
  Language           English
  License            MIT

------------------------------------------------------------------------

# Training Dataset

The model was trained on a **curated dataset of approximately 600,000 samples** containing short text entries mapped to one of 60 categories.

### Dataset Structure

  Column     Description
  ---------- -----------------------------
  text       Short input text
  category   Target classification label

Example:

Samsung Galaxy S24 Ultra 512GB Mobile -\> smartphones\
Dhaka International Book Fair 2026 -\> book_fair\
Jamboo Burger Tk 220 -\> burgers\
Lenovo ThinkPad X1 Carbon Laptop -\> laptops\
Chicken Biryani Tk 180 -\> biryani

------------------------------------------------------------------------

# Dataset Distribution

The dataset is divided into **three main groups**, each containing **20 categories**.

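
As a rough illustration of the two-column layout and the three-group split described above, the sketch below parses the five example rows from this card into `(text, category)` pairs and tallies them per group. The names `EXAMPLE_ROWS`, `GROUP_OF`, `parse_row`, and `group_counts` are hypothetical helpers introduced here, and the mapping covers only the labels that appear in the examples, not all 60 categories.

```python
from collections import Counter

# The five example rows from this card, written as "text -> category".
EXAMPLE_ROWS = [
    "Samsung Galaxy S24 Ultra 512GB Mobile -> smartphones",
    "Dhaka International Book Fair 2026 -> book_fair",
    "Jamboo Burger Tk 220 -> burgers",
    "Lenovo ThinkPad X1 Carbon Laptop -> laptops",
    "Chicken Biryani Tk 180 -> biryani",
]

# Hypothetical partial mapping from a category to its group.
# Only the labels used in EXAMPLE_ROWS are listed here.
GROUP_OF = {
    "smartphones": "product",
    "laptops": "product",
    "book_fair": "event",
    "burgers": "restaurant",
    "biryani": "restaurant",
}


def parse_row(row: str) -> tuple[str, str]:
    """Split a 'text -> category' row into its two columns."""
    text, category = row.rsplit("->", 1)
    return text.strip(), category.strip()


def group_counts(rows: list[str]) -> Counter:
    """Count how many rows fall into each of the three groups."""
    counts = Counter()
    for row in rows:
        _, category = parse_row(row)
        counts[GROUP_OF[category]] += 1
    return counts


if __name__ == "__main__":
    for row in EXAMPLE_ROWS:
        print(parse_row(row))
    print(dict(group_counts(EXAMPLE_ROWS)))
```

The same tally run over the full training file would be one way to check that each group holds roughly a third of the samples, assuming the file exposes the `text` and `category` columns listed in the table.
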
## Event Categories (20) --- \~200,000 samples

sports\
music\
tech_conference\
education_seminar\
business_summit\
startup_pitch\
job_fair\
art_exhibition\
cultural_festival\
religious_event\
political_rally\
charity_event\
workshop\
webinar\
networking_event\
book_fair\
food_festival\
fashion_show\
award_ceremony\
hackathon

------------------------------------------------------------------------

## Product Categories (20) --- \~200,000 samples

electronics\
smartphones\
laptops\
fashion_clothing\
shoes\
beauty_cosmetics\
grocery_food\
furniture\
home_appliances\
kitchen_items\
sports_equipment\
books\
toys\
baby_products\
health_supplements\
automotive\
gaming\
jewelry\
office_supplies\
pet_products

------------------------------------------------------------------------

## Restaurant / Menu Categories (20) --- \~200,000 samples

burgers\
pizza\
sandwich_wraps\
fries_sides\
fried_snacks\
street_food\
biryani\
rice_dishes\
noodles_pasta\
curries\
bbq_grill\
seafood_dishes\
breakfast_items\
soups_salads\
cakes_pastries\
ice_cream\
traditional_sweets\
coffee_tea\
soft_drinks\
shakes_smoothies

------------------------------------------------------------------------

# Example Predictions

  Input                                   Prediction
  --------------------------------------- -------------
  Samsung Galaxy S24 Ultra 512GB Mobile   smartphones
  Dhaka International Book Fair 2026      book_fair
  Jamboo Burger Tk 220                    burgers
  HP Pavilion RTX 4060 Gaming Laptop      laptops
  Beef Burger Combo Tk 350                burgers

------------------------------------------------------------------------

# Quick Start

```python
from transformers import pipeline

# Load the classifier; top_k=5 returns the five highest-scoring labels with their scores.
clf = pipeline(
    "text-classification",
    model="salimalsazu/smart-category-detector-v1",
    top_k=5,
)

# One example from each group: product, event, restaurant menu item.
print(clf("Samsung Galaxy S24 Ultra 512GB Mobile"))
print(clf("Dhaka International Book Fair 2026"))
print(clf("Jamboo Burger Tk 220"))
```

------------------------------------------------------------------------

# Limitations

- Works best with **short text**
- Designed primarily for **English text**
- Mixed-language inputs may reduce accuracy
- Limited to **60 predefined categories**
- Inputs that fit none of these categories are still assigned the closest existing label, because the model has no "unknown" class

------------------------------------------------------------------------

# Future Improvements

Potential improvements include:

- adding real-world marketplace data
- improving Bangla language support
- increasing spelling robustness
- adding an "unknown" label
- publishing benchmark evaluation metrics

------------------------------------------------------------------------

# Citation

Salim Al Sazu\
smart-category-detector-v1
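# Example: Confidence Threshold Fallback

The Limitations section notes that inputs outside the 60 categories are still mapped to the closest label. Until the model has an "unknown" label of its own, one possible workaround is to reject low-confidence predictions at inference time. The sketch below is a minimal illustration, not part of the model: `with_unknown_fallback` is a hypothetical helper, the `0.5` threshold is an arbitrary assumption that would need tuning on held-out data, and the scores in the usage lines are made up. It assumes the predictions arrive as a flat list of `{"label", "score"}` dicts; depending on the `transformers` version, the pipeline output may be wrapped in an outer list that needs unwrapping first.

```python
def with_unknown_fallback(predictions, threshold=0.5):
    """Return the top label, or "unknown" if its score is below the threshold.

    predictions: list of dicts with "label" and "score" keys
                 (assumed shape of the text-classification output).
    threshold:   assumed cut-off; 0.5 is a placeholder, not a tuned value.
    """
    if not predictions:
        return "unknown"
    # Pick the highest-scoring candidate regardless of input order.
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else "unknown"


if __name__ == "__main__":
    # Made-up scores for illustration only.
    confident = [
        {"label": "smartphones", "score": 0.97},
        {"label": "electronics", "score": 0.02},
    ]
    uncertain = [
        {"label": "books", "score": 0.21},
        {"label": "workshop", "score": 0.19},
    ]
    print(with_unknown_fallback(confident))
    print(with_unknown_fallback(uncertain))
```

A threshold on the top score is only a heuristic: a model can be confidently wrong on out-of-scope text, so this does not replace training with an explicit "unknown" class.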