# Custom LLM Chatbot
A complete implementation of a transformer-based language model chatbot built from scratch using PyTorch. This project includes the model architecture, training pipeline, and inference server with a web interface.
## Features
- Custom transformer-based language model implementation
- Byte-Pair Encoding (BPE) tokenizer
- Distributed training support
- FastAPI-based inference server
- Clean web interface for chat interactions
- Temperature and top-k/top-p sampling for response generation
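Response generation combines all three sampling controls. The sketch below shows how temperature scaling, top-k filtering, and top-p (nucleus) filtering are typically chained before drawing a token; the function name and default thresholds are illustrative and may not match what the inference server actually implements.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.95):
    """Sample a single token id from a 1-D logits tensor of shape [vocab_size]."""
    # Temperature scaling: values below 1.0 sharpen the distribution.
    logits = logits / max(temperature, 1e-8)

    # Top-k filtering: keep only the k highest-scoring tokens.
    if top_k is not None and top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p.
    if top_p is not None and top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = F.softmax(sorted_logits, dim=-1)
        # Drop a token if the probability mass before it already exceeds top_p.
        remove = torch.cumsum(probs, dim=-1) - probs > top_p
        sorted_logits = sorted_logits.masked_fill(remove, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(0, sorted_idx, sorted_logits)

    return torch.multinomial(F.softmax(logits, dim=-1), num_samples=1).item()
```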
## Requirements
- Python 3.8+
- CUDA-capable GPU (recommended for training)
- See `requirements.txt` for Python package dependencies
## Installation
- Clone the repository:

```bash
git clone https://github.com/yourusername/llm-chatbot.git
cd llm-chatbot
```

- Create a virtual environment and install dependencies:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
## Training
Prepare your training data as a plain-text file.
Train the tokenizer and model:

```bash
# First, train the tokenizer on your data
python train_tokenizer.py --input_file path/to/your/data.txt --vocab_size 32000

# Then train the model
python train.py --train_file path/to/your/data.txt --output_dir ./output
```

For distributed training on multiple GPUs:

```bash
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS train.py --train_file path/to/your/data.txt
```
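Under the hood, multi-GPU training relies on PyTorch's DistributedDataParallel. The snippet below is a minimal sketch of the setup a script launched this way typically performs; the actual logic in `train.py` may differ, and newer PyTorch releases prefer the `torchrun` launcher over `torch.distributed.launch`.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model):
    """Wrap a model for multi-GPU training (illustrative; train.py may differ)."""
    # Recent launchers export LOCAL_RANK; older torch.distributed.launch
    # versions instead pass a --local_rank command-line argument.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DDP averages gradients across processes after each backward pass.
    return DDP(model, device_ids=[local_rank])
```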
## Running the Server
- Start the FastAPI server:

```bash
python server.py
```
- Open `templates/index.html` in your web browser, or serve it using a static file server.
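The web interface talks to the server over HTTP, and you can call the API directly as well. The request below is only an illustration: the endpoint path, port, and JSON fields are assumptions, so check `server.py` for the actual route and payload schema.

```python
import requests

# Hypothetical endpoint and payload; verify the real route in server.py.
resp = requests.post(
    "http://localhost:8000/chat",   # 8000 is the common uvicorn default port
    json={
        "prompt": "Hello, who are you?",
        "temperature": 0.8,  # sampling temperature
        "top_k": 50,         # keep the 50 most likely tokens
        "top_p": 0.95,       # nucleus sampling threshold
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```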
## Model Architecture
- 12 transformer layers
- 768 hidden dimensions
- 12 attention heads
- 2048-token maximum sequence length
- Learned positional embeddings
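For a sense of scale, this configuration lands roughly in the GPT-2-small range. The quick estimate below assumes a standard GPT-style block (four attention projections plus a two-layer feed-forward network with an assumed intermediate size of 4x the hidden size) and ignores biases and layer-norm parameters.

```python
# Rough parameter-count estimate for the default configuration.
# Assumes intermediate_size = 4 * hidden_size; biases and layer norms ignored.
vocab_size, hidden, layers, seq_len = 32_000, 768, 12, 2048
intermediate = 4 * hidden

embeddings   = vocab_size * hidden + seq_len * hidden  # token + learned positional
attention    = 4 * hidden * hidden                     # Q, K, V, and output projections
feed_forward = 2 * hidden * intermediate               # up- and down-projection
per_layer    = attention + feed_forward

total = embeddings + layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")               # roughly 110M
```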
## Customization
You can modify the model architecture and training parameters by editing the `LLMConfig` class in `model.py`. Key parameters include:

- `vocab_size`: Size of the tokenizer vocabulary
- `hidden_size`: Dimension of the model's hidden states
- `num_hidden_layers`: Number of transformer layers
- `num_attention_heads`: Number of attention heads per layer
- `intermediate_size`: Dimension of the feed-forward network
- `max_position_embeddings`: Maximum sequence length
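As an illustration, a configuration matching the defaults above might be built like this. The exact constructor signature of `LLMConfig` lives in `model.py`, so treat the keyword arguments as assumptions based on the parameter names listed above; in particular, `intermediate_size=3072` assumes the usual 4x hidden-size convention.

```python
from model import LLMConfig  # assumes model.py is importable from the project root

# Values mirror the default architecture described above.
config = LLMConfig(
    vocab_size=32_000,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=2048,
)
```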
## License
MIT License
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.