Instructions for using BetterHF/vicuna-7b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use BetterHF/vicuna-7b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="BetterHF/vicuna-7b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("BetterHF/vicuna-7b")
model = AutoModelForCausalLM.from_pretrained("BetterHF/vicuna-7b")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use BetterHF/vicuna-7b with vLLM:
Install from pip and serve the model:

# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "BetterHF/vicuna-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BetterHF/vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker:
docker model run hf.co/BetterHF/vicuna-7b
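Once the vLLM server is running, the same OpenAI-compatible completions endpoint can also be called from Python rather than curl. A minimal sketch using only the standard library, mirroring the payload from the curl example above (the helper names are illustrative, not part of vLLM):

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Build the same completion request body as the curl example above."""
    return {
        "model": "BetterHF/vicuna-7b",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, url="http://localhost:8000/v1/completions"):
    """POST a completion request and return the generated text.

    Requires a running server: vllm serve "BetterHF/vicuna-7b"
    """
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]

# Usage (with the server running):
# print(complete("Once upon a time,"))
```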
- SGLang
How to use BetterHF/vicuna-7b with SGLang:
Install from pip and serve the model:

# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "BetterHF/vicuna-7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BetterHF/vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker images:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "BetterHF/vicuna-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BetterHF/vicuna-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use BetterHF/vicuna-7b with Docker Model Runner:
docker model run hf.co/BetterHF/vicuna-7b
vicuna-7b
The repo contains the converted vicuna-7b model files.
The base model is from decapoda-research/llama-7b-hf and the delta model is from lmsys/vicuna-7b-delta-v0.
The conversion script is:
python3 -m fastchat.model.apply_delta \
--base decapoda-research/llama-7b-hf \
--target /output/path/to/vicuna-7b \
--delta lmsys/vicuna-7b-delta-v0
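Conceptually, applying a v0 delta is just element-wise addition: each target parameter is the base (LLaMA) weight plus the corresponding delta weight. A toy sketch of that idea, using plain lists in place of tensors (this is an illustration, not FastChat's actual implementation):

```python
# Toy state dicts standing in for the base (LLaMA) and delta (Vicuna) weights.
base = {"layer.weight": [1.0, 2.0, 3.0]}
delta = {"layer.weight": [0.5, -1.0, 0.0]}

def apply_delta(base, delta):
    # target = base + delta, element by element, parameter by parameter.
    return {
        name: [b + d for b, d in zip(base[name], delta[name])]
        for name in base
    }

target = apply_delta(base, delta)
print(target["layer.weight"])  # [1.5, 1.0, 3.0]
```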
tree -s vicuna-7b
vicuna-7b/
├── [ 9976642750]  pytorch_model-00001-of-00002.bin
├── [ 3500323731]  pytorch_model-00002-of-00002.bin
└── [     499723]  tokenizer.model
tree -s vicuna-7b-delta-v0
vicuna-7b-delta-v0/
├── [ 9976642750]  pytorch_model-00001-of-00002.bin
├── [ 3500323731]  pytorch_model-00002-of-00002.bin
├── [      26788]  pytorch_model.bin.index.json
└── [     499723]  tokenizer.model
Although the files with the same prefix have identical sizes, their contents differ, as the md5sum results show:
5e63e46f2547e75c209c79aebe21f5ca vicuna-7b/pytorch_model-00001-of-00002.bin
53a0d7b5e34c432401f1979b454272ee vicuna-7b/pytorch_model-00002-of-00002.bin
7ded4bec6556250247b1efbb3b7e77e8 vicuna-7b-delta-v0/pytorch_model-00001-of-00002.bin
ee48ac060a0f352ba8fa3dea0385e4c3 vicuna-7b-delta-v0/pytorch_model-00002-of-00002.bin
The converted model has been tested with the Vicuna CLI and returns correct responses. Enjoy!
Q: hi how are you
A: I'm just a computer program, so I don't have feelings or emotions like a human. I'm here to respond to your questions and provide information to the best of my ability. Is there anything else you'd like to know?
Q: really cool
A: Thank you! I'm glad you think so. Is there anything in particular you'd like to know or discuss? I'm here to help so feel free to ask me anything.