SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Paper • 2308.10529 • Published • 1
How to use DAMO-NLP/SeqGPT-560M with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="DAMO-NLP/SeqGPT-560M")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("DAMO-NLP/SeqGPT-560M")
model = AutoModelForCausalLM.from_pretrained("DAMO-NLP/SeqGPT-560M")
How to use DAMO-NLP/SeqGPT-560M with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DAMO-NLP/SeqGPT-560M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DAMO-NLP/SeqGPT-560M",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or use Docker:
docker model run hf.co/DAMO-NLP/SeqGPT-560M
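The same completion request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM, and the default `base_url` assumes the server was started locally on port 8000 as in the curl example above.

```python
import json
import urllib.request

MODEL_ID = "DAMO-NLP/SeqGPT-560M"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route.

    Hypothetical helper: mirrors the JSON payload of the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated continuation. For a server listening on another port, pass a different `base_url`, for example `base_url="http://localhost:30000"`.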
How to use DAMO-NLP/SeqGPT-560M with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "DAMO-NLP/SeqGPT-560M" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DAMO-NLP/SeqGPT-560M",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "DAMO-NLP/SeqGPT-560M" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DAMO-NLP/SeqGPT-560M",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use DAMO-NLP/SeqGPT-560M with Docker Model Runner:
docker model run hf.co/DAMO-NLP/SeqGPT-560M
This repository contains the weights of SeqGPT-560M, a compact model targeting open-domain natural language understanding (NLU). See our GitHub repo for more details.
The model is fine-tuned from BLOOMZ-560M.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = 'DAMO-NLP/SeqGPT-560M'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
# Pad and truncate on the left so the end of the prompt ('输出: [GEN]') is always kept
tokenizer.padding_side = 'left'
tokenizer.truncation_side = 'left'

# Use half precision on GPU when one is available
if torch.cuda.is_available():
    model = model.half().cuda()
model.eval()

GEN_TOK = '[GEN]'

while True:
    sent = input('输入/Input: ').strip()
    task = input('分类/classify press 1, 抽取/extract press 2: ').strip()
    # Labels are separated by full-width commas in the prompt
    labels = input('标签集/Label-Set (e.g, labelA,LabelB,LabelC): ').strip().replace(',', ',')
    # '分类' = classification, '抽取' = extraction
    task = '分类' if task == '1' else '抽取'
    # Changing the instruction can harm the performance
    # ('输入' = input, '输出' = output; keep these keywords in Chinese)
    p = '输入: {}\n{}: {}\n输出: {}'.format(sent, task, labels, GEN_TOK)
    input_ids = tokenizer(p, return_tensors="pt", padding=True, truncation=True, max_length=1024)
    input_ids = input_ids.to(model.device)
    outputs = model.generate(**input_ids, num_beams=4, do_sample=False, max_new_tokens=256)
    # Drop the prompt tokens so that only the newly generated tokens are decoded
    input_ids = input_ids.get('input_ids', input_ids)
    outputs = outputs[0][len(input_ids[0]):]
    response = tokenizer.decode(outputs, skip_special_tokens=True)
    print('BOT: ========== \n{}'.format(response))
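For non-interactive use, the prompt construction in the loop above can be factored into a function and applied to several inputs at once. This is a sketch only: `build_prompt`, `run_batch`, and the `TASKS` mapping are hypothetical names, not part of the SeqGPT release. It assumes labels are joined with full-width commas, and that `tokenizer` and `model` are loaded and configured (left padding and truncation) exactly as in the script above.

```python
GEN_TOK = "[GEN]"
# English task names mapped to the Chinese keywords used in the prompt
TASKS = {"classify": "分类", "extract": "抽取"}


def build_prompt(sent, task, labels):
    """Format one input in the same template as the interactive script."""
    if task not in TASKS:
        raise ValueError("task must be 'classify' or 'extract'")
    # Assumption: labels are separated by full-width commas
    label_str = ",".join(label.strip() for label in labels)
    return "输入: {}\n{}: {}\n输出: {}".format(
        sent.strip(), TASKS[task], label_str, GEN_TOK
    )


def run_batch(tokenizer, model, sentences, task, labels):
    """Generate one response per sentence with the script's decoding settings."""
    prompts = [build_prompt(s, task, labels) for s in sentences]
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       truncation=True, max_length=1024).to(model.device)
    outputs = model.generate(**inputs, num_beams=4, do_sample=False,
                             max_new_tokens=256)
    # With left padding every prompt has the same length, so one slice
    # removes the prompt tokens from every row
    new_tokens = outputs[:, inputs["input_ids"].shape[1]:]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
```

For example, `run_batch(tokenizer, model, ["The match ended 2-1."], "classify", ["sports", "finance"])` returns a list with one decoded response per input sentence.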