argilla/ultrafeedback-binarized-preferences-cleaned
Viewer • Updated • 60.9k • 14k • 162
How to use AINovice2005/ElEmperador with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="AINovice2005/ElEmperador")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages) # Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AINovice2005/ElEmperador")
model = AutoModelForCausalLM.from_pretrained("AINovice2005/ElEmperador")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))How to use AINovice2005/ElEmperador with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AINovice2005/ElEmperador"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AINovice2005/ElEmperador",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker model run hf.co/AINovice2005/ElEmperador
How to use AINovice2005/ElEmperador with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "AINovice2005/ElEmperador" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AINovice2005/ElEmperador",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "AINovice2005/ElEmperador" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "AINovice2005/ElEmperador",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'How to use AINovice2005/ElEmperador with Docker Model Runner:
docker model run hf.co/AINovice2005/ElEmperador
ElEmperador is an ORPO-based finetune derived from the Mistral-7B-v0.1 base model.
BLEU:0.209
def generate_response(model_name, input_text, max_new_tokens=50):
# Load the tokenizer and model from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Tokenize the input text
input_ids = tokenizer(input_text, return_tensors='pt').input_ids
# Generate a response using the model
with torch.no_grad():
generated_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
# Decode the generated tokens into text
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
return generated_text
if __name__ == "__main__":
# Set the model name from Hugging Face Hub
model_name = "AINovice2005/ElEmperador"
input_text = "Hello, how are you?"
# Generate and print the model's response
output = generate_response(model_name, input_text)
print(f"Input: {input_text}")
print(f"Output: {output}")
Firstly,ORPO is a viable RLHF algorithm to improve the performance of your models along with SFT finetuning.Secondly, it also helps in aligning the model’s outputs more closely with human preferences, leading to more user-friendly and acceptable results.