Because the way Hugging Face Inference is used with LangChain has changed frequently in the past, information online is often contradictory, and it is difficult to find the correct solution through search engines or AI assistants.
The most likely cause in your case is not the prompt template and not the chain operator. It is usually this: HuggingFaceEndpoint is making a text-generation request, while mistralai/Mistral-7B-Instruct-v0.3 is often exposed by the current Hugging Face provider routing as a chat / conversational model instead. There is a public LangChain issue with this exact model and this exact error family, and the HuggingFaceEndpoint reference explicitly says the class is for models that support the text generation task. (GitHub)
Why your code looks correct but still fails
Your code structure is valid:
```python
prompt = PromptTemplate(...)
llm = HuggingFaceEndpoint(...)
chain = prompt | llm
response = chain.invoke(...)
```
That pattern is fine. The failure usually happens one layer lower, when LangChain calls Hugging Face through HuggingFaceEndpoint. In the reported Mistral case, the stack trace shows HuggingFaceEndpoint calling client.text_generation(...), and the backend rejects it because that model is mapped to conversational for that provider instead of text-generation. (GitHub)
The main background
Hugging Face now separates Text Generation and Chat Completion more clearly than many older examples did. The Text Generation docs describe prompt-based generation, while the Chat Completion docs describe message-based conversational generation. On top of that, Hugging Face Inference Providers automatically route requests to a provider, and support depends on the model + provider + task combination, not just the model name alone. (Hugging Face)
That is why this can fail even though the code looks simple. The prompt is valid. The chain is valid. But the selected backend may say: “this model is available here as chat, not as plain text generation.” (GitHub)
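To make that distinction concrete, here is a sketch of the two request shapes. The payloads are illustrative only: the field names follow the public text-generation and chat-completion APIs, but the exact structure a client sends is an assumption, not an excerpt from either library.

```python
# A text-generation request sends a single prompt string:
text_gen_payload = {
    "inputs": "What is a good name for a company that makes cameras?",
    "parameters": {"max_new_tokens": 64},
}

# A chat-completion request sends a list of role-tagged messages instead:
chat_payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [
        {"role": "user", "content": "What is a good name for a company that makes cameras?"}
    ],
    "max_tokens": 64,
}

# HuggingFaceEndpoint builds the first shape. A model routed as
# "conversational" only accepts the second, which is the mismatch
# described above.
```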
Causes, ranked for your exact code
1. Most likely: model-task mismatch
This is the best match for your case.
HuggingFaceEndpoint is for text-generation models. Your chosen model is an instruct/chat-style model, and there is a documented case where this exact model fails with:
```text
ValueError: Model mistralai/Mistral-7B-Instruct-v0.3 is not supported for task text-generation ...
Supported task: conversational
```
That means the wrapper is asking for one kind of inference, while the provider only offers another kind for that model. (GitHub)
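If you want to detect this failure mode programmatically, one option is to inspect the exception text for the `Supported task:` marker shown above. This is a heuristic sketch; the exact wording of the message can change between versions, so the string matching is an assumption:

```python
from typing import Optional

def diagnose_task_mismatch(error_message: str) -> Optional[str]:
    """Return the task the provider says it supports, if the error is the
    model-task mismatch described above; otherwise None."""
    marker = "Supported task:"
    if "is not supported for task" in error_message and marker in error_message:
        # Grab the task name that follows "Supported task:"
        return error_message.split(marker, 1)[1].strip().split()[0]
    return None

msg = ("Model mistralai/Mistral-7B-Instruct-v0.3 is not supported for task "
       "text-generation ... Supported task: conversational")
print(diagnose_task_mismatch(msg))  # -> conversational
```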
2. Also likely: you are using the older import path
Your code imports:
```python
from langchain_community.llms import HuggingFaceEndpoint
```
Current LangChain docs use the dedicated Hugging Face integration package:
```python
from langchain_huggingface import HuggingFaceEndpoint
```
The current LangChain docs, including the chat integration docs, point to langchain-huggingface as the active integration path. Using the older community path can leave you on examples or behavior that do not match the current Hugging Face routing model. (7x.mintlify.app)
3. Sometimes relevant: missing explicit task="text-generation"
There is a separate LangChain issue where HuggingFaceEndpoint raised a ValueError until task="text-generation" was added explicitly. That is a real issue, but it is a different one. If your actual error says the model only supports conversational, then adding task="text-generation" alone will not solve the real mismatch. It only fixes the “missing task” variant. (GitHub)
4. Worth checking: token and provider setup
Hugging Face’s current Inference Providers docs say you should use a fine-grained token with Make calls to Inference Providers permission. They also note that provider selection is automatic by default, unless you pin one yourself. Wrong token permissions or provider availability problems can produce other failures, although those often show up as authorization or provider-selection errors rather than your exact ValueError. (Hugging Face)
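As a quick sanity step, you can at least confirm locally that a token is configured and has the usual `hf_` prefix. This sketch is purely local: it cannot verify the fine-grained Inference Providers permission, which would require an actual API call, and the token value below is a placeholder:

```python
import os

def local_token_check() -> list[str]:
    """Collect basic, local-only warnings about the configured HF token."""
    problems = []
    token = os.environ.get("HUGGINGFACEHUB_API_TOKEN") or os.environ.get("HF_TOKEN")
    if not token:
        problems.append("No HUGGINGFACEHUB_API_TOKEN or HF_TOKEN set.")
    elif not token.startswith("hf_"):
        problems.append("Token does not look like a Hugging Face token (hf_ prefix).")
    return problems

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_example_token"  # placeholder
print(local_token_check())  # -> []
```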
What your code is really doing
When you write this:
```python
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    temperature=0.7,
    timeout=300
)
```
you are saying, in effect:
“Treat this model like a plain text-generation endpoint.”
That is the key assumption. For many models that is fine. For this exact Mistral model on current provider-backed inference, that assumption often breaks. The public issue for this model shows the failure is coming from the text-generation path, not from PromptTemplate or LCEL chaining. (GitHub)
The best fixes
Fix 1. Move to the current package first
Install the current packages:
```bash
pip install -U langchain langchain-huggingface huggingface_hub
```
Then import from the current package:
```python
from langchain_huggingface import HuggingFaceEndpoint
```
That aligns your code with the current documented integration path. LangChain’s current Hugging Face docs use langchain_huggingface, not langchain_community, for these examples. (7x.mintlify.app)
Fix 2. If you want plain prompt → text output, use a model that supports text-generation
HuggingFaceEndpoint is the right tool only when the model supports text generation on the chosen backend. The reference page says exactly that. Hugging Face’s text-generation docs also treat this as a separate task from chat completion. (LangChain)
A safer version of your code looks like this:
```python
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEndpoint
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_your_token_here"

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?"
)

llm = HuggingFaceEndpoint(
    repo_id="Qwen/Qwen2.5-7B-Instruct-1M",
    task="text-generation",
    provider="auto",
    max_new_tokens=64,
    temperature=0.7,
    timeout=300,
)

chain = prompt | llm
response = chain.invoke({"product": "camera"})
print("AI Suggestion:", response)
```
This fix does two things:
- it uses the current package path,
- it makes the task explicit for a model path that is documented in current Hugging Face task docs. (LangChain Documentation)
Fix 3. If you want to keep mistralai/Mistral-7B-Instruct-v0.3, treat it as a chat model
This is the more natural choice for an instruct model. LangChain’s chat docs show ChatHuggingFace instantiated from a HuggingFaceEndpoint, and Hugging Face’s own docs define chat completion as the message-based interface for conversational models. (LangChain Documentation)
Example:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_your_token_here"

prompt = ChatPromptTemplate.from_messages([
    ("human", "What is a good name for a company that makes {product}?")
])

llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    task="text-generation",
    provider="auto",
    max_new_tokens=64,
    temperature=0.7,
    timeout=300,
)

chat_model = ChatHuggingFace(llm=llm)
chain = prompt | chat_model
response = chain.invoke({"product": "camera"})
print("AI Suggestion:", response.content)
```
Important nuance: even here, HuggingFaceEndpoint still sits underneath. This approach is most useful when LangChain’s chat wrapper and the selected provider/model route agree. For provider-backed inference, a direct chat client can sometimes be even clearer. (LangChain Documentation)
Fix 4. Test the model outside LangChain first
This is the fastest way to separate a "LangChain wrapper problem" from a "model/provider/task problem."
Hugging Face’s Inference Providers docs show InferenceClient using chat completions directly for conversational models. (Hugging Face)
Example:
```python
from huggingface_hub import InferenceClient

client = InferenceClient()
response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "user", "content": "What is a good name for a company that makes camera products?"}
    ],
    max_tokens=64,
)
print(response.choices[0].message.content)
```
If this works and your LangChain code does not, then the problem is almost certainly the wrapper choice rather than the model itself. Hugging Face documents chat.completions.create as the chat-completion path, and the source reference shows it is the intended method for message-based chat use. (Hugging Face)
Fix 5. If your real error is the “missing task” version, add task
This is the smaller fix.
```python
llm = HuggingFaceEndpoint(
    repo_id="some-text-generation-model",
    task="text-generation",
    temperature=0.7,
    timeout=300,
)
```
That exact change resolved a documented issue where the docs omitted the needed task parameter. (GitHub)
A practical debug checklist
Use this order.
1. Upgrade to the current package path (`pip install -U langchain langchain-huggingface huggingface_hub`).
2. Make sure your token is a fine-grained token with the "Make calls to Inference Providers" permission. (Hugging Face)
3. If you stay with HuggingFaceEndpoint, add `task="text-generation"`. (GitHub)
4. If the error says `Supported task: conversational`, stop trying to force that model through a plain text-generation flow. Switch to a chat-model path or choose a model/provider combination documented for text generation. (GitHub)
5. Test the same model once with `InferenceClient.chat.completions.create(...)`. That tells you whether the model route itself works. (Hugging Face)
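The checklist can be folded into a small triage helper that maps an error message to the next debugging step. The string matching is a heuristic based on the failure modes described above, not an official error taxonomy:

```python
def triage(error_message: str) -> str:
    """Map a HuggingFaceEndpoint failure message to the next debugging step."""
    msg = error_message.lower()
    if "supported task: conversational" in msg:
        return "Use a chat path (ChatHuggingFace or chat.completions) for this model."
    if "task" in msg and "not supported" in msg:
        return "Pass task='text-generation' explicitly, or pick another model."
    if "401" in msg or "unauthorized" in msg or "invalid token" in msg:
        return "Check the fine-grained token and its Inference Providers permission."
    return "Reproduce outside LangChain with InferenceClient to isolate the layer."

print(triage("ValueError: ... Supported task: conversational"))
```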
My judgment for your exact snippet
Here is the probability ranking I would use.
- Most likely: HuggingFaceEndpoint is calling the text-generation path, but your chosen Mistral model is being exposed by the provider as conversational instead. (GitHub)
- Next most likely: you are mixing current Hugging Face routing with the older langchain_community integration path. (7x.mintlify.app)
- Possible: you also need an explicit task="text-generation" because of your installed versions. (GitHub)
- Possible but less likely for this exact error: token/provider configuration issue. (Hugging Face)
So the cleanest summary is:
Your prompt and chain are probably fine. The main problem is that you are using a text-generation wrapper with a model that the current Hugging Face backend often exposes as chat/conversational instead. (GitHub)