# Models

Smolagents is an experimental API that is subject to change at any time. Results returned by the agents can vary, as the APIs or underlying models are prone to change.

To learn more about agents and tools, make sure to read the [introductory guide](../index). This page contains the API docs for the underlying classes.

## Models

You're free to create and use your own models to power your agent.

You can use any `model` callable as your agent's model, as long as:
1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns a `str`.
2. It stops generating output at the sequences specified in the `stop_sequences` argument.

To define your LLM, you can write a `custom_model` method that accepts a list of [messages](./chat_templating) and returns an object with a `.content` attribute containing the generated text. This callable also needs to accept a `stop_sequences` argument that indicates when to stop generating.

```python
from huggingface_hub import login, InferenceClient

login("")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    # The returned chat message object exposes the generated text via `.content`.
    answer = response.choices[0].message
    return answer
```

Additionally, `custom_model` can take a `grammar` argument. If `grammar` is specified at agent initialization, this argument will be passed along to calls to the model, enabling [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance) to force properly-formatted agent outputs.
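
A minimal sketch of such a grammar-aware callable, assuming the provider supports constrained decoding through the `response_format` field of `chat_completion`:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Llama-3.3-70B-Instruct")

def custom_model(messages, stop_sequences=None, grammar=None):
    # Forward `grammar` as `response_format` so the provider can constrain
    # the output (assumption: the chosen provider supports this field).
    response = client.chat_completion(
        messages,
        stop=stop_sequences,
        max_tokens=1000,
        response_format=grammar,
    )
    return response.choices[0].message
```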

### TransformersModel[[smolagents.TransformersModel]]

For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.

```python
from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `transformers` and `torch` installed on your machine. Please run `pip install 'smolagents[transformers]'` if it's not the case.

#### smolagents.TransformersModel[[smolagents.TransformersModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L827)

A class that uses Hugging Face's Transformers library for language model interaction.

This model allows you to load and use Hugging Face's models locally using the Transformers library. It supports features like stop sequences and grammar customization.

> [!TIP]
> You must have `transformers` and `torch` installed on your machine. Please run `pip install 'smolagents[transformers]'` if it's not the case.

Example:
```python
>>> engine = TransformersModel(
...     model_id="Qwen/Qwen3-Next-80B-A3B-Thinking",
...     device="cuda",
...     max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```

**Parameters:**

model_id (`str`) : The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub. For example, `"Qwen/Qwen3-Next-80B-A3B-Thinking"`.

device_map (`str`, *optional*) : The device_map to initialize your model with.

torch_dtype (`str`, *optional*) : The torch_dtype to initialize your model with.

trust_remote_code (`bool`, default `False`) : Some models on the Hub require running remote code; for such models, you have to set this flag to `True`.

model_kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments to pass to `AutoModel.from_pretrained` (like revision, model_args, config, etc.).

max_new_tokens (`int`, default `4096`) : Maximum number of new tokens to generate, ignoring the number of tokens in the prompt.

max_tokens (`int`, *optional*) : Alias for `max_new_tokens`. If provided, this value takes precedence.

`**kwargs` : Additional keyword arguments to forward to the underlying Transformers model generate call, such as `device`.
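
Combining the parameters above, a minimal sketch (all keyword values here are illustrative, not recommendations):

```python
from smolagents import TransformersModel

# Load a small instruct model on the best available device, in bfloat16,
# capping generation length; adjust values to your hardware.
model = TransformersModel(
    model_id="HuggingFaceTB/SmolLM-135M-Instruct",
    device_map="auto",
    torch_dtype="bfloat16",
    max_new_tokens=256,
)
```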

### InferenceClientModel[[smolagents.InferenceClientModel]]

`InferenceClientModel` wraps huggingface_hub's [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) for the execution of the LLM. It supports HF's [Inference API](https://huggingface.co/docs/api-inference/index) as well as all the [Inference Providers](https://huggingface.co/blog/inference-providers) available on the Hub.

```python
from smolagents import InferenceClientModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = InferenceClientModel()
print(model(messages))
```
```text
>>> Of course! If you change your mind, feel free to reach out. Take care!
```

#### smolagents.InferenceClientModel[[smolagents.InferenceClientModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1418)

A class to interact with Hugging Face's Inference Providers for language model interaction.

This model allows you to communicate with Hugging Face's models using Inference Providers. It can be used in serverless mode, with a dedicated endpoint, or even with a local URL, and it supports features like stop sequences and grammar customization.

Providers include Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.

Example:
```python
>>> engine = InferenceClientModel(
...     model_id="Qwen/Qwen3-Next-80B-A3B-Thinking",
...     provider="hyperbolic",
...     token="your_hf_token_here",
...     max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```

##### create_client[[smolagents.InferenceClientModel.create_client]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1509)

Create the Hugging Face client.

**Parameters:**

model_id (`str`, *optional*, default `"Qwen/Qwen3-Next-80B-A3B-Thinking"`) : The Hugging Face model ID to be used for inference. This can be a model identifier from the Hugging Face model hub or a URL to a deployed Inference Endpoint. Currently, it defaults to `"Qwen/Qwen3-Next-80B-A3B-Thinking"`, but this may change in the future.

provider (`str`, *optional*) : Name of the provider to use for inference. A list of supported providers can be found in the [Inference Providers documentation](https://huggingface.co/docs/inference-providers/index#partners). Defaults to "auto" i.e. the first of the providers available for the model, sorted by the user's order [here](https://hf.co/settings/inference-providers). If `base_url` is passed, then `provider` is not used.

token (`str`, *optional*) : Token used by the Hugging Face API for authentication. This token needs to be authorized for 'Make calls to the serverless Inference Providers'. If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'. If not provided, the class will try to use the `HF_TOKEN` environment variable, else the token stored in the Hugging Face CLI configuration.

timeout (`int`, *optional*, defaults to 120) : Timeout for the API request, in seconds.

client_kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments to pass to the Hugging Face InferenceClient.

custom_role_conversions (`dict[str, str]`, *optional*) : Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".

api_key (`str`, *optional*) : Token to use for authentication. This is a duplicated argument from `token` to make [InferenceClientModel](/docs/smolagents/v1.23.0/zh/reference/models#smolagents.InferenceClientModel) follow the same pattern as `openai.OpenAI` client. Cannot be used if `token` is set. Defaults to None.

bill_to (`str`, *optional*) : The billing account to use for the requests. By default the requests are billed on the user's account. Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub.

base_url (`str`, *optional*) : Base URL to run inference. This is a duplicated argument from `model` to make [InferenceClientModel](/docs/smolagents/v1.23.0/zh/reference/models#smolagents.InferenceClientModel) follow the same pattern as `openai.OpenAI` client. Cannot be used if `model` is set. Defaults to None.

`**kwargs` : Additional keyword arguments to forward to the underlying Hugging Face InferenceClient completion call.
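
For instance, `custom_role_conversions` lets you remap roles for backends that reject "system" messages; a sketch with illustrative values:

```python
from smolagents import InferenceClientModel

# The provider choice and timeout below are placeholders, not recommendations.
model = InferenceClientModel(
    model_id="Qwen/Qwen3-Next-80B-A3B-Thinking",
    provider="auto",
    timeout=60,
    custom_role_conversions={"system": "user"},  # remap unsupported roles
)
```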

### LiteLLMModel[[smolagents.LiteLLMModel]]

`LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers. You can pass `kwargs` at model initialization that will then be used whenever the model is called; for instance, below we pass `temperature`.

```python
from smolagents import LiteLLMModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```

#### smolagents.LiteLLMModel[[smolagents.LiteLLMModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1167)

Model to use [LiteLLM Python SDK](https://docs.litellm.ai/docs/#litellm-python-sdk) to access hundreds of LLMs.

##### create_client[[smolagents.LiteLLMModel.create_client]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1217)

Create the LiteLLM client.

**Parameters:**

model_id (`str`) : The model identifier to use on the server (e.g. "gpt-3.5-turbo").

api_base (`str`, *optional*) : The base URL of the provider API to call the model.

api_key (`str`, *optional*) : The API key to use for authentication.

custom_role_conversions (`dict[str, str]`, *optional*) : Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".

flatten_messages_as_text (`bool`, *optional*) : Whether to flatten messages as text. Defaults to `True` for models that start with "ollama", "groq", "cerebras".

`**kwargs` : Additional keyword arguments to forward to the underlying LiteLLM completion call.
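
As an illustration of `api_base`, you can point LiteLLM at a self-hosted server; the model name and URL below are assumptions for a local Ollama deployment:

```python
from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",    # assumes this model is pulled locally
    api_base="http://localhost:11434",  # default Ollama endpoint
    num_ctx=8192,  # forwarded to the underlying LiteLLM completion call
)
```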

### OpenAIModel[[smolagents.OpenAIModel]]

This class lets you call any OpenAI-compatible model server.
Here's how you can set it up (you can customize the `api_base` URL to point to another server):
```py
import os
from smolagents import OpenAIModel

model = OpenAIModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
```
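
The same class can target any other OpenAI-compatible server by changing `api_base`; a sketch for a self-hosted endpoint (the URL and model id are placeholders for whatever your server exposes):

```py
from smolagents import OpenAIModel

model = OpenAIModel(
    model_id="my-local-model",            # placeholder deployment name
    api_base="http://localhost:8000/v1",  # e.g. a local vLLM or llama.cpp server
    api_key="not-needed",                 # many local servers accept any key
)
```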

#### smolagents.OpenAIModel[[smolagents.OpenAIModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1608)

This model connects to an OpenAI-compatible API server.

**Parameters:**

model_id (`str`) : The model identifier to use on the server (e.g. "gpt-5").

api_base (`str`, *optional*) : The base URL of the OpenAI-compatible API server.

api_key (`str`, *optional*) : The API key to use for authentication.

organization (`str`, *optional*) : The organization to use for the API request.

project (`str`, *optional*) : The project to use for the API request.

client_kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments to pass to the OpenAI client (like organization, project, max_retries etc.).

custom_role_conversions (`dict[str, str]`, *optional*) : Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".

flatten_messages_as_text (`bool`, default `False`) : Whether to flatten messages as text.

`**kwargs` : Additional keyword arguments to forward to the underlying OpenAI API completion call, for instance `temperature`.

### AzureOpenAIModel[[smolagents.AzureOpenAIModel]]

`AzureOpenAIModel` allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up. Note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.

Note the lack of an `AZURE_` prefix on `OPENAI_API_VERSION`: this is due to the design of the underlying [openai](https://github.com/openai/openai-python) package.

```py
import os

from smolagents import AzureOpenAIModel

model = AzureOpenAIModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```
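
With the three environment variables above set, the call reduces to a one-liner (a sketch, assuming `AZURE_OPENAI_MODEL` still names your deployment):

```py
import os

from smolagents import AzureOpenAIModel

# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION are
# picked up from the environment; only the deployment name is passed.
model = AzureOpenAIModel(model_id=os.environ.get("AZURE_OPENAI_MODEL"))
```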

#### smolagents.AzureOpenAIModel[[smolagents.AzureOpenAIModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L1761)

This model connects to an Azure OpenAI deployment.

**Parameters:**

model_id (`str`) : The model deployment name to use when connecting (e.g. "gpt-4o-mini").

azure_endpoint (`str`, *optional*) : The Azure endpoint, including the resource, e.g. `https://example-resource.azure.openai.com/`. If not provided, it will be inferred from the `AZURE_OPENAI_ENDPOINT` environment variable.

api_key (`str`, *optional*) : The API key to use for authentication. If not provided, it will be inferred from the `AZURE_OPENAI_API_KEY` environment variable.

api_version (`str`, *optional*) : The API version to use. If not provided, it will be inferred from the `OPENAI_API_VERSION` environment variable.

client_kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments to pass to the AzureOpenAI client (like organization, project, max_retries etc.).

custom_role_conversions (`dict[str, str]`, *optional*) : Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".

`**kwargs` : Additional keyword arguments to forward to the underlying Azure OpenAI API completion call.

### MLXModel[[smolagents.MLXModel]]

`MLXModel` runs models locally on Apple silicon using MLX:
```python
from smolagents import MLXModel

model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
```
```text
>>> What a
```

> [!TIP]
> You must have `mlx-lm` installed on your machine. Please run `pip install 'smolagents[mlx-lm]'` if it's not the case.

#### smolagents.MLXModel[[smolagents.MLXModel]]

[Source](https://github.com/huggingface/smolagents/blob/v1.23.0/src/smolagents/models.py#L718)

A class to interact with models loaded using MLX on Apple silicon.

> [!TIP]
> You must have `mlx-lm` installed on your machine. Please run `pip install 'smolagents[mlx-lm]'` if it's not the case.

Example:
```python
>>> engine = MLXModel(
...     model_id="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
...     max_tokens=10000,
... )
>>> messages = [
...     {
...         "role": "user",
...         "content": "Explain quantum mechanics in simple terms."
...     }
... ]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```

**Parameters:**

model_id (`str`) : The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.

tool_name_key (`str`) : The key, which can usually be found in the model's chat template, for retrieving a tool name.

tool_arguments_key (`str`) : The key, which can usually be found in the model's chat template, for retrieving tool arguments.

trust_remote_code (`bool`, default `False`) : Some models on the Hub require running remote code; for such models, you have to set this flag to `True`.

load_kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments to pass to the `mlx_lm.load` method when loading the model and tokenizer.

apply_chat_template_kwargs (`dict`, *optional*) : Additional keyword arguments to pass to the `apply_chat_template` method of the tokenizer.

`**kwargs` : Additional keyword arguments to forward to the underlying MLX model `stream_generate` call, for instance `max_tokens`.
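
As a sketch of the loading hooks above, `apply_chat_template_kwargs` is forwarded to the tokenizer's `apply_chat_template`; the values below are illustrative assumptions, and which template variables are honored depends on the model's chat template:

```python
from smolagents import MLXModel

model = MLXModel(
    model_id="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
    max_tokens=10000,
    # `enable_thinking` is only meaningful for templates that define it.
    apply_chat_template_kwargs={"enable_thinking": False},
)
```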

