JustinTong committed
Commit 53fb878 · verified · 1 Parent(s): 6dffa3d

Update README.md to include SGLang instruction

Files changed (1)
  1. README.md +93 -0
README.md CHANGED
@@ -215,6 +215,99 @@ print(response.json()["choices"][0]["message"]["content"])
  ```
  </details>

+ #### SGLang (recommended)
+
+ <details>
+ <summary>Expand</summary>
+
+ We recommend using this model with [SGLang](https://github.com/sgl-project/sglang)
+ to implement production-ready inference pipelines (OpenAI-compatible API server).
+
+ **_Installation_**
+
+ Install SGLang from source (track latest `main` locally):
+
+ ```
+ git clone https://github.com/sgl-project/sglang.git
+ cd sglang
+ uv pip install -e python
+ uv pip install transformers==5.0.0rc  # required
+ ```
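+
+ As a quick sanity check, you can confirm that both packages import cleanly. This is a minimal sketch: it assumes both packages expose `__version__`, and the exact version strings depend on your checkout.
+
+ ```py
+ # Sanity check: confirm sglang and transformers import and report versions.
+ # Assumption: both packages expose __version__; strings vary by checkout.
+ import sglang
+ import transformers
+
+ print("sglang:", sglang.__version__)
+ print("transformers:", transformers.__version__)
+ ```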
+
+ **_Launch server_**
+
+ We recommend that you use Devstral in a server/client setting.
+
+ 1. Spin up a server (a readiness check follows the command below):
+
+ ```
+ python -m sglang.launch_server --model-path mistralai/Devstral-2-123B-Instruct-2512 --host 0.0.0.0 --port 30000 --tp 8 --tool-call-parser mistral
+ ```
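+
+ Loading a model of this size can take a while, so before moving on to step 2 you may want to wait until the server answers. The sketch below assumes the launch command above: it polls the `/v1/models` endpoint of the OpenAI-compatible API, and host/port should match your deployment.
+
+ ```py
+ # Poll the OpenAI-compatible /v1/models endpoint until the server is up.
+ # Assumes the server launched above; replace <your-server-url> accordingly.
+ import time
+ import requests
+
+ url = "http://<your-server-url>:30000/v1/models"
+ for _ in range(120):  # up to ~10 minutes
+     try:
+         if requests.get(url, timeout=5).status_code == 200:
+             print("Server is ready.")
+             break
+     except requests.ConnectionError:
+         pass
+     time.sleep(5)
+ else:
+     raise RuntimeError("Server did not become ready in time.")
+ ```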
+
+ 2. To query the server, you can use a simple Python snippet:
+
+ ```py
+ import requests
+ import json
+ from huggingface_hub import hf_hub_download
+
+ url = "http://<your-server-url>:30000/v1/chat/completions"
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
+
+ model = "mistralai/Devstral-2-123B-Instruct-2512"
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     return system_prompt
+
+ SYSTEM_PROMPT = load_system_prompt(model, "CHAT_SYSTEM_PROMPT.txt")
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "text",
+                 "text": "<your-command>",
+             },
+         ],
+     },
+ ]
+
+ data = {"model": model, "messages": messages, "temperature": 0.15}
+
+ # Devstral 2 supports tool calling. If you want to use tools, follow this:
+ # tools = [  # Define tools (OpenAI-compatible)
+ #     {
+ #         "type": "function",
+ #         "function": {
+ #             "name": "git_clone",
+ #             "description": "Clone a git repository",
+ #             "parameters": {
+ #                 "type": "object",
+ #                 "properties": {
+ #                     "url": {
+ #                         "type": "string",
+ #                         "description": "The URL of the git repository",
+ #                     },
+ #                 },
+ #                 "required": ["url"],
+ #             },
+ #         },
+ #     }
+ # ]
+ # data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools}  # Pass tools to payload.
+
+ response = requests.post(url, headers=headers, data=json.dumps(data))
+ print(response.json()["choices"][0]["message"]["content"])
+ ```
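+
+ If you pass the commented-out `tools` payload above, the model may answer with tool calls rather than plain text. The sketch below continues from the snippet above and shows one way to read them back; field names follow the OpenAI chat-completions schema, so verify the response shape on your SGLang version.
+
+ ```py
+ # Continuing from the snippet above: check the response for tool calls.
+ # Assumption: the response follows the OpenAI chat-completions schema.
+ message = response.json()["choices"][0]["message"]
+
+ if message.get("tool_calls"):
+     for call in message["tool_calls"]:
+         name = call["function"]["name"]
+         args = json.loads(call["function"]["arguments"])
+         print(f"Tool call requested: {name}({args})")
+ else:
+     print(message["content"])
+ ```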
+ </details>
+
  #### Transformers

  <details>