# API

## Endpoints

## Conventions

### Model names

Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama3:70b`. The tag is optional and, if not provided, defaults to `latest`. The tag is used to identify a specific version.
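As an illustration of the naming scheme above, here is a minimal sketch of splitting such a name into its parts (the `parse_model_name` helper is hypothetical, not part of the API):

```python
def parse_model_name(name: str) -> tuple[str, str]:
    """Split a model reference into (model, tag), defaulting the tag to 'latest'.

    The model part may itself contain a namespace such as 'example/model'.
    """
    model, sep, tag = name.partition(":")
    return model, tag if sep else "latest"
```

For example, `parse_model_name("llama3:70b")` yields `("llama3", "70b")`, while `parse_model_name("example/model")` falls back to the `latest` tag.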
### Durations

All durations are returned in nanoseconds.
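For example, converting one of these nanosecond durations into seconds is a single division (a sketch; the field names match those returned in responses below):

```python
NS_PER_SECOND = 1_000_000_000

def to_seconds(duration_ns: int) -> float:
    """Convert a nanosecond duration, as returned by the API, to seconds."""
    return duration_ns / NS_PER_SECOND

# A total_duration of 5043500667 ns is roughly 5.04 s.
```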
### Streaming responses

Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing `{"stream": false}` for these endpoints.
## Generate a completion

Endpoint: `POST /api/generate`

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters

- `model`: (required) the model name
- `prompt`: the prompt to generate a response for
- `suffix`: the text after the model response
- `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
Advanced parameters (optional):

- `format`: the format to return a response in. Can be `json` or a JSON schema.
- `options`: additional model parameters such as `temperature`; see the Modelfile documentation.
- `system`: system message (overrides what is defined in the Modelfile).
- `template`: the prompt template to use (overrides what is defined in the Modelfile).
- `stream`: if `false`, the response will be returned as a single response object rather than a stream of objects.
- `raw`: if `true`, no formatting will be applied to the prompt. Useful when specifying a full templated prompt in the request.
- `keep_alive`: controls how long the model will stay loaded in memory following the request (default: 5 minutes).
- `context` (deprecated): the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory.
### Structured outputs

Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the structured outputs example below.

### JSON mode

Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode example below.

> [!IMPORTANT]
> It is important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
### Examples

#### Generate request (Streaming)

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
```
The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration` * `10^9`.
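A minimal sketch of that calculation, applied to the statistics of a final response object:

```python
NS_PER_SECOND = 10**9

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed in tokens/s, computed from the final response statistics."""
    return eval_count / eval_duration_ns * NS_PER_SECOND

# With eval_count=259 and eval_duration=4232710000, this is roughly 61.2 token/s.
```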
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": 10706818083,
"load_duration": 6338219291,
"prompt_eval_count": 26,
"prompt_eval_duration": 130079000,
"eval_count": 259,
"eval_duration": 4232710000
}
```

#### Generate request (No streaming)

##### Request

A response can be received in one reply when streaming is off.

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
```
##### Response

If `stream` is set to `false`, a single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 5043500667,
"load_duration": 5025959,
"prompt_eval_count": 26,
"prompt_eval_duration": 325953000,
"eval_count": 290,
"eval_duration": 4709213000
}
```

#### Generate request (with suffix)

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "codellama:code",
"prompt": "def compute_gcd(a, b):",
"suffix": " return result",
"options": {
"temperature": 0
},
"stream": false
}'
```
##### Response
```json5
{
"model": "codellama:code",
"created_at": "2024-07-22T20:47:51.147561Z",
"response": "\n if a == 0:\n return b\n else:\n return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n result = (a * b) / compute_gcd(a, b)\n",
"done": true,
"done_reason": "stop",
"context": [...],
"total_duration": 1162761250,
"load_duration": 6683708,
"prompt_eval_count": 17,
"prompt_eval_duration": 201222000,
"eval_count": 63,
"eval_duration": 953997000
}
```

#### Generate request (Structured outputs)

##### Request

```shell
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
"model": "llama3.1:8b",
"prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
}
}'
```
##### Response
```json
{
"model": "llama3.1:8b",
"created_at": "2024-12-06T00:48:09.983619Z",
"response": "{\n \"age\": 22,\n \"available\": true\n}",
"done": true,
"done_reason": "stop",
"context": [1, 2, 3],
"total_duration": 1075509083,
"load_duration": 567678166,
"prompt_eval_count": 28,
"prompt_eval_duration": 236000000,
"eval_count": 16,
"eval_duration": 269000000
}
```

#### Generate request (JSON mode)

> [!IMPORTANT]
> When `format` is set to `json`, the output will always be a well-formed JSON object. It is important to also instruct the model to respond in JSON in the `prompt`.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
```
The value of `response` will be a string containing JSON similar to:
```json
{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}
```

#### Generate request (with images)

To submit images to multimodal models such as `llava` or `bakllava`, provide a list of base64-encoded `images`:

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt": "What is in this picture?",
"stream": false,
"images": ["MY_IMAGE"]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": "A happy cartoon character, which is cute and cheerful.",
"done": true,
"context": [1, 2, 3],
"total_duration": 2938432250,
"load_duration": 2559292,
"prompt_eval_count": 1,
"prompt_eval_duration": 2195557000,
"eval_count": 44,
"eval_duration": 736432000
}
```

#### Generate request (Raw mode)

In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable templating. Also note that raw mode will not return a context.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'
```

#### Generate request (Reproducible outputs)

For reproducible outputs, set `seed` to a number:

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "Why is the sky blue?",
"options": {
"seed": 123
}
}'
```
##### Response
```json
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 8493852375,
"load_duration": 6589624375,
"prompt_eval_count": 14,
"prompt_eval_duration": 119039000,
"eval_count": 110,
"eval_duration": 1779061000
}
```

#### Generate request (with options)

If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"min_p": 0.0,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"num_thread": 8
}
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 4935886791,
"load_duration": 534986708,
"prompt_eval_count": 26,
"prompt_eval_duration": 107345000,
"eval_count": 237,
"eval_duration": 4289432000
}
```

#### Load a model

If an empty prompt is provided, the model will be loaded into memory.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2"
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
}
```

#### Unload a model

If an empty prompt is provided and the `keep_alive` parameter is set to `0`, the model will be unloaded from memory.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"keep_alive": 0
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T03:54:03.516566Z",
"response": "",
"done": true,
"done_reason": "unload"
}
```

## Generate a chat completion

Endpoint: `POST /api/chat`

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting `"stream": false`. The final response object will include statistics and additional data from the request.

### Parameters
- `model`: (required) the model name
- `messages`: the messages of the chat; this can be used to keep a chat memory
- `tools`: a list of tools in JSON for the model to use, if supported

The `message` object has the following fields:

- `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
- `content`: the content of the message
- `images`: (optional) a list of images to include in the message (for multimodal models such as `llava`)
- `tool_calls`: (optional) a list of tools in JSON that the model wants to use
Advanced parameters (optional):

- `format`: the format to return a response in. Can be `json` or a JSON schema.
- `options`: additional model parameters such as `temperature`; see the Modelfile documentation.
- `stream`: if `false`, the response will be returned as a single response object.
- `keep_alive`: controls how long the model will stay loaded in memory following the request (default: 5 minutes).
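As an illustration, a minimal sketch of assembling a chat request body from these parameters with a plain dictionary (the `build_chat_request` helper is hypothetical; `keep_alive` is shown here as a duration string, an assumption about the accepted form):

```python
import json

def build_chat_request(model: str, messages: list[dict], stream: bool = False,
                       keep_alive: str = "5m") -> str:
    """Serialize a request body for POST /api/chat; messages carry the chat memory."""
    return json.dumps({
        "model": model,
        "messages": messages,
        "stream": stream,
        "keep_alive": keep_alive,
    })

body = build_chat_request("llama3.2", [{"role": "user", "content": "why is the sky blue?"}])
```

The resulting string can be sent as the `-d` payload of the curl examples below.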
### Structured outputs

Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the chat request (structured outputs) example below.

### Examples
#### Chat request (Streaming)

##### Request

Send a chat message with a streaming response.

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The",
"images": null
},
"done": false
}
```
Final response:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"message": {
"role": "assistant",
"content": ""
},
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
```

#### Chat request (No streaming)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
],
"stream": false
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
```

#### Chat request (Structured outputs)

##### Request

```shell
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
},
"options": {
"temperature": 0
}
}'
```
##### Response
```json
{
"model": "llama3.1",
"created_at": "2024-12-06T00:46:58.265747Z",
"message": { "role": "assistant", "content": "{\"age\": 22, \"available\": false}" },
"done_reason": "stop",
"done": true,
"total_duration": 2254970291,
"load_duration": 574751416,
"prompt_eval_count": 34,
"prompt_eval_duration": 1502000000,
"eval_count": 12,
"eval_duration": 175000000
}
```

#### Chat request (With history)

Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The"
},
"done": false
}
```
Final response:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
"load_duration": 6396458,
"prompt_eval_count": 61,
"prompt_eval_duration": 398801000,
"eval_count": 468,
"eval_duration": 7701267000
}
```

#### Chat request (with images)

Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llava",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["MY_IMAGE"]
}
]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-12-13T22:42:50.203334Z",
"message": {
"role": "assistant",
"content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
"images": null
},
"done": true,
"total_duration": 1668506709,
"load_duration": 1986209,
"prompt_eval_count": 26,
"prompt_eval_duration": 359682000,
"eval_count": 83,
"eval_duration": 1303285000
}
```

#### Chat request (Reproducible outputs)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"options": {
"seed": 101,
"temperature": 0
}
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
```

#### Chat request (with tools)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What is the weather today in Paris?"
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The location to get the weather for, e.g. San Francisco, CA"
},
"format": {
"type": "string",
"description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location", "format"]
}
}
}
]
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2024-07-22T20:33:28.123648Z",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_current_weather",
"arguments": {
"format": "celsius",
"location": "Paris, FR"
}
}
}
]
},
"done_reason": "stop",
"done": true,
"total_duration": 885095291,
"load_duration": 3753500,
"prompt_eval_count": 122,
"prompt_eval_duration": 328493000,
"eval_count": 33,
"eval_duration": 552222000
}
```

#### Load a model

If the `messages` array is empty, the model will be loaded into memory.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": []
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T21:17:29.110811Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "load",
"done": true
}
```

#### Unload a model

If the `messages` array is empty and the `keep_alive` parameter is set to `0`, the model will be unloaded from memory.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [],
"keep_alive": 0
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T21:33:17.547535Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "unload",
"done": true
}
```