# API

## Endpoints

## Conventions

### Model names

Model names follow a `model:tag` format, where `model` can have an optional namespace such as `example/model`. Some examples are `orca-mini:3b-q4_1` and `llama3:70b`. The tag is optional and, if not provided, defaults to `latest`. The tag is used to identify a specific version.
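As an illustration of the naming scheme above, here is a minimal sketch of splitting such a name into its parts (the `parse_model_name` helper is hypothetical, not part of the API):

```python
def parse_model_name(name: str) -> tuple[str, str]:
    """Split a model reference into (model, tag), defaulting the tag to 'latest'.

    The model part may itself contain a namespace such as 'example/model'.
    """
    model, sep, tag = name.partition(":")
    return model, tag if sep else "latest"
```

For example, `parse_model_name("llama3:70b")` yields `("llama3", "70b")`, while `parse_model_name("example/model")` falls back to the `latest` tag.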
### Durations

All durations are returned in nanoseconds.
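For example, converting one of these nanosecond durations into seconds is a single division (a sketch; the field names match those returned in responses below):

```python
NS_PER_SECOND = 1_000_000_000

def to_seconds(duration_ns: int) -> float:
    """Convert a nanosecond duration, as returned by the API, to seconds."""
    return duration_ns / NS_PER_SECOND

# A total_duration of 5043500667 ns is roughly 5.04 s.
```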
### Streaming responses

Certain endpoints stream responses as JSON objects. Streaming can be disabled by providing `{"stream": false}` for these endpoints.
## Generate a completion

Endpoint: `POST /api/generate`

Generate a response for a given prompt with a provided model. This is a streaming endpoint, so there will be a series of responses. The final response object will include statistics and additional data from the request.
### Parameters

- `model`: (required) the model name
- `prompt`: the prompt to generate a response for
- `suffix`: the text after the model response
- `images`: (optional) a list of base64-encoded images (for multimodal models such as `llava`)
Advanced parameters (optional):

- `format`: the format to return a response in. Can be `json` or a JSON schema.
- `options`: additional model parameters such as `temperature`; see the Modelfile documentation.
- `system`: system message (overrides what is defined in the Modelfile).
- `template`: the prompt template to use (overrides what is defined in the Modelfile).
- `stream`: if `false`, the response will be returned as a single response object rather than a stream of objects.
- `raw`: if `true`, no formatting will be applied to the prompt. Useful when specifying a full templated prompt in the request.
- `keep_alive`: controls how long the model will stay loaded in memory following the request (default: 5 minutes).
- `context` (deprecated): the context parameter returned from a previous request to `/generate`; this can be used to keep a short conversational memory.
### Structured outputs

Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the structured outputs example below.

### JSON mode

Enable JSON mode by setting the `format` parameter to `json`. This will structure the response as a valid JSON object. See the JSON mode example below.

> [!IMPORTANT]
> It is important to instruct the model to use JSON in the `prompt`. Otherwise, the model may generate large amounts of whitespace.
### Examples

#### Generate request (Streaming)

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?"
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"response": "The",
"done": false
}
```
The final response in the stream also includes additional data about the generation:

- `total_duration`: time spent generating the response
- `load_duration`: time spent in nanoseconds loading the model
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_count`: number of tokens in the response
- `eval_duration`: time in nanoseconds spent generating the response
- `context`: an encoding of the conversation used in this response; this can be sent in the next request to keep a conversational memory
- `response`: empty if the response was streamed; if not streamed, this will contain the full response

To calculate how fast the response is generated in tokens per second (token/s), divide `eval_count` / `eval_duration` * `10^9`.
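A minimal sketch of that calculation, applied to the statistics of a final response object:

```python
NS_PER_SECOND = 10**9

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed in tokens/s, computed from the final response statistics."""
    return eval_count / eval_duration_ns * NS_PER_SECOND

# With eval_count=259 and eval_duration=4232710000, this is roughly 61.2 token/s.
```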
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "",
"done": true,
"context": [1, 2, 3],
"total_duration": 10706818083,
"load_duration": 6338219291,
"prompt_eval_count": 26,
"prompt_eval_duration": 130079000,
"eval_count": 259,
"eval_duration": 4232710000
}
```

#### Generate request (No streaming)

##### Request

A response can be received in one reply when streaming is off.

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
```
##### Response

If `stream` is set to `false`, a single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 5043500667,
"load_duration": 5025959,
"prompt_eval_count": 26,
"prompt_eval_duration": 325953000,
"eval_count": 290,
"eval_duration": 4709213000
}
```

#### Generate request (with suffix)

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "codellama:code",
"prompt": "def compute_gcd(a, b):",
"suffix": " return result",
"options": {
"temperature": 0
},
"stream": false
}'
```
##### Response
```json5
{
"model": "codellama:code",
"created_at": "2024-07-22T20:47:51.147561Z",
"response": "\n if a == 0:\n return b\n else:\n return compute_gcd(b % a, a)\n\ndef compute_lcm(a, b):\n result = (a * b) / compute_gcd(a, b)\n",
"done": true,
"done_reason": "stop",
"context": [...],
"total_duration": 1162761250,
"load_duration": 6683708,
"prompt_eval_count": 17,
"prompt_eval_duration": 201222000,
"eval_count": 63,
"eval_duration": 953997000
}
```

#### Generate request (Structured outputs)

##### Request

```shell
curl -X POST http://localhost:11434/api/generate -H "Content-Type: application/json" -d '{
"model": "llama3.1:8b",
"prompt": "Ollama is 22 years old and is busy saving the world. Respond using JSON",
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
}
}'
```
##### Response
```json
{
"model": "llama3.1:8b",
"created_at": "2024-12-06T00:48:09.983619Z",
"response": "{\n \"age\": 22,\n \"available\": true\n}",
"done": true,
"done_reason": "stop",
"context": [1, 2, 3],
"total_duration": 1075509083,
"load_duration": 567678166,
"prompt_eval_count": 28,
"prompt_eval_duration": 236000000,
"eval_count": 16,
"eval_duration": 269000000
}
```

#### Generate request (JSON mode)

> [!IMPORTANT]
> When `format` is set to `json`, the output will always be a well-formed JSON object. It is important to also instruct the model to respond in JSON in the `prompt`.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "What color is the sky at different times of the day? Respond using JSON",
"format": "json",
"stream": false
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-11-09T21:07:55.186497Z",
"response": "{\n\"morning\": {\n\"color\": \"blue\"\n},\n\"noon\": {\n\"color\": \"blue-gray\"\n},\n\"afternoon\": {\n\"color\": \"warm gray\"\n},\n\"evening\": {\n\"color\": \"orange\"\n}\n}\n",
"done": true,
"context": [1, 2, 3],
"total_duration": 4648158584,
"load_duration": 4071084,
"prompt_eval_count": 36,
"prompt_eval_duration": 439038000,
"eval_count": 180,
"eval_duration": 4196918000
}
```
The value of `response` will be a string containing JSON similar to:
```json
{
"morning": {
"color": "blue"
},
"noon": {
"color": "blue-gray"
},
"afternoon": {
"color": "warm gray"
},
"evening": {
"color": "orange"
}
}
```

#### Generate request (with images)

To submit images to multimodal models such as `llava` or `bakllava`, provide a list of base64-encoded `images`:

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llava",
"prompt": "What is in this picture?",
"stream": false,
"images": ["MY_IMAGE"]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": "A happy cartoon character, which is cute and cheerful.",
"done": true,
"context": [1, 2, 3],
"total_duration": 2938432250,
"load_duration": 2559292,
"prompt_eval_count": 1,
"prompt_eval_duration": 2195557000,
"eval_count": 44,
"eval_duration": 736432000
}
```

#### Generate request (Raw mode)

In some cases, you may wish to bypass the templating system and provide a full prompt. In this case, you can use the `raw` parameter to disable templating. Also note that raw mode will not return a context.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "[INST] why is the sky blue? [/INST]",
"raw": true,
"stream": false
}'
```

#### Generate request (Reproducible outputs)

For reproducible outputs, set `seed` to a number:

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt": "Why is the sky blue?",
"options": {
"seed": 123
}
}'
```
##### Response
```json
{
"model": "mistral",
"created_at": "2023-11-03T15:36:02.583064Z",
"response": " The sky appears blue because of a phenomenon called Rayleigh scattering.",
"done": true,
"total_duration": 8493852375,
"load_duration": 6589624375,
"prompt_eval_count": 14,
"prompt_eval_duration": 119039000,
"eval_count": 110,
"eval_duration": 1779061000
}
```

#### Generate request (with options)

If you want to set custom options for the model at runtime rather than in the Modelfile, you can do so with the `options` parameter. This example sets every available option, but you can set any of them individually and omit the ones you do not want to override.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false,
"options": {
"num_keep": 5,
"seed": 42,
"num_predict": 100,
"top_k": 20,
"top_p": 0.9,
"min_p": 0.0,
"typical_p": 0.7,
"repeat_last_n": 33,
"temperature": 0.8,
"repeat_penalty": 1.2,
"presence_penalty": 1.5,
"frequency_penalty": 1.0,
"mirostat": 1,
"mirostat_tau": 0.8,
"mirostat_eta": 0.6,
"penalize_newline": true,
"stop": ["\n", "user:"],
"numa": false,
"num_ctx": 1024,
"num_batch": 2,
"num_gpu": 1,
"main_gpu": 0,
"low_vram": false,
"vocab_only": false,
"use_mmap": true,
"use_mlock": false,
"num_thread": 8
}
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"response": "The sky is blue because it is the color of the sky.",
"done": true,
"context": [1, 2, 3],
"total_duration": 4935886791,
"load_duration": 534986708,
"prompt_eval_count": 26,
"prompt_eval_duration": 107345000,
"eval_count": 237,
"eval_duration": 4289432000
}
```

#### Load a model

If an empty prompt is provided, the model will be loaded into memory.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2"
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-12-18T19:52:07.071755Z",
"response": "",
"done": true
}
```

#### Unload a model

If an empty prompt is provided and the `keep_alive` parameter is set to `0`, the model will be unloaded from memory.

##### Request

```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"keep_alive": 0
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T03:54:03.516566Z",
"response": "",
"done": true,
"done_reason": "unload"
}
```

## Generate a chat completion

Endpoint: `POST /api/chat`

Generate the next message in a chat with a provided model. This is a streaming endpoint, so there will be a series of responses. Streaming can be disabled by setting `"stream": false`. The final response object will include statistics and additional data from the request.

### Parameters
- `model`: (required) the model name
- `messages`: the messages of the chat; this can be used to keep a chat memory
- `tools`: a list of tools in JSON for the model to use, if supported

The `message` object has the following fields:

- `role`: the role of the message, either `system`, `user`, `assistant`, or `tool`
- `content`: the content of the message
- `images`: (optional) a list of images to include in the message (for multimodal models such as `llava`)
- `tool_calls`: (optional) a list of tools in JSON that the model wants to use
Advanced parameters (optional):

- `format`: the format to return a response in. Can be `json` or a JSON schema.
- `options`: additional model parameters such as `temperature`; see the Modelfile documentation.
- `stream`: if `false`, the response will be returned as a single response object.
- `keep_alive`: controls how long the model will stay loaded in memory following the request (default: 5 minutes).
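As an illustration, a minimal sketch of assembling a chat request body from these parameters with a plain dictionary (the `build_chat_request` helper is hypothetical; `keep_alive` is shown here as a duration string, an assumption about the accepted form):

```python
import json

def build_chat_request(model: str, messages: list[dict], stream: bool = False,
                       keep_alive: str = "5m") -> str:
    """Serialize a request body for POST /api/chat; messages carry the chat memory."""
    return json.dumps({
        "model": model,
        "messages": messages,
        "stream": stream,
        "keep_alive": keep_alive,
    })

body = build_chat_request("llama3.2", [{"role": "user", "content": "why is the sky blue?"}])
```

The resulting string can be sent as the `-d` payload of the curl examples below.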
### Structured outputs

Structured outputs are supported by providing a JSON schema in the `format` parameter. The model will generate a response that matches the schema. See the chat request (structured outputs) example below.

### Examples
#### Chat request (Streaming)

##### Request

Send a chat message with a streaming response.

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
]
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The",
"images": null
},
"done": false
}
```
Final response:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"message": {
"role": "assistant",
"content": ""
},
"done": true,
"total_duration": 4883583458,
"load_duration": 1334875,
"prompt_eval_count": 26,
"prompt_eval_duration": 342546000,
"eval_count": 282,
"eval_duration": 4535599000
}
```

#### Chat request (No streaming)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
}
],
"stream": false
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
```

#### Chat request (Structured outputs)

##### Request

```shell
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Ollama is 22 years old and busy saving the world. Return a JSON object with the age and availability."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"age": {
"type": "integer"
},
"available": {
"type": "boolean"
}
},
"required": [
"age",
"available"
]
},
"options": {
"temperature": 0
}
}'
```
##### Response
```json
{
"model": "llama3.1",
"created_at": "2024-12-06T00:46:58.265747Z",
"message": { "role": "assistant", "content": "{\"age\": 22, \"available\": false}" },
"done_reason": "stop",
"done": true,
"total_duration": 2254970291,
"load_duration": 574751416,
"prompt_eval_count": 34,
"prompt_eval_duration": 1502000000,
"eval_count": 12,
"eval_duration": 175000000
}
```

#### Chat request (With history)

Send a chat message with a conversation history. You can use this same approach to start the conversation using multi-shot or chain-of-thought prompting.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "why is the sky blue?"
},
{
"role": "assistant",
"content": "due to rayleigh scattering."
},
{
"role": "user",
"content": "how is that different than mie scattering?"
}
]
}'
```
##### Response

A stream of JSON objects is returned:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T08:52:19.385406455-07:00",
"message": {
"role": "assistant",
"content": "The"
},
"done": false
}
```
Final response:
```json
{
"model": "llama3.2",
"created_at": "2023-08-04T19:22:45.499127Z",
"done": true,
"total_duration": 8113331500,
"load_duration": 6396458,
"prompt_eval_count": 61,
"prompt_eval_duration": 398801000,
"eval_count": 468,
"eval_duration": 7701267000
}
```

#### Chat request (with images)

Send a chat message with images. The images should be provided as an array, with the individual images encoded in Base64.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llava",
"messages": [
{
"role": "user",
"content": "what is in this image?",
"images": ["MY_IMAGE"]
}
]
}'
```
##### Response
```json
{
"model": "llava",
"created_at": "2023-12-13T22:42:50.203334Z",
"message": {
"role": "assistant",
"content": " The image features a cute, little pig with an angry facial expression. It's wearing a heart on its shirt and is waving in the air. This scene appears to be part of a drawing or sketching project.",
"images": null
},
"done": true,
"total_duration": 1668506709,
"load_duration": 1986209,
"prompt_eval_count": 26,
"prompt_eval_duration": 359682000,
"eval_count": 83,
"eval_duration": 1303285000
}
```

#### Chat request (Reproducible outputs)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"options": {
"seed": 101,
"temperature": 0
}
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2023-12-12T14:13:43.416799Z",
"message": {
"role": "assistant",
"content": "Hello! How are you today?"
},
"done": true,
"total_duration": 5191566416,
"load_duration": 2154458,
"prompt_eval_count": 26,
"prompt_eval_duration": 383809000,
"eval_count": 298,
"eval_duration": 4799921000
}
```

#### Chat request (with tools)

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What is the weather today in Paris?"
}
],
"stream": false,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The location to get the weather for, e.g. San Francisco, CA"
},
"format": {
"type": "string",
"description": "The format to return the weather in, e.g. 'celsius' or 'fahrenheit'",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location", "format"]
}
}
}
]
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2024-07-22T20:33:28.123648Z",
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"function": {
"name": "get_current_weather",
"arguments": {
"format": "celsius",
"location": "Paris, FR"
}
}
}
]
},
"done_reason": "stop",
"done": true,
"total_duration": 885095291,
"load_duration": 3753500,
"prompt_eval_count": 122,
"prompt_eval_duration": 328493000,
"eval_count": 33,
"eval_duration": 552222000
}
```

#### Load a model

If the `messages` array is empty, the model will be loaded into memory.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": []
}'
```
##### Response
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T21:17:29.110811Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "load",
"done": true
}
```

#### Unload a model

If the `messages` array is empty and the `keep_alive` parameter is set to `0`, the model will be unloaded from memory.

##### Request

```shell
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [],
"keep_alive": 0
}'
```
##### Response

A single JSON object is returned:
```json
{
"model": "llama3.2",
"created_at": "2024-09-12T21:33:17.547535Z",
"message": {
"role": "assistant",
"content": ""
},
"done_reason": "unload",
"done": true
}
```