
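# Rate Limits

> Request and token rate limits for the Venice API.

Rate limits vary by model and tier. The default limits below are a useful reference, but the `/api_keys/rate_limits` API endpoint is the authoritative source for your current limits. You can check your exact limits at any time: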

<CardGroup cols={2}>
  <Card title="View your limits" icon="gauge-high" href="/api-reference/endpoint/api_keys/rate_limits?playground=open">
    Interactive playground
  </Card>

  <Card title="Rate limit logs" icon="clock-rotate-left" href="/api-reference/endpoint/api_keys/rate_limit_logs?playground=open">
    See which requests hit a limit
  </Card>
</CardGroup>

```bash theme={"system"}
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"
```
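The same call can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_rate_limits_request` and `fetch_rate_limits` are illustrative and not part of any Venice SDK, and the JSON body is returned unchanged because its shape is not described on this page.

```python
import json
import urllib.request
from typing import Any

RATE_LIMITS_URL = "https://api.venice.ai/api/v1/api_keys/rate_limits"


def build_rate_limits_request(api_key: str) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command above."""
    return urllib.request.Request(
        RATE_LIMITS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_rate_limits(api_key: str, timeout: float = 10.0) -> Any:
    """Send the request and return the decoded JSON body as-is."""
    request = build_rate_limits_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To try it, call `fetch_rate_limits(os.environ["VENICE_API_KEY"])` after importing `os`, which reads the key from the same environment variable as the curl example.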

## Default limits

### Text models

Text models are grouped into tiers by size. Each model card on the [models page](/models/text) shows its tier badge.

| Tier | Requests/min | Tokens/min |
| :- | ----: | --------: |
| XS |   500 | 1,000,000 |
| S  |    75 |   750,000 |
| M  |    50 |   750,000 |
| L  |    20 |   500,000 |

<Accordion title="Which models are in each tier?">
  **XS** `qwen3-4b` `llama-3.2-3b`

  **S** `mistral-31-24b` `venice-uncensored`

  **M** `zai-org-glm-5` `qwen3-next-80b` `google-gemma-3-27b-it`

  **L** `qwen3-235b-a22b-instruct-2507` `qwen3-235b-a22b-thinking-2507` `deepseek-ai-DeepSeek-R1` `grok-41-fast` `kimi-k2-thinking` `gemini-3-pro-preview` `hermes-3-llama-3.1-405b` `qwen3-coder-480b-a35b-instruct` `zai-org-glm-4.7` `openai-gpt-oss-120b`
</Accordion>
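To stay under a tier's limits, a client can throttle itself before sending. The sketch below is one possible approach, not Venice's own accounting: it keeps a sliding 60-second window of admitted requests and their token counts. The class name `MinuteWindowLimiter` is hypothetical, the token count per request is whatever estimate the caller supplies, and the server may count its windows differently, so treat this as a conservative approximation.

```python
import time
from collections import deque

# Default text-model limits from the table above: tier -> (requests/min, tokens/min).
DEFAULT_TEXT_LIMITS = {
    "XS": (500, 1_000_000),
    "S": (75, 750_000),
    "M": (50, 750_000),
    "L": (20, 500_000),
}


class MinuteWindowLimiter:
    """Client-side sliding 60-second window over request and token counts."""

    def __init__(self, max_requests, max_tokens, clock=time.monotonic):
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0

    def _evict_expired(self, now):
        # Drop events that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Admit and record the request if it fits both budgets, else return False."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.events) + 1 > self.max_requests:
            return False
        if self.tokens_in_window + tokens > self.max_tokens:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

For example, `MinuteWindowLimiter(*DEFAULT_TEXT_LIMITS["L"])` admits at most 20 requests and 500,000 estimated tokens in any 60-second span. Replace the defaults with the values returned by `/api_keys/rate_limits` when they differ.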

### Other models

| Type             | Requests/min |
| :--------------- | -----------: |
| Image            |           20 |
| Audio            |           60 |
| Embedding        |          500 |
| Video (queue)    |           40 |
| Video (retrieve) |          120 |

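## Handling errors

Retry failed requests (500, 503, 429) with exponential backoff.

For 429 errors, check the `x-ratelimit-reset-requests` response header for the exact Unix timestamp at which you can retry. Many HTTP client libraries provide retry mechanisms that can be configured to handle this.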
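The retry rule above can be sketched as a small delay calculator. This is an illustration under stated assumptions rather than an official client: the function name `retry_delay`, the 1-second base, the 60-second cap, and the use of full jitter are all choices made here, and `headers` is assumed to be a mapping with lower-case keys.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(status, headers, attempt, now=None, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not retryable.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    now = time.time() if now is None else now
    if status == 429:
        # Prefer the server-provided Unix timestamp when it is present.
        reset = headers.get("x-ratelimit-reset-requests")
        try:
            return max(0.0, float(reset) - now)
        except (TypeError, ValueError):
            pass  # header missing or unparsable: fall back to backoff
    # Exponential backoff with full jitter; base and cap are assumed values.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds and then resend, stopping after a fixed number of attempts so that persistent failures do not loop forever.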

### Abuse protection

If you generate more than 20 failed requests within 30 seconds, the API blocks further requests for 30 seconds:

```
Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again.
```
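A client can avoid tripping this block by counting its own recent failures and pausing before the threshold is crossed. The sketch below is one way to do that. The class name `FailureGuard` is hypothetical, and it assumes that any non-2xx status counts as a failed attempt, which the error message suggests but this page does not spell out.

```python
import time
from collections import deque


class FailureGuard:
    """Pause sending before failures in a 30-second window exceed 20."""

    def __init__(self, max_failures=20, window=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window
        self.clock = clock
        self.failures = deque()  # timestamps of recent non-success responses

    def record(self, status):
        """Call once per response; only non-2xx statuses are counted."""
        if not 200 <= status < 300:
            self.failures.append(self.clock())

    def may_send(self):
        """Return False once one more failure would exceed the threshold."""
        now = self.clock()
        # Forget failures that have aged out of the window.
        while self.failures and now - self.failures[0] >= self.window:
            self.failures.popleft()
        return len(self.failures) < self.max_failures
```

Check `may_send()` before each request and call `record(status)` after each response. When `may_send()` returns `False`, wait until the oldest failure ages out of the window before sending again.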

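## Response headers

Every response includes the following headers:

| Header                           | Description                                               |
| :------------------------------- | :-------------------------------------------------------- |
| `x-ratelimit-limit-requests`     | Maximum number of requests allowed in the current window  |
| `x-ratelimit-remaining-requests` | Number of requests remaining in the current window        |
| `x-ratelimit-reset-requests`     | Unix timestamp at which the window resets                 |
| `x-ratelimit-limit-tokens`       | Maximum number of tokens allowed per minute               |
| `x-ratelimit-remaining-tokens`   | Number of tokens remaining in the current minute          |
| `x-ratelimit-reset-tokens`       | Seconds until the token limit resets                      |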
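These headers can be collected into a small typed record for logging or throttling decisions. The sketch below is illustrative only: `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, and it assumes every header value is a plain number, leaving a field as `None` when the header is missing or cannot be parsed.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]  # seconds until reset


def _parse(headers, name, cast):
    """Convert one header value, returning None if absent or malformed."""
    try:
        return cast(headers.get(name))
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return RateLimitInfo(
        limit_requests=_parse(lowered, "x-ratelimit-limit-requests", int),
        remaining_requests=_parse(lowered, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_parse(lowered, "x-ratelimit-reset-requests", float),
        limit_tokens=_parse(lowered, "x-ratelimit-limit-tokens", int),
        remaining_tokens=_parse(lowered, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_parse(lowered, "x-ratelimit-reset-tokens", float),
    )
```

Note that the two reset fields use different units, as listed in the table: the request reset is an absolute timestamp, while the token reset is a number of seconds.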

## Partner tier

Partners receive higher rate limits for most tiers and model types:

| Tier | Requests/min | Tokens/min |
| :- | ----: | --------: |
| XS |   500 | 2,000,000 |
| S  |   150 | 1,500,000 |
| M  |   100 | 1,500,000 |
| L  |    60 | 1,000,000 |

| Type      | Requests/min |
| :-------- | -----------: |
| Image     |    60 |
| Audio     |   120 |
| Embedding |   500 |

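If you consistently hit rate limits and your usage pattern shows **sustained, long-term demand**, contact us to discuss partner access: [api@venice.ai](mailto:api@venice.ai).

Partner-tier limits can be adjusted to your specific needs.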