Adjust the number of concurrent requests according to the number of workers.

vinland100 2026-01-08 18:40:23 +08:00
parent 044cd11ad4
commit 14b75beb9c
1 changed file with 3 additions and 1 deletion
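
The sizing rule behind this change can be sketched as below. This is a minimal illustration, not the code in `EmbeddingService`: the global limit of 30 and the 4 gunicorn workers come from the comments in the diff, while `per_worker_limit`, `run_demo`, and the simulated request are hypothetical names introduced only for this example. Note that an `asyncio.Semaphore` bounds in-flight requests rather than requests per second, so it acts as a conservative proxy for the RPS limit.

```python
import asyncio

GLOBAL_LIMIT = 30  # global limit stated in the commit's comments
WORKERS = 4        # gunicorn worker count stated in the commit's comments


def per_worker_limit(global_limit: int, workers: int) -> int:
    """Split a global limit evenly across workers, rounding down (hypothetical helper)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Floor division keeps the sum over all workers at or below the global limit.
    return max(1, global_limit // workers)


async def run_demo(limit: int, n_tasks: int) -> int:
    """Run n_tasks simulated requests and return the peak number in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for a remote embedding call
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak


if __name__ == "__main__":
    limit = per_worker_limit(GLOBAL_LIMIT, WORKERS)
    print("per-worker limit:", limit)
    print("peak in flight:", asyncio.run(run_demo(limit, 20)))
```

Rounding down means the four workers together allow at most 28 concurrent requests, which leaves a small margin under the global limit of 30.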


@@ -639,7 +639,9 @@ class EmbeddingService:
)
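# 🔥 Limit the number of concurrent requests (RPS limit)
-self._semaphore = asyncio.Semaphore(30)
+# The global RPS limit of 30 is shared by 4 gunicorn workers
+# Each worker is limited to 30 // 4 = 7 concurrent requests so that rate limiting is not triggered
+self._semaphore = asyncio.Semaphore(7)
# 🔥 Set the default batch size (for remote models, the user requires 10)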
is_remote = self.provider.lower() in ["openai", "qwen", "azure", "cohere", "jina", "huggingface"]