# 采用gpt-sovits方案,bert-sovits适合长音频训练,gpt-sovits运行短音频快速推理 ## 部署tts推理 git clone https://github.com/X-T-E-R/GPT-SoVITS-Inference.git ## 1. 安装依赖库 ``` conda create -n GPTSoVits python=3.9 conda activate GPTSoVits bash install.sh ``` 从 [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) 下载预训练模型,并将它们放置在 `GPT_SoVITS\pretrained_models` 中 ## 2. Model Folder Format 模型文件下载地址 https://www.yuque.com/xter/zibxlp/gsximn7ditzgispg 下载的模型文件放到trained目录下, 如 `trained/Character1/` Put the pth / ckpt / wav files in it, the wav should be named as the prompt text Like : ``` trained --hutao ----hutao-e75.ckpt ----hutao_e60_s3360.pth ----hutao said something.wav ``` ## 3. 启动 ### 3.1 后端服务: python Inference/src/tts_backend.py 如果有错误提示找不到cmudict,从这下载https://github.com/nltk/nltk_data,将packages改名为nltk_data放到home目录下 ### 3.2 管理character: python Inference/src/Character_Manager.py 浏览器打开可以管理character和emotion ### 3.3 测试tts功能: python Inference/src/TTS_Webui.py ## 4. 接口说明 ### 4.1 Character and Emotion List To obtain the supported characters and their corresponding emotions, please visit the following URL: - URL: `http://127.0.0.1:5000/character_list` - Returns: A JSON format list of characters and corresponding emotions - Method: `GET` ``` { "Hanabi": [ "default", "Normal", "Yandere", ], "Hutao": [ "default" ] } ``` ### 4.2 Text-to-Speech - URL: `http://127.0.0.1:5000/tts` - Returns: Audio on success. Error message on failure. - Method: `GET`/`POST` ``` { "method": "POST", "body": { "character": "${chaName}", "emotion": "${Emotion}", "text": "${speakText}", "text_language": "${textLanguage}", "batch_size": ${batch_size}, "speed": ${speed}, "top_k": ${topK}, "top_p": ${topP}, "temperature": ${temperature}, "stream": "${stream}", "format": "${Format}", "save_temp": "${saveTemp}" } } ``` ##### Parameter Explanation - **text**: The text to be converted, URL encoding is recommended. - **character**: Character folder name, pay attention to case sensitivity, full/half width, and language. - **emotion**: Character emotion, must be an actually supported emotion of the character, otherwise, the default emotion will be used. - **text_language**: Text language (auto / zh / en / ja), default is multilingual mixed. - **top_k**, **top_p**, **temperature**: GPT model parameters, no need to modify if unfamiliar. - **batch_size**: How many batches at a time, can be increased for faster processing if you have a powerful computer, integer, default is 1. - **speed**: Speech speed, default is 1.0. - **save_temp**: Whether to save temporary files, when true, the backend will save the generated audio, and subsequent identical requests will directly return that data, default is false. - **stream**: Whether to stream, when true, audio will be returned sentence by sentence, default is false. - **format**: Format, default is WAV, allows MP3/ WAV/ OGG. ## 部署tts训练 https://github.com/RVC-Boss/GPT-SoVITS 根据文档说明部署,将训练后的模型拷到推理服务的trained目录下