

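This project uses the GPT-SoVITS approach. Bert-SoVITS is better suited to training on long audio, while GPT-SoVITS runs fast inference from short audio.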

Deploying TTS inference

git clone https://github.com/X-T-E-R/GPT-SoVITS-Inference.git

1. Install dependencies

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh

Download the pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

Note: the GPT-SoVITS model files go into the pretrained_models directory, laid out as follows:

pretrained_models/
--chinese-hubert-base
--chinese-roberta-wwm-ext-large
--s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
--s2D488k.pth
--s2G488k.pth

2. Model Folder Format

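Model files can be downloaded from https://www.yuque.com/xter/zibxlp/gsximn7ditzgispg
Place the downloaded model files under the trained directory, e.g. trained/Character1/
Put the .pth, .ckpt, and .wav files in that folder; the .wav file should be named after its prompt text.
For example: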

trained
--hutao
----hutao-e75.ckpt
----hutao_e60_s3360.pth
----hutao said something.wav
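
The layout above can be checked before starting the service. The following is a minimal sketch and not part of GPT-SoVITS-Inference: the helper name `check_trained_dir` is hypothetical, and it assumes only what the text states, namely that each character folder should contain a .pth, a .ckpt, and a .wav file.

```python
from pathlib import Path

# Suffixes each character folder is expected to contain, per the format above.
REQUIRED_SUFFIXES = (".pth", ".ckpt", ".wav")


def check_trained_dir(root):
    """Return {character_name: [missing suffixes]} for every folder under root.

    Hypothetical helper for illustration; the inference service performs
    its own model loading and may apply different rules.
    """
    report = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        present = {p.suffix.lower() for p in folder.iterdir() if p.is_file()}
        report[folder.name] = [s for s in REQUIRED_SUFFIXES if s not in present]
    return report
```

Calling `check_trained_dir("trained")` from the repository root returns an empty list for each character whose folder is complete.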

3. Launch

3.1 Launch the web UI

python webuis/character_manager/webui.py

The web UI lets you configure the uploaded model data.

3.2 Launch the API service

python app.py

If an error reports that cmudict cannot be found, download https://github.com/nltk/nltk_data, rename its packages directory to nltk_data, and place it in your home directory.
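
To confirm the corpus ended up where it is expected, a small check along these lines may help. This is a sketch under assumptions: the function name `find_cmudict` is hypothetical, and it assumes the layout of the nltk_data repository's packages directory, where corpora sit under corpora/ either unpacked or as a .zip archive.

```python
from pathlib import Path


def find_cmudict(nltk_data_dir=None):
    """Return the path of the cmudict corpus under nltk_data, or None if absent.

    Defaults to ~/nltk_data, the location described above. The corpora/
    subdirectory and the .zip variant are assumptions about the layout.
    """
    base = Path(nltk_data_dir) if nltk_data_dir else Path.home() / "nltk_data"
    for candidate in (base / "corpora" / "cmudict", base / "corpora" / "cmudict.zip"):
        if candidate.exists():
            return candidate
    return None
```

If `find_cmudict()` returns None, the directory was probably not renamed or was placed somewhere other than the home directory.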

3.3 Test TTS

Open http://127.0.0.1:5000 to test.

3.4 Test the API

Open http://127.0.0.1:5000/character_list to check that the service responds correctly.

4. API reference

4.1 Character and Emotion List

To obtain the supported characters and their corresponding emotions, please visit the following URL:

  • URL: http://127.0.0.1:5000/character_list
  • Returns: A JSON format list of characters and corresponding emotions
  • Method: GET
{
    "Hanabi": [
        "default",
        "Normal",
        "Yandere"
    ],
    "Hutao": [
        "default"
    ]
}
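
As an illustration of consuming this endpoint, the sketch below fetches the list with the standard library and picks an emotion on the client side. The function names are hypothetical, and the fallback to "default" simply mirrors the behavior the emotion parameter is documented to have; the server may handle unknown emotions differently in practice.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # address used throughout this README


def fetch_character_list(base_url=BASE_URL, timeout=10):
    """GET /character_list and return the parsed JSON mapping."""
    with urlopen(f"{base_url}/character_list", timeout=timeout) as resp:
        return json.load(resp)


def pick_emotion(characters, character, emotion):
    """Return emotion if the character supports it, otherwise "default".

    Raises KeyError for an unknown character; names are case-sensitive.
    """
    supported = characters[character]
    return emotion if emotion in supported else "default"
```

With the sample response above, `pick_emotion(data, "Hanabi", "Yandere")` returns "Yandere", while the same emotion requested for "Hutao" falls back to "default".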

4.2 Text-to-Speech

  • URL: http://127.0.0.1:5000/tts
  • Returns: Audio on success. Error message on failure.
  • Method: GET/POST
{
    "method": "POST",
    "body": {
        "character": "${chaName}",
        "emotion": "${Emotion}",
        "text": "${speakText}",
        "text_language": "${textLanguage}",
        "batch_size": ${batch_size},
        "speed": ${speed},
        "top_k": ${topK},
        "top_p": ${topP},
        "temperature": ${temperature},
        "stream": "${stream}",
        "format": "${Format}",
        "save_temp": "${saveTemp}"
    }
}
Parameter Explanation
  • text: The text to convert; URL encoding is recommended.

  • character: Character folder name. It must match exactly, including case, full-width versus half-width characters, and language.

  • emotion: Character emotion. It must be an emotion the character actually supports; otherwise the default emotion is used.

  • text_language: Text language (auto / zh / en / ja); defaults to mixed multilingual.

  • top_k, top_p, temperature: GPT model parameters; leave them unchanged if you are unfamiliar with them.

  • batch_size: Number of batches processed at once (integer, default 1). Increase it for faster processing on a powerful machine.

  • speed: Speech speed; default is 1.0.

  • save_temp: Whether to save temporary files. When true, the backend saves the generated audio and returns it directly for subsequent identical requests. Default is false.

  • stream: Whether to stream. When true, audio is returned sentence by sentence. Default is false.

  • format: Output audio format; MP3, WAV, and OGG are allowed, and the default is WAV.
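
The parameters above can be assembled into a request as sketched below, using only the Python standard library. This is an illustration, not the project's client: the function names are hypothetical, and it assumes the server accepts query parameters on GET and a JSON body on POST, which the request template suggests but does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TTS_URL = "http://127.0.0.1:5000/tts"


def build_tts_params(text, character, emotion="default", **overrides):
    """Assemble request fields; omitted ones are left to the server defaults."""
    params = {"text": text, "character": character, "emotion": emotion}
    params.update(overrides)  # e.g. text_language="zh", speed=1.0
    return params


def tts_get_url(params, url=TTS_URL):
    """Build a GET URL; urlencode handles the recommended URL encoding."""
    return f"{url}?{urlencode(params)}"


def tts_post(params, url=TTS_URL, timeout=60):
    """POST the fields as a JSON body (assumed format) and return audio bytes."""
    req = Request(
        url,
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

The bytes returned by `tts_post` can be written to a file opened in binary mode. On failure the endpoint returns an error message rather than audio, so a real client should check the response before saving it.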

Deploying TTS training

https://github.com/RVC-Boss/GPT-SoVITS
Deploy it by following its documentation, then copy the trained models into the trained directory of the inference service.
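
The copy step can be scripted. The following is a minimal sketch with a hypothetical helper name, `install_character`; the source file paths are left to the caller because where the training run writes its weights depends on your setup.

```python
import shutil
from pathlib import Path


def install_character(trained_root, name, files):
    """Copy model and prompt files into trained_root/name and return that folder.

    Hypothetical helper: files is any iterable of paths to the .ckpt, .pth,
    and prompt .wav produced by training.
    """
    target = Path(trained_root) / name
    target.mkdir(parents=True, exist_ok=True)
    for src in files:
        shutil.copy2(src, target / Path(src).name)
    return target
```

The resulting folder follows the trained/Character1/ layout described in the Model Folder Format section.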

If you need to deploy on AutoDL

Use https://www.codewithgpu.com/i/RVC-Boss/GPT-SoVITS/GPT-SoVITS as the base image for a quick deployment.

Download

https://github.com/X-T-E-R/GPT-SoVITS-Inference

Install

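cd GPT-SoVITS-Inference
pip3 install -r requirements.txt
# After the cd above, source and destination below are the same directory;
# point the source at the pretrained_models directory of the base image's GPT-SoVITS install.
cp -r GPT_SoVITS/pretrained_models/ ./GPT_SoVITS/pretrained_models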

Start the API

python3 app.py

Start the web UI

python3 webuis/character_manager/webui.py