Improve digital human speech playback

lihengzhong 2023-12-28 13:11:18 +08:00
parent f0b46ddb8e
commit bdd4be3919
9 changed files with 159 additions and 202 deletions

140
README.md

@ -1,121 +1,79 @@
# Virtual human talking-head generation (real-time driving of a photo-based virtual human)
![](/img/example.gif)
# Get Started
A streaming digital human based on the ER-NeRF model that supports synchronized audio-video dialogue. The results are broadly good enough for commercial use.
## Installation
Tested on Ubuntu 22.04, PyTorch 1.12 and CUDA 11.6, or PyTorch 1.12 and CUDA 11.3
```python
git clone https://github.com/waityousea/xuniren.git
cd xuniren
```
Tested on Ubuntu 18.04, Pytorch 1.12 and CUDA 11.3.
### Install dependencies
```python
# for ubuntu, portaudio is needed for pyaudio to work.
sudo apt install portaudio19-dev
```bash
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
or
## environment.yml uses PyTorch 1.12 and CUDA 11.3
conda env create -f environment.yml
## install pytorch3d
#ubuntu/mac
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
```
For setting up a CUDA environment on Linux, see this article: https://zhuanlan.zhihu.com/p/674972886
Install the rtmpstream library
See https://github.com/lipku/python_rtmpstream
## Run
### Run the RTMP server (SRS)
```
docker run --rm -it -p 1935:1935 -p 1985:1985 -p 8080:8080 registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5
```
**Installing pytorch3d on Windows**
- gcc & g++ ≥ 4.9
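On Windows you need a gcc compiler; install whichever suits your setup, for example MinGW.
The installation steps below come from the official [pytorch3d](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md) instructions; pick the ones you need.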
```python
conda create -n pytorch3d python=3.9
conda activate pytorch3d
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
```
The CUB build-time dependency is needed only if your CUDA is older than 11.7. If you are using conda, you can continue with:
```
conda install -c bottler nvidiacub
```
```
# Demos and examples
conda install jupyter
pip install scikit-image matplotlib imageio plotly opencv-python
# Tests/Linting
pip install black usort flake8 flake8-bugbear flake8-comprehensions
```
After applying any necessary patches, you can build and install from the "x64 Native Tools Command Prompt for VS 2019":
```
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
python setup.py install
```
### Build extension
By default, we use [`load`](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime. However, this may be inconvenient sometimes. Therefore, we also provide the `setup.py` to build each extension:
```
# install all extension modules
# note: this module must be installed.
# on Windows, installing from the VS 2019 "x64 Native Tools Command Prompt for VS 2019" window is recommended
bash scripts/install_ext.sh
```
### **start (standalone)**
Once the environment is configured, start the virtual human generator:
### Once the environment is configured, start:
```python
python app.py
```
### **start (connecting to Fay; tested on Ubuntu 20)**
Once the environment is configured, start the Fay connection script:
```python
python fay_connect.py
If you cannot reach huggingface, run the following before starting:
```
export HF_ENDPOINT=https://hf-mirror.com
```
![](img/weplay.png)
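Scan the QR code to support the open-source development work; use the payment order number to join the QQ discussion group.
Once it is running, open rtmp://serverip/live/livestream in VLC.
### Have the digital human speak text entered on a web page
Install and start nginx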
```
apt install nginx
nginx
```
Edit the WebSocket and video playback addresses in echo.html, replacing serverip with the actual server IP.
Then copy echo.html and mpegts-1.7.3.min.js to /var/www/html.
Start the digital human
```python
python app.py
```
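Interface input and output details: [WebSocket.md](https://github.com/waityousea/xuniren/blob/main/WebSocket.md)
Open http://serverip/echo.html in a browser, type any text into the text box, and submit it. The digital human then speaks that text.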
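
The addresses mentioned in this README all derive from the one server IP. A minimal sketch of that mapping, assuming the ports that appear in this commit (8000 for the WebSocket server, 8080 for the SRS HTTP-FLV stream, and the default RTMP port); `build_endpoints` is a hypothetical helper for illustration and is not part of the repository:

```python
def build_endpoints(server_ip: str) -> dict:
    """Return the URLs a client needs, given the server's IP or hostname.

    Hypothetical helper: ports and paths are the ones used by echo.html
    and the SRS docker command in this commit.
    """
    if not server_ip or "/" in server_ip:
        raise ValueError("server_ip must be a bare IP address or hostname")
    return {
        "websocket": f"ws://{server_ip}:8000/humanecho",
        "http_flv": f"http://{server_ip}:8080/live/livestream.flv",
        "rtmp": f"rtmp://{server_ip}/live/livestream",
        "page": f"http://{server_ip}/echo.html",
    }


if __name__ == "__main__":
    # Example address only; substitute the real server IP.
    for name, url in build_endpoints("192.168.1.10").items():
        print(f"{name}: {url}")
```

The `websocket` and `http_flv` values are the two strings to substitute into echo.html; the `rtmp` value is the one to open in VLC.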
Core files for virtual human generation
## Data flow
![](/assets/dataflow.png)
## Digital human model files; these can be replaced with a model you trained yourself (https://github.com/Fictionarry/ER-NeRF)
```python
## Note: the core files must be trained separately
.
├── data
│ ├── kf.json
│ ├── data_kf.json
│ ├── pretrained
│ └── └── ngp_kg.pth
```
### Inference Speed
## TODO
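- Add ChatGPT for digital human dialogue
- Voice cloning
- Play a video clip in place of the digital human while it is silent
When running video inference on a desktop RTX A4000 or a laptop RTX 3080 Ti (16 GB of VRAM), 35 to 43 frames can be inferred per second; for video at 25 frames per second, that is roughly 1.5 s of video inferred per second.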
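
The figure above is a simple ratio of inference frame rate to playback frame rate. A small sketch of that arithmetic, where `realtime_factor` is a hypothetical name used only for this illustration; with the quoted range of 35 to 43 frames per second and 25 fps video, the ratio falls between 1.4 and 1.72:

```python
def realtime_factor(inference_fps: float, video_fps: float = 25.0) -> float:
    """Seconds of video produced per second of wall-clock inference."""
    if inference_fps <= 0 or video_fps <= 0:
        raise ValueError("frame rates must be positive")
    return inference_fps / video_fps


if __name__ == "__main__":
    for fps in (35, 43):
        print(f"{fps} fps inference -> {realtime_factor(fps):.2f}x real time")
```

A factor above 1.0 means rendering keeps ahead of playback, which is what a live stream needs.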
# Acknowledgement
- The data pre-processing part is adapted from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF).
- The NeRF framework is based on [torch-ngp](https://github.com/ashawkey/torch-ngp).
- The algorithm core comes from [RAD-NeRF](https://github.com/ashawkey/RAD-NeRF).
- Usage example [Fay](https://github.com/TheRamU/Fay).
For academic exchange, email waityousea@126.com
If this project helps you, please give it a star. Anyone interested is also welcome to help improve the project.
Email: lipku@foxmail.com

44
app.py

@ -7,10 +7,11 @@ import json
import gevent
from gevent import pywsgi
from geventwebsocket.handler import WebSocketHandler
from tools import audio_pre_process, video_pre_process, generate_video,audio_process
import os
import re
import numpy as np
from threading import Thread
import multiprocessing
import argparse
from nerf_triplane.provider import NeRFDataset_Test
@ -24,7 +25,6 @@ import edge_tts
app = Flask(__name__)
sockets = Sockets(app)
video_list = []
global nerfreal
@ -40,33 +40,15 @@ async def main(voicename: str, text: str, render):
pass
def send_information(path, ws):
print('Starting to send data!')
#path = video_list[0]
''''''
with open(path, 'rb') as f:
video_data = base64.b64encode(f.read()).decode()
data = {
'video': 'data:video/mp4;base64,%s' % video_data,
}
json_data = json.dumps(data)
ws.send(json_data)
def txt_to_audio(text_):
audio_list = []
#audio_path = 'data/audio/aud_0.wav'
voicename = "zh-CN-YunxiaNeural"
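# Let's learn together. Bing is powered by AI, so surprises and mistakes are possible. Make sure to check the facts, and share feedback so we can learn and improve!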
text = text_
asyncio.get_event_loop().run_until_complete(main(voicename,text,nerfreal))
#audio_process(audio_path)
@sockets.route('/dighuman')
@sockets.route('/humanecho')
def echo_socket(ws):
# get the WebSocket object
#ws = request.environ.get('wsgi.websocket')
@ -81,19 +63,12 @@ def echo_socket(ws):
message = ws.receive()
if len(message)==0:
return 'Input is empty'
else:
txt_to_audio(message)
audio_path = 'data/audio/aud_0.wav'
audio_path_eo = 'data/audio/aud_0_eo.npy'
video_path = 'data/video/results/ngp_0.mp4'
output_path = 'data/video/results/output_0.mp4'
generate_video(audio_path, audio_path_eo, video_path, output_path)
video_list.append(output_path)
send_information(output_path, ws)
def render():
nerfreal.render()
if __name__ == '__main__':
@ -242,12 +217,13 @@ if __name__ == '__main__':
# we still need test_loader to provide audio features for testing.
nerfreal = NeRFReal(opt, trainer, test_loader)
txt_to_audio('我是中国人,我来自北京')
nerfreal.render()
#txt_to_audio('我是中国人,我来自北京')
rendthrd = Thread(target=render)
rendthrd.start()
#############################################################################
server = pywsgi.WSGIServer(('127.0.0.1', 8800), app, handler_class=WebSocketHandler)
print('start websocket server')
server = pywsgi.WSGIServer(('0.0.0.0', 8000), app, handler_class=WebSocketHandler)
server.serve_forever()
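
The hunk above moves the render loop into a `Thread` so that the main thread is free to run the blocking WebSocket server. A minimal standard-library sketch of that pattern, assuming a renderer whose loop can be stopped with an event; `FakeRenderer` and `run_with_background_render` are hypothetical stand-ins, not names from app.py, and how native threads interact with gevent is not covered here:

```python
import time
from threading import Event, Thread


class FakeRenderer:
    """Stand-in for NeRFReal: render() loops until asked to stop."""

    def __init__(self) -> None:
        self.frames = 0
        self.stop_event = Event()

    def render(self) -> None:
        while not self.stop_event.is_set():
            self.frames += 1  # the real loop would render and stream a frame
            time.sleep(0.001)


def run_with_background_render(renderer: FakeRenderer, serve) -> None:
    """Start render() in a thread, then run the blocking serve() call."""
    thread = Thread(target=renderer.render, daemon=True)
    thread.start()
    try:
        serve()  # plays the role of server.serve_forever() in app.py
    finally:
        renderer.stop_event.set()
        thread.join(timeout=1.0)


if __name__ == "__main__":
    r = FakeRenderer()
    run_with_background_render(r, serve=lambda: time.sleep(0.05))
    print(f"rendered {r.frames} frames while serving")
```

The stop event and `join` are additions for a clean shutdown in the sketch; the diff itself only starts the thread.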


@ -8,6 +8,7 @@ import pyaudio
import soundfile as sf
import resampy
import queue
from queue import Queue
#from collections import deque
from threading import Thread, Event
@ -318,9 +319,11 @@ class ASR:
return None
else:
frame = self.queue.get()
try:
frame = self.queue.get(block=False)
print(f'[INFO] get frame {frame.shape}')
except queue.Empty:
frame = np.zeros(self.chunk, dtype=np.float32)
self.idx = self.idx + self.chunk
@ -380,10 +383,9 @@ class ASR:
def push_audio(self,buffer):
print(f'[INFO] push_audio {len(buffer)}')
self.input_stream.write(buffer)
if len(buffer)<=0:
self.input_stream.seek(0)
stream = self.create_bytes_stream(self.input_stream)
if len(buffer)>0:
byte_stream=BytesIO(buffer)
stream = self.create_bytes_stream(byte_stream)
streamlen = stream.shape[0]
idx=0
while streamlen >= self.chunk:
@ -392,6 +394,18 @@ class ASR:
idx += self.chunk
if streamlen>0:
self.queue.put(stream[idx:])
# self.input_stream.write(buffer)
# if len(buffer)<=0:
# self.input_stream.seek(0)
# stream = self.create_bytes_stream(self.input_stream)
# streamlen = stream.shape[0]
# idx=0
# while streamlen >= self.chunk:
# self.queue.put(stream[idx:idx+self.chunk])
# streamlen -= self.chunk
# idx += self.chunk
# if streamlen>0:
# self.queue.put(stream[idx:])
def get_audio_out(self):
return self.output_queue.get()
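
The two hunks above change the audio path in two ways: `push_audio` splits each incoming buffer into `chunk`-sized pieces, and the consumer reads with a non-blocking `get` that falls back to silence when nothing is queued. A minimal sketch of both ideas using only the standard library, with plain lists standing in for the NumPy arrays; `push_chunks` and `next_frame` are hypothetical names for illustration:

```python
import queue


def push_chunks(samples: list, chunk: int, q: queue.Queue) -> int:
    """Split samples into chunk-sized pieces and enqueue them in order.

    A shorter final piece is enqueued as-is, mirroring the diff.
    Returns the number of items enqueued.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    count = 0
    for start in range(0, len(samples), chunk):
        q.put(samples[start:start + chunk])
        count += 1
    return count


def next_frame(q: queue.Queue, chunk: int) -> list:
    """Return the next queued frame, or a frame of silence if none is waiting."""
    try:
        return q.get(block=False)
    except queue.Empty:
        return [0.0] * chunk


if __name__ == "__main__":
    q = queue.Queue()
    push_chunks(list(range(7)), 3, q)
    for _ in range(4):
        print(next_frame(q, 3))
```

The silence fallback keeps the render loop running when no speech is pending. Note that the last piece can be shorter than `chunk`; whether downstream code pads it is not visible in this diff.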

BIN
assets/dataflow.png Normal file

Binary file not shown. Size: 14 KiB

62
echo.html Normal file

@ -0,0 +1,62 @@
<!-- index.html -->
<html>
<head>
<script type="text/javascript" src="mpegts-1.7.3.min.js"></script>
<script type="text/javascript" src="http://cdn.sockjs.org/sockjs-0.3.4.js"></script>
<script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>
</head>
<body>
<div class="container">
<h1>WebSocket Test</h1>
<form class="form-inline" id="echo-form">
<div class="form-group">
<p>input text</p>
<textarea cols="2" rows="3" style="width:600px;height:50px;" class="form-control" id="message">test</textarea>
</div>
<button type="submit" class="btn btn-default">Send</button>
</form>
<div id="log">
</div>
<video id="video_player" width="40%" autoplay controls></video>
</div>
</body>
<script type="text/javascript" charset="utf-8">
$(document).ready(function() {
var ws = new WebSocket('ws://serverip:8000/humanecho');
//document.getElementsByTagName("video")[0].setAttribute("src", aa["video"]);
ws.onopen = function() {
console.log('Connected');
};
ws.onmessage = function(e) {
console.log('Received: ' + e.data);
data = e
var vid = JSON.parse(data.data);
console.log(typeof(vid),vid)
//document.getElementsByTagName("video")[0].setAttribute("src", vid["video"]);
};
ws.onclose = function(e) {
console.log('Closed');
};
flvPlayer = mpegts.createPlayer({type: 'flv', url: "http://serverip:8080/live/livestream.flv", isLive: true, enableStashBuffer: false});
flvPlayer.attachMediaElement(document.getElementById('video_player'));
flvPlayer.load();
flvPlayer.play();
$('#echo-form').on('submit', function(e) {
e.preventDefault();
var message = $('#message').val();
console.log('Sending: ' + message);
ws.send(message);
$('#message').val('');
});
});
</script>
</html>

9
mpegts-1.7.3.min.js vendored Normal file

File diff suppressed because one or more lines are too long


@ -144,13 +144,13 @@ class NeRFReal:
data['auds'] = self.asr.get_next_feat()
outputs = self.trainer.test_gui_with_data(data, self.W, self.H)
print(f'[INFO] outputs shape ',outputs['image'].shape)
#print(f'[INFO] outputs shape ',outputs['image'].shape)
image = (outputs['image'] * 255).astype(np.uint8)
self.streamer.stream_frame(image)
#self.pipe.stdin.write(image.tostring())
for _ in range(2):
frame = self.asr.get_audio_out()
print(f'[INFO] get_audio_out shape ',frame.shape)
#print(f'[INFO] get_audio_out shape ',frame.shape)
self.streamer.stream_frame_audio(frame)
# frame = (frame * 32767).astype(np.int16).tobytes()
# self.fifo_audio.write(frame)


@ -12,6 +12,7 @@ rich
dearpygui
packaging
scipy
scikit-learn
face_alignment
python_speech_features
@ -24,3 +25,8 @@ configargparse
lpips
imageio-ffmpeg
transformers
edge_tts
flask
flask_sockets

File diff suppressed because one or more lines are too long