# Commit bdd4be3919 — Improve the digital-human broadcast

Parent: f0b46ddb8e

## README.md (140 lines changed)
````diff
@@ -1,121 +1,79 @@
-# Talking-head generation for virtual humans (real-time photo-driven avatars)
-![](/img/example.gif)
+A streaming digital human based on the Ernerf model that achieves synchronized audio-video dialogue. It can essentially reach commercial quality.
+A streaming digital human based on the ernerf model, keeping audio and video in sync during dialogue; the results are close to commercially usable.
-# Get Started

 ## Installation

-Tested on Ubuntu 22.04 with PyTorch 1.12 and CUDA 11.6, or PyTorch 1.12 and CUDA 11.3
+Tested on Ubuntu 18.04, PyTorch 1.12 and CUDA 11.3.

-```python
-git clone https://github.com/waityousea/xuniren.git
-cd xuniren
-```

 ### Install dependency

-```python
-# for ubuntu, portaudio is needed for pyaudio to work.
-sudo apt install portaudio19-dev
+```bash
+conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch

 pip install -r requirements.txt
-or
-## environment.yml pins PyTorch 1.12 and CUDA 11.3
-conda env create -f environment.yml
-## install pytorch3d
-#ubuntu/mac
 pip install "git+https://github.com/facebookresearch/pytorch3d.git"
+pip install tensorflow-gpu==2.8.0
+```
+For CUDA environment setup on Linux, see https://zhuanlan.zhihu.com/p/674972886

+Install the rtmpstream library
+following https://github.com/lipku/python_rtmpstream


+## Run

+### Run the RTMP server (srs)
+```
+docker run --rm -it -p 1935:1935 -p 1985:1985 -p 8080:8080 registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5
 ```

-**Installing pytorch3d on Windows**
+### Once the environment is ready, start:

-- gcc & g++ ≥ 4.9

-On Windows you need a gcc compiler; install whichever suits you, e.g. MinGW.

-The following installation steps come from the official [pytorch3d](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md) guide; pick what you need.

-```python
-conda create -n pytorch3d python=3.9
-conda activate pytorch3d
-conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
-conda install -c fvcore -c iopath -c conda-forge fvcore iopath
-```

-The CUB build-time dependency is only needed if your CUDA is older than 11.7; if you use conda, you can continue with:

-```
-conda install -c bottler nvidiacub
-```

-```
-# Demos and examples
-conda install jupyter
-pip install scikit-image matplotlib imageio plotly opencv-python

-# Tests/Linting
-pip install black usort flake8 flake8-bugbear flake8-comprehensions
-```

-After any necessary patches, build and install from the "x64 Native Tools Command Prompt for VS 2019".

-```
-git clone https://github.com/facebookresearch/pytorch3d.git
-cd pytorch3d
-python setup.py install
-```

-### Build extension

-By default, we use [`load`](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime. However, this may be inconvenient sometimes. Therefore, we also provide the `setup.py` to build each extension:

-```
-# install all extension modules
-# notice: this module must be installed.
-# on Windows, use the "x64 Native Tools Command Prompt for VS 2019" to install
-bash scripts/install_ext.sh
-```

-### **start (standalone)**

-Once the environment is ready, start the avatar generator:

 ```python
 python app.py
 ```
-### **start (Fay integration; tested on Ubuntu 20)**
-Once the environment is ready, start the Fay bridge script
-```python
-python fay_connect.py
+If huggingface is unreachable, run this before starting:
+```
+export HF_ENDPOINT=https://hf-mirror.com
 ```
-![](img/weplay.png)

-Scan the code to support the open-source work; the payment receipt number admits you to the QQ group.
+Once it is running, open rtmp://serverip/live/livestream in VLC.

+### Typing text for the digital human from a web page
+Install and start nginx
+```
+apt install nginx
+nginx
+```
+In echo.html, change the websocket and video-player addresses, replacing serverip with the server's actual IP.
+Then copy echo.html and mpegts-1.7.3.min.js to /var/www/html

+Start the digital human
+```python
+python app.py
+```

-Interface inputs and outputs: [Websoket.md](https://github.com/waityousea/xuniren/blob/main/WebSocket.md)
+Open http://serverip/echo.html in a browser, type any text into the box, and submit; the digital human reads it out.

-Core files for avatar generation
+## Data flow
+![](/assets/dataflow.png)

+## Digital-human model files; these can be replaced with a model you trained yourself (https://github.com/Fictionarry/ER-NeRF)

 ```python
-## note: the core files must be trained separately
 .
 ├── data
-│   ├── kf.json
+│   ├── data_kf.json
 │   ├── pretrained
 │       └── ngp_kg.pth

 ```

-### Inference Speed
+## TODO
+- Add ChatGPT-based dialogue
+- Voice cloning
+- Play a substitute video while the digital human is silent

-On a desktop RTX A4000 or a laptop RTX 3080 Ti (16 GB VRAM), inference runs at 35–43 fps; at 25 fps video, one second of compute yields about 1.5 s of video.
+If this project helps you, please give it a star. Anyone interested is welcome to help improve it.
+Email: lipku@foxmail.com
-# Acknowledgement

-- The data pre-processing part is adapted from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF).
-- The NeRF framework is based on [torch-ngp](https://github.com/ashawkey/torch-ngp).
-- The algorithm core comes from [RAD-NeRF](https://github.com/ashawkey/RAD-NeRF).
-- Usage example: [Fay](https://github.com/TheRamU/Fay).

-For academic exchange, email waityousea@126.com
````
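The new Run instructions leave the serverip substitution and the stream check as manual steps. Here is a minimal shell sketch of that part of the workflow; the address 192.168.1.10 is hypothetical, and ffplay is an assumed alternative to the VLC check the README suggests:

```bash
# point the page at the real host instead of the serverip placeholder
# (192.168.1.10 is a made-up address; use your server's IP)
sed -i 's/serverip/192.168.1.10/g' echo.html
cp echo.html mpegts-1.7.3.min.js /var/www/html/

# with srs and `python app.py` running, the stream should also play in ffplay
ffplay rtmp://192.168.1.10/live/livestream
```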
## app.py (44 lines changed)

```diff
@@ -7,10 +7,11 @@ import json
 import gevent
 from gevent import pywsgi
 from geventwebsocket.handler import WebSocketHandler
-from tools import audio_pre_process, video_pre_process, generate_video,audio_process
 import os
 import re
 import numpy as np
+from threading import Thread
+import multiprocessing

 import argparse
 from nerf_triplane.provider import NeRFDataset_Test
@@ -24,7 +25,6 @@ import edge_tts

 app = Flask(__name__)
 sockets = Sockets(app)
-video_list = []
 global nerfreal


@@ -40,33 +40,15 @@ async def main(voicename: str, text: str, render):
     pass


-def send_information(path, ws):
-    print('传输信息开始!')
-    #path = video_list[0]
-    ''''''
-    with open(path, 'rb') as f:
-        video_data = base64.b64encode(f.read()).decode()
-    data = {
-        'video': 'data:video/mp4;base64,%s' % video_data,
-    }
-    json_data = json.dumps(data)
-    ws.send(json_data)


 def txt_to_audio(text_):
     audio_list = []
     #audio_path = 'data/audio/aud_0.wav'
     voicename = "zh-CN-YunxiaNeural"
-    # Let's learn together. Bing is powered by AI, so surprises and mistakes are possible. Verify facts and share feedback so we can learn and improve!
     text = text_
     asyncio.get_event_loop().run_until_complete(main(voicename,text,nerfreal))
     #audio_process(audio_path)

-@sockets.route('/dighuman')
+@sockets.route('/humanecho')
 def echo_socket(ws):
     # get the WebSocket object
     #ws = request.environ.get('wsgi.websocket')
@@ -81,19 +63,12 @@ def echo_socket(ws):
         message = ws.receive()

         if len(message)==0:
             return '输入信息为空'
         else:
             txt_to_audio(message)
-            audio_path = 'data/audio/aud_0.wav'
-            audio_path_eo = 'data/audio/aud_0_eo.npy'
-            video_path = 'data/video/results/ngp_0.mp4'
-            output_path = 'data/video/results/output_0.mp4'
-            generate_video(audio_path, audio_path_eo, video_path, output_path)
-            video_list.append(output_path)
-            send_information(output_path, ws)

+def render():
+    nerfreal.render()

 if __name__ == '__main__':
@@ -242,12 +217,13 @@ if __name__ == '__main__':

     # we still need test_loader to provide audio features for testing.
     nerfreal = NeRFReal(opt, trainer, test_loader)
-    txt_to_audio('我是中国人,我来自北京')
-    nerfreal.render()
+    #txt_to_audio('我是中国人,我来自北京')
+    rendthrd = Thread(target=render)
+    rendthrd.start()

     #############################################################################
+    print('start websocket server')
-    server = pywsgi.WSGIServer(('127.0.0.1', 8800), app, handler_class=WebSocketHandler)
+    server = pywsgi.WSGIServer(('0.0.0.0', 8000), app, handler_class=WebSocketHandler)
     server.serve_forever()
```
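The net effect of these app.py changes is a two-lane design: NeRF rendering runs on a background thread while the gevent WSGI server accepts text over the /humanecho websocket. A minimal runnable sketch of that shape, with `NeRFRealStub` standing in for the real `NeRFReal(opt, trainer, test_loader)`:

```python
import time
from threading import Thread

from flask import Flask
from flask_sockets import Sockets
from gevent import pywsgi
from geventwebsocket.handler import WebSocketHandler

app = Flask(__name__)
sockets = Sockets(app)

class NeRFRealStub:
    """Stand-in for NeRFReal; the real class streams frames via rtmpstream."""
    def render(self):
        while True:
            time.sleep(0.04)  # placeholder for per-frame inference at ~25 fps

nerfreal = NeRFRealStub()

@sockets.route('/humanecho')
def echo_socket(ws):
    # each websocket message is a line of text for the digital human to speak
    while not ws.closed:
        message = ws.receive()
        if message:
            print('tts request:', message)  # app.py feeds this to txt_to_audio()

def render():
    nerfreal.render()

if __name__ == '__main__':
    # rendering now runs on its own thread, so it no longer blocks the server
    Thread(target=render).start()
    server = pywsgi.WSGIServer(('0.0.0.0', 8000), app, handler_class=WebSocketHandler)
    server.serve_forever()
```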
## asrreal.py (26 lines changed)

```diff
@@ -8,6 +8,7 @@ import pyaudio
 import soundfile as sf
 import resampy
+import queue
 from queue import Queue
 #from collections import deque
 from threading import Thread, Event
@@ -318,9 +319,11 @@ class ASR:
             return None

         else:
+            try:
-            frame = self.queue.get()
+                frame = self.queue.get(block=False)
                 print(f'[INFO] get frame {frame.shape}')
+            except queue.Empty:
+                frame = np.zeros(self.chunk, dtype=np.float32)

         self.idx = self.idx + self.chunk
@@ -380,10 +383,9 @@ class ASR:

     def push_audio(self,buffer):
         print(f'[INFO] push_audio {len(buffer)}')
-        self.input_stream.write(buffer)
-        if len(buffer)<=0:
-            self.input_stream.seek(0)
-            stream = self.create_bytes_stream(self.input_stream)
+        if len(buffer)>0:
+            byte_stream=BytesIO(buffer)
+            stream = self.create_bytes_stream(byte_stream)
             streamlen = stream.shape[0]
             idx=0
             while streamlen >= self.chunk:
@@ -392,6 +394,18 @@ class ASR:
                 idx += self.chunk
             if streamlen>0:
                 self.queue.put(stream[idx:])
+        # self.input_stream.write(buffer)
+        # if len(buffer)<=0:
+        #     self.input_stream.seek(0)
+        #     stream = self.create_bytes_stream(self.input_stream)
+        #     streamlen = stream.shape[0]
+        #     idx=0
+        #     while streamlen >= self.chunk:
+        #         self.queue.put(stream[idx:idx+self.chunk])
+        #         streamlen -= self.chunk
+        #         idx += self.chunk
+        #     if streamlen>0:
+        #         self.queue.put(stream[idx:])

     def get_audio_out(self):
         return self.output_queue.get()
```
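The asrreal.py change makes the audio feed non-blocking: when no audio has been pushed, the render loop gets a chunk of silence instead of stalling. A self-contained sketch of that pattern, assuming float32 chunks as in the ASR class:

```python
import queue

import numpy as np

CHUNK = 320  # samples per frame; the ASR class uses self.chunk

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()

def next_frame() -> np.ndarray:
    """Return the next audio chunk, or silence if nothing is queued.

    Mirrors the commit's change from a blocking queue.get() to
    queue.get(block=False) with a zero-filled fallback, so the render
    loop keeps producing frames while the digital human is idle.
    """
    try:
        return audio_queue.get(block=False)
    except queue.Empty:
        return np.zeros(CHUNK, dtype=np.float32)
```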
## New binary image (14 KiB)

Binary file not shown; a 14 KiB image is added in this commit.
## echo.html (new file, +62 lines)

```html
<!-- index.html -->
<html>
<head>
  <script type="text/javascript" src="mpegts-1.7.3.min.js"></script>
  <script type="text/javascript" src="http://cdn.sockjs.org/sockjs-0.3.4.js"></script>
  <script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>
</head>
<body>
  <div class="container">
    <h1>WebSocket Test</h1>
    <form class="form-inline" id="echo-form">
      <div class="form-group">
        <p>input text</p>
        <textarea cols="2" rows="3" style="width:600px;height:50px;" class="form-control" id="message">test</textarea>
      </div>
      <button type="submit" class="btn btn-default">Send</button>
    </form>
    <div id="log"></div>
    <video id="video_player" width="40%" autoplay controls></video>
  </div>
</body>
<script type="text/javascript" charset="utf-8">
  $(document).ready(function() {
    var ws = new WebSocket('ws://serverip:8000/humanecho');
    ws.onopen = function() {
      console.log('Connected');
    };
    ws.onmessage = function(e) {
      console.log('Received: ' + e.data);
      var vid = JSON.parse(e.data);
      console.log(typeof(vid), vid);
      //document.getElementsByTagName("video")[0].setAttribute("src", vid["video"]);
    };
    ws.onclose = function(e) {
      console.log('Closed');
    };

    flvPlayer = mpegts.createPlayer({type: 'flv', url: "http://serverip:8080/live/livestream.flv", isLive: true, enableStashBuffer: false});
    flvPlayer.attachMediaElement(document.getElementById('video_player'));
    flvPlayer.load();
    flvPlayer.play();

    $('#echo-form').on('submit', function(e) {
      e.preventDefault();
      var message = $('#message').val();
      console.log('Sending: ' + message);
      ws.send(message);
      $('#message').val('');
    });
  });
</script>
</html>
```
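echo.html is the only client shipped in this commit; for scripted testing, the same text can be pushed at the /humanecho endpoint directly. A minimal sketch using the third-party websocket-client package (an assumed test dependency, not in requirements.txt):

```python
# pip install websocket-client  (assumed test dependency)
from websocket import create_connection

# same endpoint echo.html connects to; replace serverip with the real host
ws = create_connection('ws://serverip:8000/humanecho')
ws.send('你好,数字人')  # any text; the server runs it through edge-tts and renders it
ws.close()
```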
## mpegts-1.7.3.min.js (new file)

File diff suppressed because one or more lines are too long (a minified library; presumably the mpegts-1.7.3.min.js the README has you copy next to echo.html).
## nerfreal.py

```diff
@@ -144,13 +144,13 @@ class NeRFReal:
             data['auds'] = self.asr.get_next_feat()

             outputs = self.trainer.test_gui_with_data(data, self.W, self.H)
-            print(f'[INFO] outputs shape ',outputs['image'].shape)
+            #print(f'[INFO] outputs shape ',outputs['image'].shape)
             image = (outputs['image'] * 255).astype(np.uint8)
             self.streamer.stream_frame(image)
             #self.pipe.stdin.write(image.tostring())
             for _ in range(2):
                 frame = self.asr.get_audio_out()
-                print(f'[INFO] get_audio_out shape ',frame.shape)
+                #print(f'[INFO] get_audio_out shape ',frame.shape)
                 self.streamer.stream_frame_audio(frame)
                 # frame = (frame * 32767).astype(np.int16).tobytes()
                 # self.fifo_audio.write(frame)
```
## requirements.txt

```diff
@@ -12,6 +12,7 @@ rich
 dearpygui
 packaging
 scipy
+scikit-learn
 face_alignment
 python_speech_features
@@ -24,3 +25,8 @@ configargparse

 lpips
 imageio-ffmpeg
+transformers
+edge_tts
+flask
+flask_sockets
```
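The additions cover TTS (edge_tts), the websocket server (flask, flask_sockets), and model/feature code (transformers, scikit-learn). If you are upgrading an existing environment rather than reinstalling, installing just the new entries should be enough (an assumption; a full `pip install -r requirements.txt` is the safe path):

```bash
pip install scikit-learn transformers edge_tts flask flask_sockets
```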