Improve digital human broadcast
parent
f0b46ddb8e
commit
bdd4be3919
140
README.md
|
@ -1,121 +1,79 @@
|
|||
# Virtual human talking-head generation (real-time driving of a photo-based virtual human)
|
||||
![](/img/example.gif)
|
||||
# Get Started
|
||||
A streaming digital human based on the ER-NeRF model that supports synchronized audio-video dialogue. The quality is close to what commercial use requires.
|
||||
|
||||
|
||||
## Installation
|
||||
|
||||
Tested on Ubuntu 22.04 with PyTorch 1.12 and CUDA 11.6, or with PyTorch 1.12 and CUDA 11.3.
|
||||
|
||||
```python
|
||||
git clone https://github.com/waityousea/xuniren.git
|
||||
cd xuniren
|
||||
```
|
||||
Tested on Ubuntu 18.04 with PyTorch 1.12 and CUDA 11.3.
|
||||
|
||||
### Install dependency
|
||||
|
||||
```python
|
||||
# for ubuntu, portaudio is needed for pyaudio to work.
|
||||
sudo apt install portaudio19-dev
|
||||
|
||||
```bash
|
||||
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
|
||||
pip install -r requirements.txt
|
||||
or
|
||||
## environment.yml uses PyTorch 1.12 and CUDA 11.3
|
||||
conda env create -f environment.yml
|
||||
## install pytorch3d
|
||||
#ubuntu/mac
|
||||
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
|
||||
pip install tensorflow-gpu==2.8.0
|
||||
```
|
||||
For setting up a CUDA environment on Linux, see this article: https://zhuanlan.zhihu.com/p/674972886
|
||||
|
||||
Install the rtmpstream library
|
||||
See https://github.com/lipku/python_rtmpstream
|
||||
|
||||
|
||||
## Run
|
||||
|
||||
### Run the RTMP server (SRS)
|
||||
```
|
||||
docker run --rm -it -p 1935:1935 -p 1985:1985 -p 8080:8080 registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5
|
||||
```
|
||||
|
||||
**Installing pytorch3d on Windows**
|
||||
|
||||
- gcc & g++ ≥ 4.9
|
||||
|
||||
On Windows, a gcc compiler is required. Install one as needed, for example MinGW.
|
||||
|
||||
The following installation steps come from the official [pytorch3d](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md) instructions; choose among them as needed.
|
||||
|
||||
```python
|
||||
conda create -n pytorch3d python=3.9
|
||||
conda activate pytorch3d
|
||||
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
|
||||
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
|
||||
```
|
||||
|
||||
The CUB build-time dependency is only needed if your CUDA is older than 11.7. If you are using conda, you can install it as follows:
|
||||
|
||||
```
|
||||
conda install -c bottler nvidiacub
|
||||
```
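The CUDA version condition for CUB can be expressed as a small check. This is a minimal sketch: `needs_cub` is a hypothetical helper, not part of pytorch3d, and it assumes the CUDA version is available as a dotted string such as `11.6`.

```python
def needs_cub(cuda_version: str) -> bool:
    """Return True when CUDA is older than 11.7, so CUB must be installed separately."""
    # Compare numerically so that "11.10" is treated as newer than "11.7".
    parts = [int(p) for p in cuda_version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)  # treat "11" as "11.0"
    return tuple(parts) < (11, 7)


if __name__ == "__main__":
    for version in ("11.3", "11.6", "11.7", "12.1"):
        print(version, "needs CUB" if needs_cub(version) else "CUB not required")
```

With the CUDA 11.3 and 11.6 setups mentioned in this README, the check returns `True`, so the `nvidiacub` install step applies.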
|
||||
|
||||
```
|
||||
# Demos and examples
|
||||
conda install jupyter
|
||||
pip install scikit-image matplotlib imageio plotly opencv-python
|
||||
|
||||
# Tests/Linting
|
||||
pip install black usort flake8 flake8-bugbear flake8-comprehensions
|
||||
```
|
||||
|
||||
After applying any necessary patches, you can build and install from the "x64 Native Tools Command Prompt for VS 2019":
|
||||
|
||||
```
|
||||
git clone https://github.com/facebookresearch/pytorch3d.git
|
||||
cd pytorch3d
|
||||
python setup.py install
|
||||
```
|
||||
|
||||
### Build extension
|
||||
|
||||
By default, we use [`load`](https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime. However, this may be inconvenient sometimes. Therefore, we also provide the `setup.py` to build each extension:
|
||||
|
||||
```
|
||||
# install all extension modules
|
||||
# note: this module must be installed.
|
||||
# on Windows, installing from the "x64 Native Tools Command Prompt for VS 2019" window is recommended
|
||||
bash scripts/install_ext.sh
|
||||
```
|
||||
|
||||
### **start (standalone)**
|
||||
|
||||
Once the environment is configured, start the virtual human generator:
|
||||
### Once the environment is configured, start:
|
||||
|
||||
```python
|
||||
python app.py
|
||||
```
|
||||
### **start (integrated with Fay; tested on Ubuntu 20)**
|
||||
Once the environment is configured, start the Fay integration script:
|
||||
```python
|
||||
python fay_connect.py
|
||||
|
||||
If you cannot reach Hugging Face, set the following before running:
|
||||
```
|
||||
export HF_ENDPOINT=https://hf-mirror.com
|
||||
```
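The same mirror setting can also be applied from Python for the current process. This is a sketch under one assumption: `use_hf_mirror` is a hypothetical helper, and it has to be called before `transformers` or `huggingface_hub` is imported, since the endpoint is typically read when those libraries are imported.

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror URL taken from the export line in this README


def use_hf_mirror(endpoint: str = HF_MIRROR) -> str:
    """Point Hugging Face downloads at a mirror for the current process only."""
    # Equivalent to `export HF_ENDPOINT=...`, but limited to this Python process.
    os.environ["HF_ENDPOINT"] = endpoint
    return os.environ["HF_ENDPOINT"]


if __name__ == "__main__":
    print("HF_ENDPOINT =", use_hf_mirror())
```

Setting the variable in the shell remains the simpler option when every process started from that shell should use the mirror.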
|
||||
![](img/weplay.png)
|
||||
|
||||
Scan the QR code to support open-source development; use the payment receipt number to join the QQ discussion group.
|
||||
Once it is running, open rtmp://serverip/live/livestream in VLC.
|
||||
|
||||
### Have the digital human read out text entered in a web page
|
||||
Install and start nginx
|
||||
```
|
||||
apt install nginx
|
||||
nginx
|
||||
```
|
||||
Edit the WebSocket and video playback addresses in echo.html, replacing serverip with the server's actual IP.
|
||||
Then copy echo.html and mpegts-1.7.3.min.js to /var/www/html.
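Replacing the `serverip` placeholder and copying echo.html can be scripted. The sketch below is only one way to do it: `fill_server_ip` and `deploy_echo_page` are hypothetical helpers, and `192.168.1.10` is a made-up example address.

```python
from pathlib import Path


def fill_server_ip(html: str, server_ip: str) -> str:
    """Replace every literal `serverip` placeholder with the real address."""
    return html.replace("serverip", server_ip)


def deploy_echo_page(src: Path, dst_dir: Path, server_ip: str) -> Path:
    """Write a copy of the page into dst_dir with the placeholder filled in."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    page = fill_server_ip(src.read_text(encoding="utf-8"), server_ip)
    dst.write_text(page, encoding="utf-8")
    return dst


if __name__ == "__main__":
    # Example address only; mpegts-1.7.3.min.js still has to be copied separately.
    sample = "new WebSocket('ws://serverip:8000/humanecho');"
    print(fill_server_ip(sample, "192.168.1.10"))
```

For a real deployment, `deploy_echo_page(Path("echo.html"), Path("/var/www/html"), "<your server IP>")` would write the edited page into the nginx web root, which usually needs root permissions.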
|
||||
|
||||
Start the digital human
|
||||
```python
|
||||
python app.py
|
||||
```
|
||||
|
||||
Input and output of the interface: [WebSocket.md](https://github.com/waityousea/xuniren/blob/main/WebSocket.md)
|
||||
Open http://serverip/echo.html in a browser, type any text into the text box, and submit. The digital human reads the text aloud.
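The browser page is not the only possible client: any WebSocket client can send text to the `/humanecho` route on port 8000. The following is a minimal sketch, assuming the third-party `websocket-client` package is installed; `humanecho_url` and `send_text` are hypothetical helpers, and the server is not assumed to send anything back.

```python
def humanecho_url(server_ip: str, port: int = 8000) -> str:
    """Build the WebSocket address used by echo.html."""
    return f"ws://{server_ip}:{port}/humanecho"


def send_text(server_ip: str, text: str) -> None:
    """Send one piece of text for the digital human to read aloud."""
    if not text:
        raise ValueError("text must not be empty")
    # Imported lazily so the URL helper works without the third-party package.
    from websocket import create_connection  # pip install websocket-client

    ws = create_connection(humanecho_url(server_ip))
    try:
        ws.send(text)
    finally:
        ws.close()


if __name__ == "__main__":
    print(humanecho_url("192.168.1.10"))  # example address only
```

The resulting audio and video still arrive over the RTMP/FLV stream, so a player such as VLC or the echo.html page is needed to see the output.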
|
||||
|
||||
Core files for virtual human generation
|
||||
## Data flow
|
||||
![](/assets/dataflow.png)
|
||||
|
||||
## Digital human model files; these can be replaced with a model you trained yourself (https://github.com/Fictionarry/ER-NeRF)
|
||||
|
||||
```python
|
||||
## Note: the core files must be trained separately
|
||||
.
|
||||
├── data
|
||||
│ ├── kf.json
|
||||
│ ├── data_kf.json
|
||||
│ ├── pretrained
|
||||
│ └── └── ngp_kg.pth
|
||||
|
||||
```
|
||||
|
||||
### Inference Speed
|
||||
## TODO
|
||||
- Add ChatGPT to enable digital human dialogue
|
||||
- Voice cloning
|
||||
- Play a substitute video clip while the digital human is silent
|
||||
|
||||
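When running video inference on a desktop RTX A4000 or a laptop RTX 3080 Ti (16 GB of VRAM), 35 to 43 frames can be rendered per second. Assuming 25 frames per second of video, one second of compute yields roughly 1.5 s of video.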
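The throughput estimate is a simple ratio of rendering rate to playback rate. As a quick check of the arithmetic (the helper name `realtime_factor` is hypothetical):

```python
def realtime_factor(render_fps: float, video_fps: float = 25.0) -> float:
    """Seconds of video produced per second of compute."""
    if video_fps <= 0:
        raise ValueError("video_fps must be positive")
    return render_fps / video_fps


if __name__ == "__main__":
    for fps in (35, 43):
        print(f"{fps} frames/s -> {realtime_factor(fps):.2f} s of video per second")
    # 35 frames/s -> 1.40 s of video per second
    # 43 frames/s -> 1.72 s of video per second
```

Any factor above 1.0 means rendering keeps ahead of playback, which is what live streaming needs.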
|
||||
|
||||
# Acknowledgement
|
||||
|
||||
- The data pre-processing part is adapted from [AD-NeRF](https://github.com/YudongGuo/AD-NeRF).
|
||||
- The NeRF framework is based on [torch-ngp](https://github.com/ashawkey/torch-ngp).
|
||||
- The algorithm core comes from [RAD-NeRF](https://github.com/ashawkey/RAD-NeRF).
|
||||
- Usage example [Fay](https://github.com/TheRamU/Fay).
|
||||
|
||||
For academic exchange, email waityousea@126.com
|
||||
If this project helps you, please give it a star. Anyone interested is also welcome to help improve it.
|
||||
Email: lipku@foxmail.com
|
||||
|
|
44
app.py
|
@ -7,10 +7,11 @@ import json
|
|||
import gevent
|
||||
from gevent import pywsgi
|
||||
from geventwebsocket.handler import WebSocketHandler
|
||||
from tools import audio_pre_process, video_pre_process, generate_video,audio_process
|
||||
import os
|
||||
import re
|
||||
import numpy as np
|
||||
from threading import Thread
|
||||
import multiprocessing
|
||||
|
||||
import argparse
|
||||
from nerf_triplane.provider import NeRFDataset_Test
|
||||
|
@ -24,7 +25,6 @@ import edge_tts
|
|||
|
||||
app = Flask(__name__)
|
||||
sockets = Sockets(app)
|
||||
video_list = []
|
||||
global nerfreal
|
||||
|
||||
|
||||
|
@ -40,33 +40,15 @@ async def main(voicename: str, text: str, render):
|
|||
pass
|
||||
|
||||
|
||||
def send_information(path, ws):
|
||||
|
||||
print('传输信息开始!')
|
||||
#path = video_list[0]
|
||||
''''''
|
||||
with open(path, 'rb') as f:
|
||||
video_data = base64.b64encode(f.read()).decode()
|
||||
|
||||
data = {
|
||||
'video': 'data:video/mp4;base64,%s' % video_data,
|
||||
}
|
||||
json_data = json.dumps(data)
|
||||
|
||||
ws.send(json_data)
|
||||
|
||||
|
||||
|
||||
def txt_to_audio(text_):
|
||||
audio_list = []
|
||||
#audio_path = 'data/audio/aud_0.wav'
|
||||
voicename = "zh-CN-YunxiaNeural"
|
||||
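# Let's learn together. Bing is powered by AI, so surprises and mistakes are possible. Please make sure to check the facts, and share feedback so we can learn and improve!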
|
||||
text = text_
|
||||
asyncio.get_event_loop().run_until_complete(main(voicename,text,nerfreal))
|
||||
#audio_process(audio_path)
|
||||
|
||||
@sockets.route('/dighuman')
|
||||
@sockets.route('/humanecho')
|
||||
def echo_socket(ws):
|
||||
# Get the WebSocket object
|
||||
#ws = request.environ.get('wsgi.websocket')
|
||||
|
@ -81,19 +63,12 @@ def echo_socket(ws):
|
|||
message = ws.receive()
|
||||
|
||||
if len(message)==0:
|
||||
|
||||
return '输入信息为空'
|
||||
else:
|
||||
txt_to_audio(message)
|
||||
audio_path = 'data/audio/aud_0.wav'
|
||||
audio_path_eo = 'data/audio/aud_0_eo.npy'
|
||||
video_path = 'data/video/results/ngp_0.mp4'
|
||||
output_path = 'data/video/results/output_0.mp4'
|
||||
generate_video(audio_path, audio_path_eo, video_path, output_path)
|
||||
video_list.append(output_path)
|
||||
send_information(output_path, ws)
|
||||
|
||||
|
||||
def render():
|
||||
nerfreal.render()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
|
@ -242,12 +217,13 @@ if __name__ == '__main__':
|
|||
|
||||
# we still need test_loader to provide audio features for testing.
|
||||
nerfreal = NeRFReal(opt, trainer, test_loader)
|
||||
txt_to_audio('我是中国人,我来自北京')
|
||||
nerfreal.render()
|
||||
#txt_to_audio('我是中国人,我来自北京')
|
||||
rendthrd = Thread(target=render)
|
||||
rendthrd.start()
|
||||
|
||||
#############################################################################
|
||||
|
||||
server = pywsgi.WSGIServer(('127.0.0.1', 8800), app, handler_class=WebSocketHandler)
|
||||
print('start websocket server')
|
||||
server = pywsgi.WSGIServer(('0.0.0.0', 8000), app, handler_class=WebSocketHandler)
|
||||
server.serve_forever()
|
||||
|
||||
|
26
asrreal.py
|
@ -8,6 +8,7 @@ import pyaudio
|
|||
import soundfile as sf
|
||||
import resampy
|
||||
|
||||
import queue
|
||||
from queue import Queue
|
||||
#from collections import deque
|
||||
from threading import Thread, Event
|
||||
|
@ -318,9 +319,11 @@ class ASR:
|
|||
return None
|
||||
|
||||
else:
|
||||
|
||||
frame = self.queue.get()
|
||||
try:
|
||||
frame = self.queue.get(block=False)
|
||||
print(f'[INFO] get frame {frame.shape}')
|
||||
except queue.Empty:
|
||||
frame = np.zeros(self.chunk, dtype=np.float32)
|
||||
|
||||
self.idx = self.idx + self.chunk
|
||||
|
||||
|
@ -380,10 +383,9 @@ class ASR:
|
|||
|
||||
def push_audio(self,buffer):
|
||||
print(f'[INFO] push_audio {len(buffer)}')
|
||||
self.input_stream.write(buffer)
|
||||
if len(buffer)<=0:
|
||||
self.input_stream.seek(0)
|
||||
stream = self.create_bytes_stream(self.input_stream)
|
||||
if len(buffer)>0:
|
||||
byte_stream=BytesIO(buffer)
|
||||
stream = self.create_bytes_stream(byte_stream)
|
||||
streamlen = stream.shape[0]
|
||||
idx=0
|
||||
while streamlen >= self.chunk:
|
||||
|
@ -392,6 +394,18 @@ class ASR:
|
|||
idx += self.chunk
|
||||
if streamlen>0:
|
||||
self.queue.put(stream[idx:])
|
||||
# self.input_stream.write(buffer)
|
||||
# if len(buffer)<=0:
|
||||
# self.input_stream.seek(0)
|
||||
# stream = self.create_bytes_stream(self.input_stream)
|
||||
# streamlen = stream.shape[0]
|
||||
# idx=0
|
||||
# while streamlen >= self.chunk:
|
||||
# self.queue.put(stream[idx:idx+self.chunk])
|
||||
# streamlen -= self.chunk
|
||||
# idx += self.chunk
|
||||
# if streamlen>0:
|
||||
# self.queue.put(stream[idx:])
|
||||
|
||||
def get_audio_out(self):
|
||||
return self.output_queue.get()
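The two changes to `ASR` in this file follow a common streaming pattern: the producer splits incoming audio into fixed-size frames, and the consumer reads without blocking and falls back to silence when nothing is queued. The sketch below illustrates that pattern only. It uses plain Python lists in place of NumPy arrays, and `CHUNK`, `push_audio`, and `get_frame` are simplified stand-ins, not the project's actual methods.

```python
import queue

CHUNK = 4  # hypothetical frame size; stands in for self.chunk in the diff


def push_audio(q: queue.Queue, samples: list[float], chunk: int = CHUNK) -> None:
    """Split samples into chunk-sized frames and enqueue them; the last frame may be shorter."""
    for start in range(0, len(samples), chunk):
        q.put(samples[start:start + chunk])


def get_frame(q: queue.Queue, chunk: int = CHUNK) -> list[float]:
    """Return the next frame, or a frame of silence when the queue is empty."""
    try:
        return q.get(block=False)
    except queue.Empty:
        # No audio pending: emit zeros so the render loop never stalls.
        return [0.0] * chunk


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    push_audio(q, [0.1] * 10)
    for _ in range(4):
        print(get_frame(q))
```

The non-blocking read matters because the renderer runs continuously in its own thread; a blocking `get()` would freeze video output whenever no speech is queued.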
|
||||
|
|
Binary file not shown.
After Width: | Height: | Size: 14 KiB |
|
@ -0,0 +1,62 @@
|
|||
<!-- index.html -->
|
||||
<html>
|
||||
<head>
|
||||
<script type="text/javascript" src="mpegts-1.7.3.min.js"></script>
|
||||
<script type="text/javascript" src="http://cdn.sockjs.org/sockjs-0.3.4.js"></script>
|
||||
<script src="http://code.jquery.com/jquery-2.1.1.min.js"></script>
|
||||
|
||||
|
||||
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<h1>WebSocket Test</h1>
|
||||
<form class="form-inline" id="echo-form">
|
||||
<div class="form-group">
|
||||
<p>input text</p>
|
||||
|
||||
<textarea cols="2" rows="3" style="width:600px;height:50px;" class="form-control" id="message">test</textarea>
|
||||
</div>
|
||||
<button type="submit" class="btn btn-default">Send</button>
|
||||
</form>
|
||||
<div id="log">
|
||||
|
||||
</div>
|
||||
<video id="video_player" width="40%" autoplay controls></video>
|
||||
</div>
|
||||
</body>
|
||||
<script type="text/javascript" charset="utf-8">
|
||||
|
||||
$(document).ready(function() {
|
||||
var ws = new WebSocket('ws://serverip:8000/humanecho');
|
||||
//document.getElementsByTagName("video")[0].setAttribute("src", aa["video"]);
|
||||
ws.onopen = function() {
|
||||
console.log('Connected');
|
||||
};
|
||||
ws.onmessage = function(e) {
|
||||
console.log('Received: ' + e.data);
|
||||
data = e
|
||||
var vid = JSON.parse(data.data);
|
||||
console.log(typeof(vid),vid)
|
||||
//document.getElementsByTagName("video")[0].setAttribute("src", vid["video"]);
|
||||
|
||||
};
|
||||
ws.onclose = function(e) {
|
||||
console.log('Closed');
|
||||
};
|
||||
|
||||
flvPlayer = mpegts.createPlayer({type: 'flv', url: "http://serverip:8080/live/livestream.flv", isLive: true, enableStashBuffer: false});
|
||||
flvPlayer.attachMediaElement(document.getElementById('video_player'));
|
||||
flvPlayer.load();
|
||||
flvPlayer.play();
|
||||
|
||||
$('#echo-form').on('submit', function(e) {
|
||||
e.preventDefault();
|
||||
var message = $('#message').val();
|
||||
console.log('Sending: ' + message);
|
||||
ws.send(message);
|
||||
$('#message').val('');
|
||||
});
|
||||
});
|
||||
</script>
|
||||
</html>
|
File diff suppressed because one or more lines are too long
|
@ -144,13 +144,13 @@ class NeRFReal:
|
|||
data['auds'] = self.asr.get_next_feat()
|
||||
|
||||
outputs = self.trainer.test_gui_with_data(data, self.W, self.H)
|
||||
print(f'[INFO] outputs shape ',outputs['image'].shape)
|
||||
#print(f'[INFO] outputs shape ',outputs['image'].shape)
|
||||
image = (outputs['image'] * 255).astype(np.uint8)
|
||||
self.streamer.stream_frame(image)
|
||||
#self.pipe.stdin.write(image.tostring())
|
||||
for _ in range(2):
|
||||
frame = self.asr.get_audio_out()
|
||||
print(f'[INFO] get_audio_out shape ',frame.shape)
|
||||
#print(f'[INFO] get_audio_out shape ',frame.shape)
|
||||
self.streamer.stream_frame_audio(frame)
|
||||
# frame = (frame * 32767).astype(np.int16).tobytes()
|
||||
# self.fifo_audio.write(frame)
|
||||
|
|
|
@ -12,6 +12,7 @@ rich
|
|||
dearpygui
|
||||
packaging
|
||||
scipy
|
||||
scikit-learn
|
||||
|
||||
face_alignment
|
||||
python_speech_features
|
||||
|
@ -24,3 +25,8 @@ configargparse
|
|||
|
||||
lpips
|
||||
imageio-ffmpeg
|
||||
|
||||
transformers
|
||||
edge_tts
|
||||
flask
|
||||
flask_sockets
|
||||
|
|