update readme

2024-01-13 20:12:08 +08:00 · 2024-01-13 20:12:08 +08:00 · 5389761b06
parent 49d7446933
commit 5389761b06
1 changed files with 25 additions and 19 deletions
--- a/README.md
+++ b/README.md
@@ -3,11 +3,11 @@ A streaming digital human based on the Ernerf model， realize audio video synch

 [![Watch the video]](/assets/demo.mp4)

-## Installation
+## 1. Installation

-Tested on Ubuntu 18.04, Python3.10, Pytorch 1.12 and CUDA 11.3
+Tested on Ubuntu 20.04, Python3.10, Pytorch 1.12 and CUDA 11.3

-### Install dependency
+### 1.1 Install dependency

 ```bash
 conda create -n nerfstream python=3.10
@@ -23,14 +23,14 @@ For setting up a Linux CUDA environment, see this article https://zhuanlan.zhihu.com/p/6749
 See https://github.com/lipku/python_rtmpstream


-## Run
+## 2. Run

-### Run rtmpserver (srs)
+### 2.1 Run rtmpserver (srs)
 ```
 docker run --rm -it -p 1935:1935 -p 1985:1985 -p 8080:8080 registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5
 ```

-### Start:
+### 2.2 Start the digital human:

 ```python
 python app.py
@@ -43,7 +43,7 @@ export HF_ENDPOINT=https://hf-mirror.com

 Once it is running, open rtmp://serverip/live/livestream in VLC

-### Have the digital human speak text entered on a web page
+### 2.3 Have the digital human speak text entered on a web page
 Install and start nginx
 ```
 apt install nginx
@@ -52,24 +52,20 @@ nginx
 In echo.html, update the websocket and video playback addresses, replacing serverip with the actual server IP  
 Then copy echo.html and mpegts-1.7.3.min.js to /var/www/html

-Start the digital human
-```python
-python app.py
-```

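-Open http://serverip/echo.html，type any text into the text box, and submit it. The digital human speaks that text
+Open http://serverip/echo.html, type any text into the text box, and submit it. The digital human speaks that text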

-### Run with docker
-First run srs and nginx  
-Start the digital human
+## 3. Docker Run  
+The installation in step 1 is not needed; run the image directly.
 ```
-docker run --gpus all -it --network=host --rm  registry.cn-hangzhou.aliyuncs.com/lipku/nerfstream:v1.2
+docker run --gpus all -it --network=host --rm  registry.cn-hangzhou.aliyuncs.com/lipku/nerfstream:v1.3
 ```
+Run srs and nginx as described in 2.1 and 2.3

-## Data flow
+## 4. Data flow
 ![](/assets/dataflow.png)

-## Digital human model files
+## 5. Digital human model files
 These can be replaced with a model you trained yourself (https://github.com/Fictionarry/ER-NeRF)
 ```python
 .
@@ -80,7 +76,17 @@ docker run --gpus all -it --network=host --rm  registry.cn-hangzhou.aliyuncs.com

 ```

-## TODO
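+## 6. Performance analysis
+1. Frame rate  
+On a Tesla T4 GPU the overall rate measured about 18 fps; with audio/video encoding and stream pushing removed, it is about 20 fps. A 4090 should be able to reach 25 fps; measurements from anyone with access to such hardware are welcome.  
+Optimization: run audio/video encoding and stream pushing in a separate thread  
+2. Latency  
+Overall latency is a little over 5 s  
+(1) TTS latency is about 2 s. edgetts is currently used, and each sentence must be fully synthesized before it is fed in as a whole; this could be improved by switching TTS to streaming input  
+(2) wav2vec latency is a little over 1 s, because 50 frames of audio must be buffered for the computation; setting context_size with -m can reduce it  
+(3) srs forwarding latency; configure the srs server to reduce buffering delay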
+
+## 7. TODO
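 - Add chatgpt to enable conversation with the digital human
 - Voice cloning
 - Play a video clip in place of the digital human while it is silent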
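
Section 6 of the updated README proposes moving audio/video encoding and stream pushing onto a separate thread. A minimal sketch of that pattern follows, assuming a bounded queue between the render loop and a single background worker. It is not the project's code: `encode_and_push`, `push_worker`, `render_and_stream`, and `max_pending` are hypothetical names, and the encode/push step is simulated with a short sleep.

```python
import queue
import threading
import time


def encode_and_push(frame):
    """Hypothetical stand-in for audio/video encoding plus the RTMP push."""
    time.sleep(0.001)  # simulate encoding cost
    return f"pushed-{frame}"


def push_worker(frames, pushed):
    """Drain the queue on a background thread until the sentinel arrives."""
    while True:
        frame = frames.get()
        if frame is None:  # sentinel: the render loop has finished
            break
        pushed.append(encode_and_push(frame))


def render_and_stream(num_frames, max_pending=8):
    """Render on the calling thread while a worker encodes and pushes."""
    frames = queue.Queue(maxsize=max_pending)
    pushed = []
    worker = threading.Thread(target=push_worker, args=(frames, pushed), daemon=True)
    worker.start()
    for i in range(num_frames):
        frame = i  # hypothetical stand-in for one rendered frame
        # put() blocks only when the worker is max_pending frames behind
        frames.put(frame)
    frames.put(None)
    worker.join()
    return pushed


if __name__ == "__main__":
    print(len(render_and_stream(25)))
```

The bounded queue keeps memory use flat and applies back-pressure if encoding is slower than rendering; with one worker and a FIFO queue, frames are pushed in the order they were rendered.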