fix customvideo

lipku 2024-08-03 12:58:49 +08:00
parent 391512f68c
commit a9e9cfb220
9 changed files with 221 additions and 150 deletions

README.md

@ -57,54 +57,37 @@ export HF_ENDPOINT=https://hf-mirror.com
Note: the server needs to open ports tcp:8000,8010,1985; udp:8000
## 3. More Usage
The avatar model, transport mode and TTS model can each be chosen independently.
### 3.1 Avatar models
Three models are supported: ernerf, musetalk and wav2lip; the default is ernerf.
#### 3.1.1 ER-NeRF
```
python app.py --model ernerf
```
The following options are supported.
##### 3.1.1.1 Using hubert audio features
Audio features are extracted with wav2lip by default; if the model was trained with hubert features, start the avatar with:
```
python app.py --asr_model facebook/hubert-large-ls960-ft
```
##### 3.1.1.2 Setting a background image behind the head
```
python app.py --bg_img bc.jpg
```
##### 3.1.1.3 Pasting the head back into a full-body video
- 1. Crop the training video
```
ffmpeg -i fullbody.mp4 -vf crop="400:400:100:5" train.mp4
```
Train the model with train.mp4.
- 2. Extract the full-body frames
```
ffmpeg -i fullbody.mp4 -vf fps=25 -qmin 1 -q:v 1 -start_number 0 data/fullbody/img/%d.jpg
```
- 3. Start the avatar
```
python app.py --fullbody --fullbody_img data/fullbody/img --fullbody_offset_x 100 --fullbody_offset_y 5 --fullbody_width 580 --fullbody_height 1080 --W 400 --H 400
```
@ -112,39 +95,7 @@ python app.py --fullbody --fullbody_img data/fullbody/img --fullbody_offset_x 10
- --W, --H: width and height of the training video
- If the torso from the third ER-NeRF training step is not trained well, a seam can appear where the head is stitched back. You can add --torso_imgs data/xxx/torso_imgs to the command above; the torso is then taken directly from the torso images in the training dataset instead of being inferred by the model. This approach may leave some artifacts around the head and neck.
#### 3.1.2 Using the musetalk model
RTMP push is not supported yet.
- Install the dependencies
```bash
@ -163,7 +114,7 @@ mim install "mmpose>=1.1.0"
python app.py --model musetalk --transport webrtc
Open http://serverip:8010/webrtcapi.html in a browser.
You can set --batch_size to increase GPU utilization and --avatar_id to run a different avatar.
##### Replacing with your own avatar
```bash
git clone https://github.com/TMElyralab/MuseTalk.git
cd MuseTalk
@ -177,7 +128,7 @@ python simple_musetalk.py --avatar_id 4 --file D:\\ok\\test.mp4
Both video and image input are supported; the avatar is generated automatically under the data/avatars directory.
```
#### 3.1.3 Using the wav2lip model
RTMP push is not supported yet.
- Download the models
Download the models required to run wav2lip. Link: https://pan.baidu.com/s/1yOsQ06-RIDTJd3HFCw4wtA  Password: ltua
@ -187,12 +138,96 @@ python simple_musetalk.py --avatar_id 4 --file D:\\ok\\test.mp4
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip_avatar1
Open http://serverip:8010/webrtcapi.html in a browser.
You can set --batch_size to increase GPU utilization and --avatar_id to run a different avatar.
##### Replacing with your own avatar
```bash
cd wav2lip
python genavatar.py --video_path xxx.mp4
After it finishes, copy the files under results/avatars into this project's data/avatars directory.
```
### 3.2 Transport modes
webrtc, rtcpush and rtmp are supported; the default is rtcpush.
#### 3.2.1 webrtc p2p
This mode does not require srs.
```
python app.py --transport webrtc
```
The server needs to open ports tcp:8010 and udp:50000~60000.
Open http://serverip:8010/webrtcapi.html in a browser.
#### 3.2.2 Pushing webrtc to srs
- Start srs
```
export CANDIDATE='<server public IP>'
docker run --rm --env CANDIDATE=$CANDIDATE \
-p 1935:1935 -p 8080:8080 -p 1985:1985 -p 8000:8000/udp \
registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5 \
objs/srs -c conf/rtc.conf
```
- Run the avatar
```bash
python app.py --transport rtcpush --push_url 'http://localhost:1985/rtc/v1/whip/?app=live&stream=livestream'
```
Open http://serverip:8010/rtcpushapi.html in a browser.
#### 3.2.3 Pushing rtmp to srs
- Install the rtmpstream library
See https://github.com/lipku/python_rtmpstream
- Start srs
```
docker run --rm -it -p 1935:1935 -p 1985:1985 -p 8080:8080 registry.cn-hangzhou.aliyuncs.com/ossrs/srs:5
```
- Run the avatar
```bash
python app.py --transport rtmp --push_url 'rtmp://localhost/live/livestream'
```
Open http://serverip:8010/echoapi.html in a browser.
### 3.3 TTS models
edgetts, gpt-sovits and xtts are supported; the default is edgetts.
#### 3.3.1 gpt-sovits
For service deployment, see [gpt-sovits](/tts/README.md).
Run:
```
python app.py --tts gpt-sovits --TTS_SERVER http://127.0.0.1:9880 --REF_FILE data/ref.wav --REF_TEXT xxx
```
REF_TEXT is the transcript of the speech in REF_FILE; the reference audio should not be too long.
#### 3.3.2 xtts
To run the xtts service, see https://github.com/coqui-ai/xtts-streaming-server
```
docker run --gpus=all -e COQUI_TOS_AGREED=1 --rm -p 9000:80 ghcr.io/coqui-ai/xtts-streaming-server:latest
```
Then run the following, where ref.wav is the voice file to clone:
```
python app.py --tts xtts --REF_FILE data/ref.wav --TTS_SERVER http://localhost:9000
```
### 3.4 Video orchestration
- 1. Generate the assets
```
ffmpeg -i xxx.mp4 -s 576x768 -vf fps=25 -qmin 1 -q:v 1 -start_number 0 data/customvideo/image/%08d.png
ffmpeg -i xxx.mp4 -vn -acodec pcm_s16le -ac 1 -ar 16000 data/customvideo/audio.wav
```
Here -s must match the output video size.
- 2. Edit data/custom_config.json
Specify imgpath and audiopath (a sample config is shown after these steps).
About audiotype: 0 is the inference video and does not need to be configured; 1 is the silent video (if it is not configured, the inference video is used instead); values of 2 and above are custom entries.
- 3. Run
```
python app.py --transport webrtc --customvideo_config data/custom_config.json
```
- 4. Open http://<serverip>:8010/webrtcapi-custom.html
Enter an audiotype configured in custom_config.json and click "switch video".
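For reference, a minimal sketch of what data/custom_config.json might look like, assuming a top-level list whose entries carry the audiotype, imgpath and audiopath fields mentioned above (check the sample config shipped with the project for the exact schema); the paths match the ffmpeg commands from step 1:
```json
[
  {
    "audiotype": 2,
    "imgpath": "data/customvideo/image",
    "audiopath": "data/customvideo/audio.wav"
  }
]
```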
### 3.5 Chatting with the avatar through an LLM
Following the approach of the avatar dialogue system [LinlyTalker](https://github.com/Kedreamix/Linly-Talker), the supported LLMs are ChatGPT, Qwen and GeminiPro. Fill in your own api_key in app.py.
Open http://serverip:8010/rtcpushchat.html in a browser.
## 4. Docker Run
The installation steps above are not needed; run it directly.
@ -206,7 +241,7 @@ https://www.codewithgpu.com/i/lipku/metahuman-stream/base
[autodl tutorial](autodl/README.md)
## 5. ernerf avatar model files
You can replace them with a model you trained yourself (https://github.com/Fictionarry/ER-NeRF).
```python
.

app.py

@ -316,9 +316,9 @@ if __name__ == '__main__':
parser.add_argument('--bbox_shift', type=int, default=5)
parser.add_argument('--batch_size', type=int, default=16)
# parser.add_argument('--customvideo', action='store_true', help="custom video")
# parser.add_argument('--customvideo_img', type=str, default='data/customvideo/img')
# parser.add_argument('--customvideo_imgnum', type=int, default=1)
parser.add_argument('--customvideo_config', type=str, default='')

basereal.py

@ -15,6 +15,8 @@ from threading import Thread, Event
from io import BytesIO
import soundfile as sf
from ttsreal import EdgeTTS,VoitsTTS,XTTS
from tqdm import tqdm
def read_imgs(img_list):
    frames = []
@ -30,6 +32,13 @@ class BaseReal:
self.sample_rate = 16000
self.chunk = self.sample_rate // opt.fps # 320 samples per chunk (20ms * 16000 / 1000)
if opt.tts == "edgetts":
    self.tts = EdgeTTS(opt,self)
elif opt.tts == "gpt-sovits":
    self.tts = VoitsTTS(opt,self)
elif opt.tts == "xtts":
    self.tts = XTTS(opt,self)
self.curr_state=0
self.custom_img_cycle = {}
self.custom_audio_cycle = {}
@ -48,7 +57,14 @@ class BaseReal:
self.custom_audio_index[item['audiotype']] = 0
self.custom_index[item['audiotype']] = 0
self.custom_opt[item['audiotype']] = item
def init_customindex(self):
    self.curr_state=0
    for key in self.custom_audio_index:
        self.custom_audio_index[key]=0
    for key in self.custom_index:
        self.custom_index[key]=0
def mirror_index(self,size, index):
    #size = len(self.coord_list_cycle)
    turn = index // size
@ -62,11 +78,12 @@ class BaseReal:
idx = self.custom_audio_index[audiotype]
stream = self.custom_audio_cycle[audiotype][idx:idx+self.chunk]
self.custom_audio_index[audiotype] += self.chunk
if self.custom_audio_index[audiotype]>=self.custom_audio_cycle[audiotype].shape[0]:
    self.curr_state = 1 # the current video does not loop; switch to the silent state
return stream
def set_curr_state(self,audiotype, reinit):
    print('set_curr_state:',audiotype)
    self.curr_state = audiotype
    if reinit:
        self.custom_audio_index[audiotype] = 0
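Taken together, curr_state acts as a small playback state machine: 0 is the normal inference stream, 1 is silence, and values of 2 or more select a custom clip loaded from customvideo_config; when a clip runs out, get_audio_stream falls back to the silent state. A minimal standalone sketch of that behaviour (simplified from the BaseReal code above; the 320-sample/20 ms chunking follows the comment in __init__, everything else here is illustrative):
```python
import numpy as np

SAMPLE_RATE = 16000
CHUNK = 320  # 20 ms per chunk, per the comment in BaseReal.__init__

class CustomAudioState:
    """Simplified sketch of BaseReal's custom-audio bookkeeping."""
    def __init__(self, custom_audio):  # dict: audiotype (>=2) -> float32 waveform
        self.custom_audio_cycle = custom_audio
        self.custom_audio_index = {k: 0 for k in custom_audio}
        self.curr_state = 0  # 0 = inference video, 1 = silent video, >=2 = custom clip

    def set_curr_state(self, audiotype, reinit):
        self.curr_state = audiotype
        if reinit and audiotype in self.custom_audio_index:
            self.custom_audio_index[audiotype] = 0

    def get_audio_stream(self, audiotype):
        idx = self.custom_audio_index[audiotype]
        stream = self.custom_audio_cycle[audiotype][idx:idx + CHUNK]
        self.custom_audio_index[audiotype] += CHUNK
        if self.custom_audio_index[audiotype] >= self.custom_audio_cycle[audiotype].shape[0]:
            self.curr_state = 1  # clip exhausted: fall back to the silent state
        return stream

    def next_chunk(self):
        """What the ASR side does per tick when no TTS audio is queued (cf. NerfASR below)."""
        if self.curr_state > 1:
            return self.get_audio_stream(self.curr_state), self.curr_state
        return np.zeros(CHUNK, dtype=np.float32), 1

# usage: one 1-second custom clip registered as audiotype 2
state = CustomAudioState({2: np.zeros(SAMPLE_RATE, dtype=np.float32)})
state.set_curr_state(2, reinit=True)
for _ in range(60):                 # 60 chunks = 1.2 s, so the clip runs out
    frame, audiotype = state.next_chunk()
print(state.curr_state)             # 1 -> switched back to silence
```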

lipreal.py

@ -166,12 +166,6 @@ class LipReal(BaseReal):
self.asr = LipASR(opt,self)
self.asr.warm_up()
#self.__warm_up()
self.render_event = mp.Event()
@ -257,6 +251,7 @@ class LipReal(BaseReal):
# self.asr.warm_up()
self.tts.render(quit_event)
self.init_customindex()
process_thread = Thread(target=self.process_frames, args=(quit_event,loop,audio_track,video_track))
process_thread.start()

museasr.py

@ -8,8 +8,8 @@ from baseasr import BaseASR
from musetalk.whisper.audio2feature import Audio2Feature
class MuseASR(BaseASR):
    def __init__(self, opt, parent, audio_processor:Audio2Feature):
        super().__init__(opt,parent)
        self.audio_processor = audio_processor
    def run_step(self):

musereal.py

@ -27,6 +27,7 @@ from ttsreal import EdgeTTS,VoitsTTS,XTTS
from museasr import MuseASR
import asyncio
from av import AudioFrame, VideoFrame
from basereal import BaseReal
from tqdm import tqdm
def read_imgs(img_list):
@ -125,9 +126,10 @@ def inference(render_event,batch_size,latents_out_path,audio_feat_queue,audio_ou
print('musereal inference processor stop')
@torch.no_grad()
class MuseReal(BaseReal):
    def __init__(self, opt):
        super().__init__(opt)
        #self.opt = opt # shared with the trainer's opt to support in-place modification of rendering parameters.
        self.W = opt.W
        self.H = opt.H
@ -156,14 +158,8 @@ class MuseReal:
self.__loadmodels()
self.__loadavatar()
self.asr = MuseASR(opt,self,self.audio_processor)
self.asr.warm_up()
#self.__warm_up()
self.render_event = mp.Event()
@ -246,8 +242,16 @@ class MuseReal:
    res_frame,idx,audio_frames = self.res_frame_queue.get(block=True, timeout=1)
except queue.Empty:
    continue
if audio_frames[0][1]!=0 and audio_frames[1][1]!=0: # all non-inference (silent/custom) audio, only the full image is needed
    audiotype = audio_frames[0][1]
    if self.custom_index.get(audiotype) is not None: # a custom video is configured for this audiotype
        mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype]),self.custom_index[audiotype])
        combine_frame = self.custom_img_cycle[audiotype][mirindex]
        self.custom_index[audiotype] += 1
        # if not self.custom_opt[audiotype].loop and self.custom_index[audiotype]>=len(self.custom_img_cycle[audiotype]):
        #     self.curr_state = 1 # the current video does not loop; switch to the silent state
    else:
        combine_frame = self.frame_list_cycle[idx]
else:
    bbox = self.coord_list_cycle[idx]
    ori_frame = copy.deepcopy(self.frame_list_cycle[idx])
@ -283,6 +287,7 @@ class MuseReal:
# self.asr.warm_up()
self.tts.render(quit_event)
self.init_customindex()
process_thread = Thread(target=self.process_frames, args=(quit_event,loop,audio_track,video_track))
process_thread.start()
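The custom-frame branch above steps through the clip via BaseReal.mirror_index, which (judging from the commented-out per-instance version in nerfreal.py further down) plays the images forward and then backward so the loop point does not cause a visible jump. A small illustration of that ping-pong indexing:
```python
def mirror_index(size: int, index: int) -> int:
    """Ping-pong index over a clip of `size` frames: 0,1,...,size-1,size-1,...,1,0,0,1,..."""
    turn, res = divmod(index, size)
    return res if turn % 2 == 0 else size - res - 1

print([mirror_index(4, i) for i in range(10)])
# [0, 1, 2, 3, 3, 2, 1, 0, 0, 1]
```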

nerfasr.py

@ -12,9 +12,9 @@ from threading import Thread, Event
from baseasr import BaseASR
class NerfASR(BaseASR):
    def __init__(self, opt, parent):
        super().__init__(opt,parent)
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        if 'esperanto' in self.opt.asr_model:
@ -66,8 +66,12 @@ class ASR(BaseASR):
    type = 0
    #print(f'[INFO] get frame {frame.shape}')
except queue.Empty:
    if self.parent and self.parent.curr_state>1: # play the custom audio clip
        frame = self.parent.get_audio_stream(self.parent.curr_state)
        type = self.parent.curr_state
    else:
        frame = np.zeros(self.chunk, dtype=np.float32)
        type = 1
return frame,type

nerfreal.py

@ -9,15 +9,17 @@ import time
import torch.nn.functional as F
import cv2
from nerfasr import NerfASR
from ttsreal import EdgeTTS,VoitsTTS,XTTS
import asyncio
from av import AudioFrame, VideoFrame
from basereal import BaseReal
class NeRFReal(BaseReal):
    def __init__(self, opt, trainer, data_loader, debug=True):
        super().__init__(opt)
        #self.opt = opt # shared with the trainer's opt to support in-place modification of rendering parameters.
        self.W = opt.W
        self.H = opt.H
@ -55,17 +57,11 @@ class NeRFReal:
#self.ind_index = 0
#self.ind_num = trainer.model.individual_codes.shape[0]
#self.customimg_index = 0
# build asr
self.asr = NerfASR(opt,self)
self.asr.warm_up()
'''
video_path = 'video_stream'
@ -124,14 +120,14 @@ class NeRFReal:
self.asr.pause_talk()
# def mirror_index(self, index):
#     size = self.opt.customvideo_imgnum
#     turn = index // size
#     res = index % size
#     if turn % 2 == 0:
#         return res
#     else:
#         return size - res - 1
def test_step(self,loop=None,audio_track=None,video_track=None):
@ -148,39 +144,57 @@ class NeRFReal:
# use the live audio stream
data['auds'] = self.asr.get_next_feat()
audiotype1 = 0
audiotype2 = 0
#send audio
for i in range(2):
    frame,type = self.asr.get_audio_out()
    if i==0:
        audiotype1 = type
    else:
        audiotype2 = type
    #print(f'[INFO] get_audio_out shape ',frame.shape)
    if self.opt.transport=='rtmp':
        self.streamer.stream_frame_audio(frame)
    else: #webrtc
        frame = (frame * 32767).astype(np.int16)
        new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0])
        new_frame.planes[0].update(frame.tobytes())
        new_frame.sample_rate=16000
        asyncio.run_coroutine_threadsafe(audio_track._queue.put(new_frame), loop)
# if self.opt.transport=='rtmp':
#     for _ in range(2):
#         frame,type = self.asr.get_audio_out()
#         audiotype += type
#         #print(f'[INFO] get_audio_out shape ',frame.shape)
#         self.streamer.stream_frame_audio(frame)
# else: #webrtc
#     for _ in range(2):
#         frame,type = self.asr.get_audio_out()
#         audiotype += type
#         frame = (frame * 32767).astype(np.int16)
#         new_frame = AudioFrame(format='s16', layout='mono', samples=frame.shape[0])
#         new_frame.planes[0].update(frame.tobytes())
#         new_frame.sample_rate=16000
#         # if audio_track._queue.qsize()>10:
#         #     time.sleep(0.1)
#         asyncio.run_coroutine_threadsafe(audio_track._queue.put(new_frame), loop)
#t = time.time()
if audiotype1!=0 and audiotype2!=0 and self.custom_index.get(audiotype1) is not None: # not the inference video and a custom video is configured
    mirindex = self.mirror_index(len(self.custom_img_cycle[audiotype1]),self.custom_index[audiotype1])
    #imgindex = self.mirror_index(self.customimg_index)
    #print('custom img index:',imgindex)
    #image = cv2.imread(os.path.join(self.opt.customvideo_img, str(int(imgindex))+'.png'))
    image = self.custom_img_cycle[audiotype1][mirindex]
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    self.custom_index[audiotype1] += 1
    if self.opt.transport=='rtmp':
        self.streamer.stream_frame(image)
    else:
        new_frame = VideoFrame.from_ndarray(image, format="rgb24")
        asyncio.run_coroutine_threadsafe(video_track._queue.put(new_frame), loop)
else: # inference video + paste-back
    outputs = self.trainer.test_gui_with_data(data, self.W, self.H)
    #print('-------ernerf time: ',time.time()-t)
    #print(f'[INFO] outputs shape ',outputs['image'].shape)
@ -213,6 +227,8 @@ class NeRFReal:
#if self.opt.asr:
#    self.asr.warm_up()
self.init_customindex()
if self.opt.transport=='rtmp':
    from rtmp_streaming import StreamerConfig, Streamer
    fps=25

webrtcapi-custom.html

@ -54,7 +54,20 @@
<script type="text/javascript" src="https://ajax.aspnetcdn.com/ajax/jquery/jquery-2.1.1.min.js"></script>
</body>
<script type="text/javascript" charset="utf-8">
function custom() {
    fetch('/set_audiotype', {
        body: JSON.stringify({
            audiotype: parseInt(document.getElementById('audiotype').value),
            reinit: false,
            sessionid: parseInt(document.getElementById('sessionid').value),
        }),
        headers: {
            'Content-Type': 'application/json'
        },
        method: 'POST'
    });
}
$(document).ready(function() {
    // var host = window.location.hostname
    // var ws = new WebSocket("ws://"+host+":8000/humanecho");
@ -94,20 +107,6 @@
        //ws.send(message);
        $('#message').val('');
    });
});
</script>
</html>
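The custom() handler above is only a thin wrapper around the /set_audiotype endpoint, so the same switch can be triggered outside the browser. A minimal sketch using Python's requests library (the host/port and the sessionid value are assumptions; the endpoint path and body fields are taken from the fetch call above):
```python
import requests

SERVER = "http://serverip:8010"  # assumed address of the running app.py server

resp = requests.post(
    f"{SERVER}/set_audiotype",
    json={
        "audiotype": 2,   # an audiotype configured in custom_config.json
        "reinit": False,  # True restarts the selected clip from its first frame
        "sessionid": 0,   # assumed; the web page reads this from its sessionid field
    },
)
print(resp.status_code)
```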