Speech audio and matching lip-movement video of 250 people, recorded simultaneously on multiple devices and precisely aligned using a pulse signal, with high accuracy. It can be used for research on multimodal learning algorithms in the speech and image fields.
For more details, please refer to the link: https://www.nexdata.ai/datasets/996?source=Github
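The description says the recordings are aligned by a pulse signal but does not say how. As a rough illustration only, the sketch below assumes each recording contains one short, loud sync pulse and estimates the lag between two recordings from the first sample whose magnitude reaches a threshold. The function names, the threshold value, and the synthetic signals are hypothetical and are not part of the dataset or its tooling.

```python
from typing import List, Optional, Sequence, Tuple


def first_pulse_index(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first sample whose magnitude reaches threshold."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def pulse_lag(ref: Sequence[float], other: Sequence[float],
              threshold: float = 0.5) -> int:
    """Samples by which the pulse in `other` lags the pulse in `ref`."""
    a = first_pulse_index(ref, threshold)
    b = first_pulse_index(other, threshold)
    if a is None or b is None:
        raise ValueError("no sync pulse found in one of the recordings")
    return b - a


def trim_to_pulse(ref: Sequence[float], other: Sequence[float],
                  threshold: float = 0.5) -> Tuple[List[float], List[float]]:
    """Drop leading samples from the later-starting pulse's recording
    so that the pulse sits at the same index in both outputs."""
    lag = pulse_lag(ref, other, threshold)
    if lag >= 0:
        return list(ref), list(other[lag:])
    return list(ref[-lag:]), list(other)


# Synthetic stand-ins for a near and a far microphone (assumed data):
# the far recording starts 30 samples earlier relative to the pulse.
near = [0.0] * 100 + [1.0] + [0.1] * 50
far = [0.0] * 130 + [0.9] + [0.1] * 50

near_aligned, far_aligned = trim_to_pulse(near, far)
```

A threshold detector is the simplest possible choice and is only reasonable when the pulse is much louder than the speech and background; cross-correlating the two pulse regions would be a more robust alternative.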
Video: mp4 format, 1,280 × 720; Audio: wav format, 16 kHz, 16-bit, mono
A quiet, sunny room is used to simulate daytime outdoor driving scenes; signal-to-noise ratio 20~25 dB
Divided into main scenes and sub-scenes according to sunlight intensity
Short signals and spoken sentences
250 Chinese speakers, gender-balanced
Camera, HD microphone, Audio board
Video is recorded from 6 angles (front face, single side face, looking up, looking down, side face looking down, and side face looking up), with near-field and far-field audio recorded at the same time
Mandarin
Lip language recognition
Sentence accuracy is not lower than 95%
Commercial License
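
The audio format stated above (wav, 16-bit, mono) can be checked on a downloaded file with Python's standard `wave` module. This is a minimal sketch, not a tool shipped with the dataset: the function name `check_wav` and the constants are hypothetical, and it assumes the stated sample rate means 16 kHz (16,000 samples per second).

```python
import wave
from typing import List

# Assumed target format, taken from the format line of this description.
EXPECTED_RATE_HZ = 16_000        # assumption: sample rate is 16 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav(path: str) -> List[str]:
    """Return a list of human-readable mismatches; empty if the file conforms."""
    problems: List[str] = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wf.getframerate()} Hz")
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wf.getsampwidth() * 8} bits")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wf.getnchannels()}")
    return problems
```

Calling `check_wav("sample.wav")`, where the path is a placeholder for any audio file from the dataset, returns an empty list when the file matches the expected format and one message per mismatch otherwise.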