
G10L 15/26 (2006.01); G06T 1/00 (2006.01); G10L 17/06 (2013.01); G10L 21/0364 (2013.01); G06F 3/01 (2006.01); G10L 15/25 (2013.01); H04R 1/40 (2006.01); G10L 15/02 (2006.01); G10L 15/05 (2013.01); G10L 25/78 (2013.01); G10L 15/22 (2006.01); H04R 3/00 (2006.01); G06V 40/16 (2022.01);

U.S. Cl.

CPC ...

G10L 15/25 (2013.01); G06V 40/171 (2022.01); G10L 15/02 (2013.01); G10L 15/05 (2013.01); G10L 15/22 (2013.01); G10L 25/78 (2013.01); H04R 1/406 (2013.01); H04R 3/005 (2013.01);

Abstract

The disclosed embodiments provide methods, apparatuses, systems, devices and computer-readable storage media for processing speech signals. The method comprises: acquiring a real-time image by using an image capturing device, performing facial recognition on the real-time image, and detecting, based on the facial recognition result, a period during which a target user makes speech sounds; locating a sound source in an audio signal received by a microphone array, and determining orientation information of the sound source; and, based on the period during which the target user in the real-time image makes the speech sounds and the orientation information of the sound source, performing a speech start and end point analysis to determine the start and end time points of the speech sounds in the audio signal. According to one embodiment, the method can perform voice activity detection on the speech signal in noisy environments containing multiple sources of interference, thereby improving the anti-interference capability of the system.
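
The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation; it only shows one plausible way to combine the two cues. It assumes that an upstream facial recognition stage already yields a per-frame flag saying whether the target user appears to be speaking, that a microphone-array localizer yields a per-frame source angle in degrees (or None when no source is found), and that the target user's direction is known. The function name `speech_endpoints`, its parameters, the 15-degree tolerance and the 20 ms frame length are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def speech_endpoints(
    lip_active: List[bool],
    source_angles: List[Optional[float]],
    user_angle: float,
    tolerance_deg: float = 15.0,  # assumed tolerance, not from the patent
    frame_ms: int = 20,           # assumed frame length, not from the patent
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) spans where both cues agree.

    A frame counts as target speech only if the visual cue says the
    target user is speaking AND the located sound source lies within
    tolerance_deg of the user's direction.
    """
    if len(lip_active) != len(source_angles):
        raise ValueError("cue sequences must have the same number of frames")

    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (lips, angle) in enumerate(zip(lip_active, source_angles)):
        # Wrap the angular difference into [-180, 180) before comparing,
        # so that 350 degrees and 5 degrees are treated as 15 degrees apart.
        from_user = (
            angle is not None
            and abs((angle - user_angle + 180.0) % 360.0 - 180.0) <= tolerance_deg
        )
        speaking = lips and from_user
        if speaking and start is None:
            start = i  # start point: first frame where both cues agree
        elif not speaking and start is not None:
            segments.append((start * frame_ms, i * frame_ms))  # end point
            start = None
    if start is not None:
        # Speech still in progress at the end of the input.
        segments.append((start * frame_ms, len(lip_active) * frame_ms))
    return segments
```

In this sketch, a frame in which the user's lips move while the dominant sound arrives from another direction is rejected, as is a frame in which sound arrives from the user's direction while the user is visibly silent. That joint condition is what lets interfering sources be excluded; a real system would likely add smoothing or hangover logic on top of it.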
