The patent badge is an abbreviated version of the USPTO patent document; it contains a link to the full patent document.

G10L 25/57 (2013.01); G10L 15/16 (2006.01); G10L 21/10 (2013.01); G10L 21/18 (2013.01); G06V 20/40 (2022.01); G06V 40/16 (2022.01); G10L 15/25 (2013.01); G06F 18/214 (2023.01); G10L 17/18 (2013.01);

U.S. Cl.

CPC ...

G10L 25/57 (2013.01); G06F 18/214 (2023.01); G06V 20/41 (2022.01); G06V 40/161 (2022.01); G10L 15/16 (2013.01); G10L 15/25 (2013.01); G10L 17/18 (2013.01); G10L 21/10 (2013.01); G10L 21/18 (2013.01);

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
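
The last three steps of the abstract (combining per-speaker visual features with the audio embedding, then using one mask per speaker to obtain isolated spectrograms) can be illustrated with a minimal sketch. The abstract does not say how features are combined or how a mask yields an isolated spectrogram, so two assumptions are made here: combination is per-frame concatenation, and each isolated spectrogram is the element-wise product of the mixture spectrogram and the speaker's mask. The function names `fuse_audio_visual`, `apply_mask`, and `separate` are hypothetical, and the networks that would produce the embeddings and masks are omitted.

```python
from typing import List

# rows = time frames, columns = feature dimensions or frequency bins
Matrix = List[List[float]]


def fuse_audio_visual(visual_features: List[Matrix], audio_embedding: Matrix) -> Matrix:
    """Concatenate, per frame, each speaker's visual features with the audio embedding.

    Assumption: "combining" means concatenation; the abstract does not specify it.
    """
    n_frames = len(audio_embedding)
    for feats in visual_features:
        if len(feats) != n_frames:
            raise ValueError("visual and audio streams must have the same number of frames")
    fused = []
    for t in range(n_frames):
        row: List[float] = []
        for feats in visual_features:  # one matrix per speaker
            row.extend(feats[t])
        row.extend(audio_embedding[t])
        fused.append(row)
    return fused


def apply_mask(spectrogram: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of the mixture spectrogram and one speaker's mask (assumed)."""
    if len(spectrogram) != len(mask) or any(
        len(s_row) != len(m_row) for s_row, m_row in zip(spectrogram, mask)
    ):
        raise ValueError("mask must have the same shape as the spectrogram")
    return [
        [s * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(spectrogram, mask)
    ]


def separate(spectrogram: Matrix, masks: List[Matrix]) -> List[Matrix]:
    """Return one isolated spectrogram per speaker mask."""
    return [apply_mask(spectrogram, mask) for mask in masks]


# Toy example: 2 frames, 2 frequency bins, 2 speakers (values are made up).
mixture = [[1.0, 2.0], [3.0, 4.0]]
masks = [
    [[1.0, 0.0], [0.5, 0.5]],  # speaker 0
    [[0.0, 1.0], [0.5, 0.5]],  # speaker 1
]
isolated = separate(mixture, masks)

# Toy fusion: each speaker has 1 visual feature per frame, audio has 2.
visual = [[[0.1], [0.2]], [[0.3], [0.4]]]
audio = [[9.0, 8.0], [7.0, 6.0]]
fused = fuse_audio_visual(visual, audio)
```

In this toy case the two masks sum to one in every time-frequency bin, so the isolated spectrograms add back up to the mixture. That property is a feature of the example values, not something the abstract requires.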
