The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
Feb. 06, 2024
Filed:
Sep. 22, 2022
Google Llc, Mountain View, CA (US);
Inbar Mosseri, Raanana, IL;
Michael Rubinstein, Natick, MA (US);
Ariel Ephrat, Efrat, IL;
William Freeman, Acton, MA (US);
Oran Lang, Givatayim, IL;
Kevin William Wilson, Cambridge, MA (US);
Tali Dekel, Arlington, MA (US);
Avinatan Hassidim, Petah Tikva, IL;
Google LLC, Mountain View, CA (US);
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.