The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 12217761 B1

Date of Patent:

Feb. 04, 2025

Filed:

Oct. 31, 2021

Target speaker mode

Applicant:

Zoom Video Communications, Inc., San Jose, CA (US);

Inventors:

Yuhui Chen, San Jose, CA (US);

Qiyong Liu, Singapore, SG;

Zhengwei Wei, Jiangxi, CN;

Yangbin Zeng, Zhejiang, CN;

Assignee:

Zoom Video Communications, Inc., San Jose, CA (US);

Attorney:

Kilpatrick Townsend & Stockton LLP

Primary Examiner:

Vijay B Chawan

Assistant Examiner:

Nadira Sultana

Int. Cl.

CPC ...

G10L 17/08 (2013.01); G10L 17/04 (2013.01); G10L 25/21 (2013.01); G10L 25/78 (2013.01);

U.S. Cl.

CPC ...

G10L 17/08 (2013.01); G10L 17/04 (2013.01); G10L 25/21 (2013.01); G10L 25/78 (2013.01);

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether the audio frame includes only a single speaker or multiple speakers. When the audio frame includes only a single speaker, the system inputs the audio frame to a target speaker VAD model to suppress speech in the audio frame from a non-target speaker based on comparing the audio frame to a voiceprint of a target speaker. When the audio frame includes multiple speakers, the system inputs the audio frame to a speech separation model to separate the voice of the target speaker from a voice mixture in the audio frame.
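
The per-frame routing described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `TargetSpeakerExtractor`, the three callables, and the toy stand-in models are all hypothetical, and passing the voiceprint to the separation model is an assumption the abstract does not state. The sketch only shows the branching: a multi-speaker detector decides whether a frame goes to a target speaker VAD model or to a speech separation model.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]        # one audio frame as raw samples (illustrative type)
Voiceprint = List[float]   # target speaker embedding (illustrative type)


@dataclass
class TargetSpeakerExtractor:
    """Hypothetical router: sends each frame to one of two models."""
    is_multi_speaker: Callable[[Frame], bool]
    target_vad: Callable[[Frame, Voiceprint], Frame]
    # Assumption: the separation model also receives the voiceprint.
    separate_target: Callable[[Frame, Voiceprint], Frame]
    voiceprint: Voiceprint

    def process(self, frame: Frame) -> Frame:
        if self.is_multi_speaker(frame):
            # Multiple speakers: pull the target voice out of the mixture.
            return self.separate_target(frame, self.voiceprint)
        # Single speaker: keep the frame only if it matches the target.
        return self.target_vad(frame, self.voiceprint)


# Toy stand-ins so the sketch runs; real systems would use trained models.
def toy_detector(frame: Frame) -> bool:
    # Pretend that a loud frame means overlapping speakers.
    return max(abs(s) for s in frame) > 1.0


def toy_vad(frame: Frame, voiceprint: Voiceprint) -> Frame:
    # Pretend that a positive dot product means "matches the target".
    similarity = sum(a * b for a, b in zip(frame, voiceprint))
    return list(frame) if similarity > 0 else [0.0] * len(frame)


def toy_separator(frame: Frame, voiceprint: Voiceprint) -> Frame:
    # Pretend that half of the mixture belongs to the target speaker.
    return [s / 2 for s in frame]


extractor = TargetSpeakerExtractor(
    is_multi_speaker=toy_detector,
    target_vad=toy_vad,
    separate_target=toy_separator,
    voiceprint=[1.0, 1.0],
)

target_only = extractor.process([0.5, 0.4])     # single speaker, matches target
non_target = extractor.process([-0.5, -0.4])    # single speaker, suppressed
mixture = extractor.process([2.0, -1.0])        # multiple speakers, separated
```

Keeping the detector and the two models as injected callables mirrors the abstract's separation of concerns: the routing logic stays fixed while each model can be replaced independently.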
