The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11922963 B1

Date of Patent:

Mar. 05, 2024

Filed:

May 26, 2021

Systems and methods for human listening and live captioning

Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Xiaofei Wang, Bellevue, WA (US);

Sefik Emre Eskimez, Bellevue, WA (US);

Min Tang, Yarrow Point, WA (US);

Hemin Yang, Bellevue, WA (US);

Zirun Zhu, Bellevue, WA (US);

Zhuo Chen, Woodinville, WA (US);

Huaming Wang, Clyde Hill, WA (US);

Takuya Yoshioka, Bellevue, WA (US);

Assignee:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Attorney:

Workman Nydegger

Primary Examiner:

Bryan S Blankenagel

Int. Cl.

CPC ...

G10L 21/0208 (2013.01); G06N 3/084 (2023.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01);

U.S. Cl.

CPC ...

G10L 21/0208 (2013.01); G06N 3/084 (2013.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01);

Abstract

Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances. Speech enhancement model parameters are updated to optimize the speech enhancement model to generate optimized noise-suppressed speech outputs based on a comparison of the noise-suppressed transcription output and ground truth transcription labels.
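The training loop the abstract describes can be sketched in a few lines. The following is a toy illustration only, not the patented implementation: the "speech enhancement model" is a hypothetical affine map on a single scalar feature, the "automatic speech recognition model" is a hypothetical fixed logistic unit, and the comparison against ground truth labels is assumed to be a cross-entropy loss. Keeping the recognition model frozen while only the enhancement parameters are updated is also an assumption made for this sketch. All function names, constants, and the synthetic dataset are invented for illustration.

```python
import math
import random

# Hypothetical frozen "ASR" parameters, standing in for a model that was
# already trained on clean features (0.0 -> label 0, 1.0 -> label 1).
ASR_WEIGHT = 4.0
ASR_BIAS = -2.0


def enhance(noisy_feature, gain, bias):
    """Toy stand-in for the speech enhancement model: an affine map."""
    return gain * noisy_feature + bias


def asr_probability(feature):
    """Toy stand-in for the ASR model: P(label == 1) from a fixed logistic unit."""
    return 1.0 / (1.0 + math.exp(-(ASR_WEIGHT * feature + ASR_BIAS)))


def mean_transcription_loss(dataset, gain, bias):
    """Mean cross-entropy between ASR outputs on enhanced input and true labels."""
    eps = 1e-12
    total = 0.0
    for noisy_feature, label in dataset:
        p = asr_probability(enhance(noisy_feature, gain, bias))
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(dataset)


def train_enhancer(dataset, gain=1.0, bias=0.0, learning_rate=0.1, epochs=200):
    """Update only the enhancement parameters; the ASR unit stays fixed."""
    for _ in range(epochs):
        grad_gain = 0.0
        grad_bias = 0.0
        for noisy_feature, label in dataset:
            p = asr_probability(enhance(noisy_feature, gain, bias))
            # For a sigmoid followed by cross-entropy, d(loss)/d(logit) = p - label.
            # The chain rule through the fixed ASR unit multiplies by ASR_WEIGHT.
            d_enhanced = (p - label) * ASR_WEIGHT
            grad_gain += d_enhanced * noisy_feature
            grad_bias += d_enhanced
        gain -= learning_rate * grad_gain / len(dataset)
        bias -= learning_rate * grad_bias / len(dataset)
    return gain, bias


def build_toy_dataset(size=40, seed=0):
    """Synthetic 'noisy utterances': clean features attenuated, offset, and jittered."""
    rng = random.Random(seed)
    dataset = []
    for index in range(size):
        label = index % 2
        noisy_feature = 0.5 * float(label) + 0.8 + rng.uniform(-0.05, 0.05)
        dataset.append((noisy_feature, label))
    return dataset


if __name__ == "__main__":
    data = build_toy_dataset()
    print("loss before:", mean_transcription_loss(data, 1.0, 0.0))
    trained_gain, trained_bias = train_enhancer(data)
    print("loss after: ", mean_transcription_loss(data, trained_gain, trained_bias))
```

In this sketch the gradient of the transcription loss flows back through the fixed recognition unit into the enhancement parameters, which mirrors the abstract's idea of tuning the enhancer on the basis of recognition results rather than on signal quality alone.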
