The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent: Sep. 28, 2021

Filed: Oct. 03, 2017

Applicant: Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA (US);

Inventors:
Shinji Watanabe, Arlington, MA (US);
Tsubasa Ochiai, Kyoto-fu, JP;
Takaaki Hori, Lexington, MA (US);
John R Hershey, Winchester, MA (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/28 (2013.01); G10L 21/0216 (2013.01); G10L 15/16 (2006.01); G10L 15/20 (2006.01); G10L 25/30 (2013.01);
U.S. Cl.
CPC ...
G10L 15/28 (2013.01); G10L 15/16 (2013.01); G10L 15/20 (2013.01); G10L 21/0216 (2013.01); G10L 25/30 (2013.01); G10L 2021/02166 (2013.01);
Abstract

A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.
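The enhancement stage described in the abstract (time-frequency masks feeding a beamformer that works relative to a reference channel) can be illustrated with a minimal sketch. This is not taken from the patent text: the abstract does not name a beamformer type, so the sketch assumes a mask-based MVDR beamformer, a common choice for this kind of pipeline. The function name `mvdr_from_masks`, its parameters, and the use of NumPy are all hypothetical, and the masks are passed in directly rather than estimated by a network.

```python
import numpy as np


def mvdr_from_masks(stft, speech_mask, noise_mask, ref_weights, eps=1e-10):
    """Mask-based MVDR beamforming (assumed technique, hypothetical API).

    stft:        complex array, shape (channels, frames, freqs)
    speech_mask: real array in [0, 1], shape (frames, freqs)
    noise_mask:  real array in [0, 1], shape (frames, freqs)
    ref_weights: real array, shape (channels,); a one-hot vector
                 selects a single reference channel
    Returns the enhanced single-channel STFT, shape (frames, freqs).
    """
    n_channels, n_frames, n_freqs = stft.shape
    enhanced = np.zeros((n_frames, n_freqs), dtype=complex)
    for f in range(n_freqs):
        x = stft[:, :, f]  # (channels, frames) for this frequency bin
        # Mask-weighted spatial covariance matrices, each (channels, channels).
        phi_s = (speech_mask[:, f] * x) @ x.conj().T / (speech_mask[:, f].sum() + eps)
        phi_n = (noise_mask[:, f] * x) @ x.conj().T / (noise_mask[:, f].sum() + eps)
        phi_n = phi_n + eps * np.eye(n_channels)  # small diagonal loading
        # MVDR filter: (phi_n^{-1} phi_s / trace(phi_n^{-1} phi_s)) @ ref_weights
        ratio = np.linalg.solve(phi_n, phi_s)
        w = (ratio / (np.trace(ratio) + eps)) @ ref_weights
        # Apply the filter: y[t] = w^H x[t]
        enhanced[:, f] = w.conj() @ x
    return enhanced
```

In the system the abstract describes, the masks and the reference selection would come from trained networks, and the enhanced output would be passed to the encoder-decoder network to produce text; here they are plain inputs so the beamforming step can be read in isolation.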

