The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11574628 B1

Date of Patent:

Feb. 07, 2023

Filed:

Mar. 28, 2019

Deep multi-channel acoustic modeling using multiple microphone array geometries

Applicant:

Amazon Technologies, Inc., Seattle, WA (US);

Inventors:

Kenichi Kumatani, San Jose, CA (US);

Minhua Wu, San Jose, CA (US);

Shiva Sundaram, Fremont, CA (US);

Nikko Strom, Kirkland, WA (US);

Bjorn Hoffmeister, Seattle, WA (US);

Assignee:

Amazon Technologies, Inc., Seattle, WA (US);

Attorney:

Pierce Atwood LLP

Primary Examiner:

Paras D Shah

Int. Cl.

CPC ...

G10L 15/16 (2006.01); G10L 25/30 (2013.01); G10L 15/02 (2006.01); G06N 3/08 (2023.01);

U.S. Cl.

CPC ...

G10L 15/16 (2013.01); G06N 3/08 (2013.01); G10L 15/02 (2013.01); G10L 25/30 (2013.01);

Abstract

Techniques for speech processing using a deep neural network (DNN) based acoustic model front-end are described. A new modeling approach directly models multi-channel audio data received from a microphone array using a first model (e.g., multi-geometry/multi-channel DNN) that is trained using a plurality of microphone array geometries. Thus, the first model may receive a variable number of microphone channels, generate multiple outputs using multiple microphone array geometries, and select the best output as a first feature vector that may be used similarly to beamformed features generated by an acoustic beamformer. A second model (e.g., feature extraction DNN) processes the first feature vector and transforms it to a second feature vector having a lower dimensional representation. A third model (e.g., classification DNN) processes the second feature vector to perform acoustic unit classification and generate text data. The DNN front-end enables improved performance despite a reduction in microphones.
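
The three-stage flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: every constant, weight, and function name below is hypothetical, the weights are random rather than trained, and "best output" is assumed here to mean the candidate with the highest energy, which the abstract does not specify. Decoding the classified acoustic units into text is omitted.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the toy weights are reproducible

# All sizes below are hypothetical and chosen only for illustration.
FEATURE_DIM = 8      # per-channel feature size
REDUCED_DIM = 3      # size of the lower-dimensional second feature vector
NUM_UNITS = 4        # number of acoustic units
MAX_CHANNELS = 4     # upper bound on microphone channels
NUM_GEOMETRIES = 3   # number of array geometries assumed during training


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# One weight per (geometry, channel): a toy stand-in for learned spatial filtering.
GEOMETRY_WEIGHTS = random_matrix(NUM_GEOMETRIES, MAX_CHANNELS)
PROJECTION = random_matrix(REDUCED_DIM, FEATURE_DIM)
CLASSIFIER = random_matrix(NUM_UNITS, REDUCED_DIM)


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def first_model(channels):
    """Combine the channels once per geometry and keep the highest-energy output."""
    if not 1 <= len(channels) <= MAX_CHANNELS:
        raise ValueError("unsupported channel count")
    candidates = []
    for weights in GEOMETRY_WEIGHTS:
        # zip stops at the shorter input, so fewer channels use fewer weights.
        combined = [
            sum(w * ch[i] for w, ch in zip(weights, channels))
            for i in range(FEATURE_DIM)
        ]
        candidates.append(combined)
    # Assumption: "best" means largest energy; the abstract does not define it.
    return max(candidates, key=lambda vec: sum(x * x for x in vec))


def second_model(first_features):
    """Project to a lower-dimensional vector with a ReLU nonlinearity."""
    return [max(0.0, x) for x in matvec(PROJECTION, first_features)]


def third_model(second_features):
    """Return softmax posteriors over the hypothetical acoustic units."""
    logits = matvec(CLASSIFIER, second_features)
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_pipeline(channels):
    first = first_model(channels)
    second = second_model(first)
    return first, second, third_model(second)


# The same pipeline accepts frames with two or four microphone channels.
two_mics = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(2)]
four_mics = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(4)]
for frame in (two_mics, four_mics):
    first, second, posteriors = run_pipeline(frame)
    best_unit = max(range(NUM_UNITS), key=posteriors.__getitem__)
    print(len(first), len(second), best_unit)
```

The sketch only shows the data flow: a variable number of channels goes in, one candidate per geometry is produced and a single first feature vector is selected, that vector is reduced in dimension, and the reduced vector is classified.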