The patent badge is an abbreviated version of the USPTO patent document. It covers the patent number, the date the patent was issued, the date the patent was filed, the title of the patent, the applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and the abstract, and it links to the full patent document (in Adobe Acrobat/PDF format), which can be downloaded or printed.

Date of Patent: Aug. 29, 2023
Filed: Mar. 23, 2021
Applicant: Google LLC, Mountain View, CA (US)
Inventors: Anshuman Tripathi, Mountain View, CA (US); Hasim Sak, Santa Clara, CA (US); Han Lu, Santa Clara, CA (US); Qian Zhang, Mountain View, CA (US); Jaeyoung Kim, Mountain View, CA (US)
Assignee: Google LLC, Mountain View, CA (US)
Attorneys:
Primary Examiner:
Int. Cl.: G10L 15/16 (2006.01); G06N 3/04 (2023.01); G06N 3/088 (2023.01); G10L 15/06 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01)
U.S. Cl.: CPC G10L 15/16 (2013.01); G06N 3/04 (2013.01); G06N 3/088 (2013.01); G10L 15/063 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01)
Abstract

A transformer-transducer model for unifying streaming and non-streaming speech recognition includes an audio encoder, a label encoder, and a joint network. The audio encoder receives a sequence of acoustic frames, and generates, at each of a plurality of time steps, a higher-order feature representation for a corresponding acoustic frame. The label encoder receives a sequence of non-blank symbols output by a final softmax layer, and generates, at each of the plurality of time steps, a dense representation. The joint network receives the higher-order feature representation and the dense representation at each of the plurality of time steps, and generates a probability distribution over possible speech recognition hypotheses. The audio encoder of the model further includes a neural network having an initial stack of transformer layers trained with zero look-ahead audio context, and a final stack of transformer layers trained with a variable look-ahead audio context.
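
To illustrate how the three components in the abstract fit together, the short Python (PyTorch) sketch below wires up an audio encoder with two transformer stacks, a label encoder, and a joint network. It is only a minimal illustration under assumed names and sizes (feat_dim, d_model, the attention-mask arguments, and an LSTM stand-in for the label encoder); it is not the patented implementation.

# Illustrative sketch only; module names and sizes are assumptions,
# not the patented implementation.
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Transformer stacks over acoustic frames (higher-order feature representation)."""
    def __init__(self, feat_dim=80, d_model=256, initial_layers=2, final_layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # Initial stack: intended for zero look-ahead (streaming) attention masks.
        self.initial_stack = nn.TransformerEncoder(make_layer(), num_layers=initial_layers)
        # Final stack: intended for variable look-ahead (non-streaming) attention masks.
        self.final_stack = nn.TransformerEncoder(make_layer(), num_layers=final_layers)

    def forward(self, frames, initial_mask=None, final_mask=None):
        x = self.proj(frames)                          # (B, T, d_model)
        x = self.initial_stack(x, mask=initial_mask)   # zero look-ahead context
        x = self.final_stack(x, mask=final_mask)       # variable look-ahead context
        return x

class LabelEncoder(nn.Module):
    """Dense representation of previously emitted non-blank symbols
    (an LSTM stand-in keeps the sketch short)."""
    def __init__(self, vocab_size=100, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.rnn = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, labels):
        out, _ = self.rnn(self.embed(labels))          # (B, U, d_model)
        return out

class JointNetwork(nn.Module):
    """Combines the two representations into log-probabilities over
    output symbols (plus blank) for every (frame, label) pair."""
    def __init__(self, d_model=256, vocab_size=100):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.Tanh(),
            nn.Linear(d_model, vocab_size + 1))

    def forward(self, audio_enc, label_enc):
        t = audio_enc.unsqueeze(2).expand(-1, -1, label_enc.size(1), -1)  # (B, T, U, d)
        u = label_enc.unsqueeze(1).expand(-1, audio_enc.size(1), -1, -1)  # (B, T, U, d)
        return self.fc(torch.cat([t, u], dim=-1)).log_softmax(dim=-1)     # (B, T, U, V+1)

# Example shapes: a batch of 2 utterances, 50 frames, 10 emitted labels.
frames = torch.randn(2, 50, 80)
labels = torch.randint(0, 100, (2, 10))
log_probs = JointNetwork()(AudioEncoder()(frames), LabelEncoder()(labels))
print(log_probs.shape)  # torch.Size([2, 50, 10, 101])

In this reading, the streaming/non-streaming unification comes from the masks passed to the two stacks: the initial stack always attends with zero look-ahead, while the final stack can be given anywhere from zero to full future context at inference time.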

