The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Jul. 08, 2025

Filed:
Mar. 21, 2023

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Weiran Wang, Palo Alto, CA (US);

Tongzhou Chen, Mountain View, CA (US);

Tara N. Sainath, Jersey City, NJ (US);

Ehsan Variani, Mountain View, CA (US);

Rohit Prakash Prabhavalkar, Palo Alto, CA (US);

Ronny Huang, Mountain View, CA (US);

Bhuvana Ramabhadran, Mt. Kisco, NY (US);

Neeraj Gaur, Mountain View, CA (US);

Sepand Mavandadi, Mountain View, CA (US);

Charles Caleb Peyser, New York, NY (US);

Trevor Strohman, Mountain View, CA (US);

Yangzhang He, Mountain View, CA (US);

David Rybach, Munich (DE);

Assignee:

Google LLC, Mountain View, CA (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/00 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/19 (2013.01); G10L 15/22 (2006.01);
U.S. Cl.
CPC ...
G10L 15/063 (2013.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 15/19 (2013.01); G10L 15/22 (2013.01);
Abstract

A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of the external language model.
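
The scoring flow in the abstract can be illustrated with a small sketch. This is not the patented implementation: the abstract does not say how the fusion weights are computed or how the three scores are combined, so the mean pooling, the single linear layer with softmax, the hypothesis-length feature, and the weighted-sum combination below are all assumptions, and every name (`Hypothesis`, `fusion_weights`, `rescore`, `params`) is hypothetical. Scores are taken to be log-likelihoods.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Hypothesis:
    text: str
    asr_score: float  # first likelihood score (log domain), from the decoder
    lm_score: float   # second likelihood score (log domain), from the external LM


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame encoder features into one vector (assumed pooling)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def fusion_weights(pooled: Sequence[float], hyp: Hypothesis,
                   params: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical fusion module: one linear layer followed by a softmax.

    The input is the pooled encoder vector plus the hypothesis word count,
    so the weights depend on both the audio and the hypothesis.
    """
    feats = list(pooled) + [float(len(hyp.text.split()))]
    logits = [sum(w * x for w, x in zip(row, feats)) for row in params]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_score(hyp: Hypothesis, weights: Sequence[float]) -> float:
    """Third likelihood score, assumed here to be a weighted sum."""
    return weights[0] * hyp.asr_score + weights[1] * hyp.lm_score


def rescore(frames: Sequence[Sequence[float]], hyps: Sequence[Hypothesis],
            params: Sequence[Sequence[float]]) -> List[Tuple[float, Hypothesis]]:
    """Return (fused score, hypothesis) pairs, best first."""
    pooled = mean_pool(frames)
    scored = [(fused_score(h, fusion_weights(pooled, h, params)), h) for h in hyps]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With all-zero `params` the softmax yields equal weights, so the fused score is simply the average of the two input scores. In a trained system `params` would be learned; the training procedure named in the abstract is not sketched here.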

