For the Inventor, By the Inventor

The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 12183322 B1

Date of Patent:

Dec. 31, 2024

Filed:

Sep. 22, 2022

Language agnostic multilingual end-to-end streaming on-device ASR system

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Bo Li, Fremont, CA (US);

Tara N. Sainath, Jersey City, NJ (US);

Ruoming Pang, New York, NY (US);

Shuo-yiin Chang, Sunnyvale, CA (US);

Qiumin Xu, Mountain View, CA (US);

Trevor Strohman, Mountain View, CA (US);

Vince Chen, Mountain View, CA (US);

Qiao Liang, Mountain View, CA (US);

Heguang Liu, Mountain View, CA (US);

Yanzhang He, Palo Alto, CA (US);

Parisa Haghani, Mountain View, CA (US);

Sameer Bidichandani, Mountain View, CA (US);

Assignee:

Google LLC, Mountain View, CA (US);

Attorneys:

Honigman LLP

Brett A. Krueger

Grant Griffith

Primary Examiner:

Daniel Abebe

Int. Cl.

G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01); G10L 15/30 (2013.01);

U.S. Cl.

CPC ...

G10L 15/005 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 2015/226 (2013.01);

Abstract

A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.
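The data flow in the abstract can be illustrated with a small toy sketch. This is not the patented implementation: the abstract does not specify any architecture details, so every name below (`VOCAB`, `encoder`, `prediction_network`, `joint`, `greedy_decode`, `classify_frames`) is hypothetical, the arithmetic inside each function is a placeholder for a learned network, and the rule-based frame labelling merely stands in for whatever classifier the patent actually claims. It only shows how an encoder output and a prediction-network output could be combined at each step, how decoding might stop at an end-of-utterance token, and what the four frame classes mean.

```python
import math

# Hypothetical toy vocabulary; "<blank>" and "<eou>" are assumed token names.
VOCAB = ("<blank>", "a", "b", "<eou>")


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encoder(frame):
    """Placeholder encoder: one acoustic frame -> higher order feature vector."""
    return [math.tanh(x) for x in frame]


def prediction_network(non_blank_history):
    """Placeholder prediction network: hidden vector from prior non-blank symbols."""
    hidden = [0.0] * len(VOCAB)
    for symbol in non_blank_history:
        hidden[VOCAB.index(symbol)] += 1.0
    return hidden


def joint(encoded, hidden):
    """Placeholder joint step: combine both vectors into a distribution over VOCAB."""
    return softmax([e - 0.5 * h for e, h in zip(encoded, hidden)])


def greedy_decode(frames):
    """Emit the most likely symbol per frame; stop once <eou> is predicted."""
    history = []
    for frame in frames:
        probs = joint(encoder(frame), prediction_network(history))
        symbol = VOCAB[probs.index(max(probs))]
        if symbol != "<blank>":
            history.append(symbol)
        if symbol == "<eou>":
            break
    return history


def classify_frames(is_speech):
    """Label frames as speech or as initial, intermediate, or final silence.

    Rule-based stand-in: silence before the first speech frame is initial,
    silence after the last speech frame is final, anything between is
    intermediate.
    """
    speech_idx = [i for i, s in enumerate(is_speech) if s]
    first = speech_idx[0] if speech_idx else len(is_speech)
    last = speech_idx[-1] if speech_idx else -1
    labels = []
    for i, speech in enumerate(is_speech):
        if speech:
            labels.append("speech")
        elif i < first:
            labels.append("initial_silence")
        elif i > last:
            labels.append("final_silence")
        else:
            labels.append("intermediate_silence")
    return labels


if __name__ == "__main__":
    toy_frames = [[0, 2, 0, 0], [0, 0, 2, 0], [2, 0, 0, 0], [0, 0, 0, 2]]
    print(greedy_decode(toy_frames))
    print(classify_frames([False, True, False, True, False]))
```

In this sketch the frames are four-dimensional only so that they line up with the four-symbol toy vocabulary; a real system would use learned projections between the acoustic features, the hidden representation, and the output vocabulary.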
