The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

May 23, 2023

Filed:

Apr. 3, 2020

Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Rui Zhao, Bellevue, WA (US);

Jinyu Li, Redmond, WA (US);

Liang Lu, Redmond, WA (US);

Yifan Gong, Sammamish, WA (US);

Hu Hu, Atlanta, GA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/22 (2006.01); G10L 15/26 (2006.01); G10L 15/16 (2006.01); G10L 15/06 (2013.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
G10L 15/063 (2013.01); G06N 3/0445 (2013.01); G06N 3/08 (2013.01);
Abstract

Techniques performed by a data processing system for training a Recurrent Neural Network Transducer (RNN-T) herein include encoder pretraining by training a neural network-based token classification model using first token-aligned training data representing a plurality of utterances, where each utterance is associated with a plurality of frames of audio data and tokens representing each utterance are aligned with frame boundaries of the plurality of audio frames; obtaining a first cross-entropy (CE) criterion from the token classification model, wherein the CE criterion represents a divergence between expected outputs and reference outputs of the model; pretraining an encoder of an RNN-T based on the first CE criterion; and training the RNN-T with second training data after pretraining the encoder of the RNN-T. These techniques also include whole-network pretraining of the RNN-T. An RNN-T pretrained using these techniques may be used to process audio data that includes spoken content to obtain a textual representation.
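
The cross-entropy criterion mentioned in the abstract can be illustrated with a small example. The following is a minimal sketch, not the patented implementation: it assumes each audio frame carries exactly one aligned token label and that a classifier emits one vector of logits per frame. The function names, the three-token vocabulary, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_cross_entropy(frame_logits, frame_tokens):
    """Mean negative log-probability of the aligned token at each frame."""
    if len(frame_logits) != len(frame_tokens):
        raise ValueError("token alignment must give exactly one label per frame")
    loss = 0.0
    for logits, token in zip(frame_logits, frame_tokens):
        probs = softmax(logits)
        loss -= math.log(probs[token])
    return loss / len(frame_tokens)


# Hypothetical toy utterance: 4 frames, vocabulary of 3 tokens.
# frame_tokens[i] is the token aligned to frame i (assumed alignment).
frame_tokens = [0, 0, 2, 1]

# An untrained classifier that scores every token equally at every frame.
uniform_logits = [[0.0, 0.0, 0.0] for _ in frame_tokens]

# A classifier that strongly favors the aligned token at every frame.
confident_logits = [
    [5.0 if k == t else 0.0 for k in range(3)] for t in frame_tokens
]

uniform_loss = frame_cross_entropy(uniform_logits, frame_tokens)
confident_loss = frame_cross_entropy(confident_logits, frame_tokens)
print(uniform_loss, confident_loss)
```

In this sketch the loss is lower when the per-frame predictions agree with the aligned tokens, which is the sense in which a cross-entropy criterion measures divergence between model outputs and reference outputs. How the patent uses that criterion to initialize the RNN-T encoder is not shown here.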

