The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11580965 B1

Date of Patent:

Feb. 14, 2023

Filed:

Jul. 24, 2020

Multimodal based punctuation and/or casing prediction

Applicant:

Amazon Technologies, Inc., Seattle, WA (US);

Inventors:

Monica Lakshmi Sunkara, Seattle, WA (US);

Srikanth Ronanki, Bellevue, WA (US);

Dhanush Bekal Kannangola, Seattle, WA (US);

Sravan Babu Bodapati, Redmond, WA (US);

Katrin Kirchhoff, Seattle, WA (US);

Assignee:

Amazon Technologies, Inc., Seattle, WA (US);

Attorney:

Nicholson De Vos Webster & Elliott, LLP

Primary Examiner:

Neeraj Sharma

Int. Cl.

CPC ...

G10L 15/19 (2013.01); G06N 3/049 (2023.01); G10L 15/26 (2006.01); G10L 15/06 (2013.01);

U.S. Cl.

CPC ...

G10L 15/19 (2013.01); G06N 3/049 (2013.01); G10L 15/26 (2013.01); G10L 2015/0633 (2013.01);

Abstract

Techniques for predicting punctuation and casing using multimodal fusion are described. An exemplary method includes processing generated text by: tokenizing the generated text into sub-words, and generating a sequence of lexical features for the sub-words using a pre-trained lexical encoder; processing audio of the audio by: generating a sequence of frame level acoustic embeddings using a pre-trained acoustic encoder on the audio, and generating task specific embeddings from the frame level acoustic embeddings; performing multimodal fusion of the sub-word level acoustic embeddings and the sequence of lexical features by: aligning the task specific embeddings to the sequence of lexical features, and combining the sequence of lexical features and aligned acoustic sequence; predicting punctuation and casing from the combined sequence of lexical features and aligned acoustic sequence; concatenating the sub-words of the text, and applying the predicted punctuation and casing; and outputting text having the predicted punctuation and casing.
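The fusion and reconstruction steps named in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes mean pooling over per-sub-word frame spans as the alignment, plain concatenation as the combination, and WordPiece-style `##` continuation markers for sub-words; none of these details are stated in the abstract. The function names, the frame spans, and the label format are hypothetical, and the pre-trained encoders and the classifier are replaced by hand-written inputs.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def align_frames_to_subwords(
    frames: Sequence[Vector], spans: Sequence[Tuple[int, int]]
) -> List[Vector]:
    """Mean-pool acoustic frames over each sub-word's [start, end) frame span.

    Assumption: the spans come from some external alignment (hypothetical here).
    """
    aligned = []
    for start, end in spans:
        window = frames[start:end]
        if not window:
            raise ValueError(f"empty frame span: ({start}, {end})")
        dim = len(window[0])
        aligned.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return aligned


def fuse(lexical: Sequence[Vector], aligned: Sequence[Vector]) -> List[Vector]:
    """Concatenate each lexical feature vector with its aligned acoustic vector."""
    if len(lexical) != len(aligned):
        raise ValueError("lexical and acoustic sequences must have equal length")
    return [list(lex) + list(ac) for lex, ac in zip(lexical, aligned)]


def apply_predictions(
    subwords: Sequence[str], labels: Sequence[Tuple[bool, str]]
) -> str:
    """Join sub-words into words, then apply casing and punctuation labels.

    Each label is (capitalize, punctuation). A word takes its casing from its
    first sub-word and its punctuation from its last sub-word.
    """
    words: List[Tuple[str, bool, str]] = []
    for piece, (capitalize, punct) in zip(subwords, labels):
        if piece.startswith("##") and words:
            text, cap, _ = words[-1]
            words[-1] = (text + piece[2:], cap, punct)
        else:
            words.append((piece, capitalize, punct))
    return " ".join(
        (w[:1].upper() + w[1:] if cap else w) + p for w, cap, p in words
    )


# Toy inputs standing in for encoder outputs and classifier predictions.
frames = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
aligned = align_frames_to_subwords(frames, [(0, 2), (2, 3)])
combined = fuse([[1.0], [2.0]], aligned)

subwords = ["hello", "every", "##one", "how", "are", "you"]
labels = [(True, ","), (False, ""), (False, "."), (True, ""), (False, ""), (False, "?")]
print(apply_predictions(subwords, labels))
```

In a real system, the hand-written `labels` would come from a model that reads the combined sequence produced by `fuse`.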
