
The patent badge is an abbreviated version of the USPTO patent document. The badge covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The patent badge also contains a link to the full patent document (PDF).

Date of Patent: Mar. 01, 2022

Filed: Sep. 13, 2019

Applicant: Mitsubishi Electric Research Laboratories, Inc., Cambridge, MA (US)

Inventors: Chiori Hori, Lexington, MA (US); Anoop Cherian, Belmont, MA (US); Tim Marks, Newton, MA (US); Takaaki Hori, Lexington, MA (US)

Attorneys:

Primary Examiner:

Int. Cl.: G10L 15/06 (2013.01); G10L 15/02 (2006.01); G10L 15/22 (2006.01); G10L 19/00 (2013.01)

U.S. Cl. (CPC): G10L 15/063 (2013.01); G10L 15/02 (2013.01); G10L 15/22 (2013.01); G10L 19/00 (2013.01)
Abstract

A computer-implemented method for training a dialogue response generation system, and the dialogue response generation system itself, are provided. The method includes: arranging a first multimodal encoder-decoder for dialogue response generation or video description, having a first input and a first output, where the first multimodal encoder-decoder has been pretrained on audio-video datasets with training video description sentences; arranging a second multimodal encoder-decoder for dialogue response generation, having a second input and a second output; providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, where the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences; and providing the first audio-visual datasets, excluding the first corresponding video description sentences, to the second multimodal encoder-decoder. In this case, the second multimodal encoder-decoder generates second output values based on the first audio-visual datasets without the first corresponding video description sentences.
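Read as a training recipe, the abstract describes a teacher-student arrangement: the first (description-conditioned) encoder-decoder produces target output values, and the second encoder-decoder learns to reproduce them from audio-visual input alone. The sketch below illustrates that idea only; every name, dimension, and the linear stand-ins for the two encoder-decoders are illustrative assumptions, not the patent's actual models.

```python
import random

# Hypothetical minimal sketch of the two-branch training scheme described
# in the abstract; the linear "encoder-decoders" are illustrative only.
random.seed(0)

D_AV, D_TXT = 4, 2  # assumed audio-visual and description-feature dimensions

# "Teacher": first multimodal encoder-decoder, pretrained on audio-video
# data WITH video description sentences (modeled here as fixed weights).
w_teacher = [random.gauss(0, 1) for _ in range(D_AV + D_TXT)]

def teacher_out(av, txt):
    """First output values: computed from audio-visual AND description features."""
    x = av + txt  # concatenate both modalities
    return sum(wi * xi for wi, xi in zip(w_teacher, x))

# "Student": second multimodal encoder-decoder, given audio-visual input only.
w_student = [0.0] * D_AV

def student_out(av, w):
    """Second output values: computed from audio-visual features alone."""
    return sum(wi * xi for wi, xi in zip(w, av))

# Training set: audio-visual features paired with description features.
data = [([random.gauss(0, 1) for _ in range(D_AV)],
         [random.gauss(0, 1) for _ in range(D_TXT)]) for _ in range(64)]
targets = [teacher_out(av, txt) for av, txt in data]

def mse(w):
    return sum((student_out(av, w) - t) ** 2
               for (av, _), t in zip(data, targets)) / len(data)

# Train the student to reproduce the teacher's outputs from audio-visual
# input only (per-sample SGD on squared error between the two outputs).
lr, steps = 0.05, 300
loss_before = mse(w_student)
for _ in range(steps):
    for (av, _), t in zip(data, targets):
        err = student_out(av, w_student) - t
        for j in range(D_AV):
            w_student[j] -= lr * err * av[j]
loss_after = mse(w_student)
print(f"distillation loss: {loss_before:.3f} -> {loss_after:.3f}")
```

The residual loss never reaches zero in this toy version: the description features contribute to the teacher's output but are invisible to the student, which mirrors why the second encoder-decoder can only approximate the first.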

