The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Nov. 14, 2023

Filed:
Mar. 31, 2021

Applicants:

Nippon Telegraph and Telephone Corporation, Tokyo, JP;

Massachusetts Institute of Technology, Cambridge, MA (US);

Inventors:

Yasunori Ohishi, Musashino, JP;

Akisato Kimura, Musashino, JP;

Takahito Kawanishi, Musashino, JP;

Kunio Kashino, Musashino, JP;

James R. Glass, Winchester, MA (US);

David Harwath, Boston, MA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/02 (2006.01); G10L 15/22 (2006.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
G10L 15/02 (2013.01); G06N 3/08 (2013.01); G10L 15/22 (2013.01); G10L 2015/223 (2013.01);
Abstract

A learning device calculates an image feature using a model (image encoder) that receives an image and outputs the image feature obtained by mapping the image into a latent space. The learning device calculates an audio feature using a model (audio encoder) that receives a speech in a predetermined language and outputs the audio feature obtained by mapping the speech into the latent space, and that includes a neural network provided with a self-attention mechanism. The learning device updates parameters of the models used by an image feature calculation unit and an audio feature calculation unit such that the image feature of a first image is similar to the audio feature of a speech corresponding to the first image.
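
The data flow described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the patented implementation: the identity query/key/value projections, the mean pooling, the dot-product similarity, the margin ranking loss, and every function name below are assumptions chosen for brevity, since the abstract does not specify them. The sketch only computes the quantity a training step would reduce; the parameter update itself is omitted.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    # Average a list of equal-length vectors into one vector.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def self_attention(frames):
    # Single-head scaled dot-product self-attention.
    # Assumption: identity query/key/value projections (no learned weights).
    scale = math.sqrt(len(frames[0]))
    out = []
    for q in frames:
        weights = softmax([dot(q, k) / scale for k in frames])
        out.append([sum(w * v[i] for w, v in zip(weights, frames))
                    for i in range(len(q))])
    return out

def audio_encoder(frames):
    # Hypothetical audio encoder: self-attention over speech frames,
    # then pooling into a single latent-space vector.
    return mean_pool(self_attention(frames))

def image_encoder(regions):
    # Hypothetical image encoder: pools region vectors into the same space.
    return mean_pool(regions)

def ranking_loss(image_feats, audio_feats, margin=1.0):
    # Assumed objective: a matched image/speech pair (i, i) should score
    # higher than any mismatched pair (i, j) by at least `margin`.
    loss = 0.0
    n = len(image_feats)
    for i in range(n):
        pos = dot(image_feats[i], audio_feats[i])
        for j in range(n):
            if i != j:
                loss += max(0.0, margin - pos + dot(image_feats[i], audio_feats[j]))
                loss += max(0.0, margin - pos + dot(image_feats[j], audio_feats[i]))
    return loss
```

In this sketch, the loss is zero when each image feature already matches its own speech feature far better than any other, and grows as mismatched pairs become more similar than matched ones. A real system would use learned encoders and gradient-based updates in place of the fixed functions above.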

