The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPC codes, and abstract. The patent badge also links to the full patent document in PDF format.

Date of Patent:

May 30, 2023

Filed:

May 18, 2021

Applicant:

Toyota Research Institute, Inc., Los Altos, CA (US);

Inventors:

Zhijian Liu, Cambridge, MA (US);

Simon A. I. Stent, Cambridge, MA (US);

John H. Gideon, Howell, MI (US);

Jie Li, Los Altos, CA (US);

Assignee:

Toyota Research Institute, Inc., Los Altos, CA (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06K 9/62 (2022.01); G06K 9/32 (2006.01); G06K 9/34 (2006.01); G06N 3/04 (2023.01); G06F 18/214 (2023.01); G06V 20/62 (2022.01); G06V 30/148 (2022.01); G06F 18/213 (2023.01); G06F 18/21 (2023.01); G06N 3/045 (2023.01);
U.S. Cl.
CPC ...
G06F 18/2148 (2023.01); G06F 18/213 (2023.01); G06F 18/2185 (2023.01); G06N 3/045 (2023.01); G06V 20/635 (2022.01); G06V 30/153 (2022.01);
Abstract

Systems and methods for training a model are described herein. In one example, a system for training the model includes a processor and a memory in communication with the processor having a training module. The training module has instructions that cause the processor to determine a contrastive loss using a self-supervised contrastive loss function and to adjust, based on the contrastive loss, model weights of a visual backbone that generated feature maps and/or a textual backbone that generated feature vectors. The training module also has instructions that cause the processor to determine a localized loss using a supervised loss function that compares an image-caption attention map with visual identifiers and to adjust, based on the localized loss, the model weights of the visual backbone and/or the textual backbone.
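
The two losses named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss formulas, so everything below is an assumption made for illustration: the contrastive loss is written as a symmetric InfoNCE loss over pooled image and caption vectors, the localized loss as a binary cross-entropy between an attention map and a 0/1 mask standing in for the visual identifiers, and the names `contrastive_loss`, `localized_loss`, `temperature`, and `identifier_mask` are hypothetical. The weight-adjustment step is omitted because it would require an automatic differentiation framework.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale a vector to unit length so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def log_softmax(row):
    # Numerically stable log-softmax over one row of similarity scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_vecs, text_vecs, temperature=0.1):
    """Symmetric InfoNCE (assumed form): pair k of images and captions match."""
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    n = len(imgs)
    sims = [[dot(i, t) / temperature for t in txts] for i in imgs]
    sims_t = [list(col) for col in zip(*sims)]
    image_to_text = -sum(log_softmax(sims[k])[k] for k in range(n)) / n
    text_to_image = -sum(log_softmax(sims_t[k])[k] for k in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


def localized_loss(attention_map, identifier_mask, eps=1e-8):
    """Binary cross-entropy (assumed form) between attention in [0, 1] and a 0/1 mask."""
    total, count = 0.0, 0
    for att_row, mask_row in zip(attention_map, identifier_mask):
        for a, m in zip(att_row, mask_row):
            a = min(max(a, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= m * math.log(a) + (1 - m) * math.log(1.0 - a)
            count += 1
    return total / count


# Toy inputs: two image vectors, two caption vectors, and a 2x2 attention map.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])

mask = [[1, 0], [0, 0]]
on_target = localized_loss([[0.9, 0.1], [0.1, 0.1]], mask)
off_target = localized_loss([[0.1, 0.9], [0.1, 0.1]], mask)
```

In this toy setup, correctly paired image and caption vectors yield a lower contrastive loss than shuffled pairs, and an attention map concentrated on the masked region yields a lower localized loss than one concentrated elsewhere. A training loop in the spirit of the abstract would use both values to update the backbone weights.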

