The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Apr. 29, 2025

Filed:

Aug. 15, 2022

Applicant:

Adobe Inc., San Jose, CA (US);

Inventors:

Ajay Jain, San Jose, CA (US);

Sanjeev Tagra, Redmond, WA (US);

Sachin Soni, New Delhi, IN;

Ryan Rozich, Austin, TX (US);

Nikaash Puri, New Delhi, IN;

Jonathan Roeder, San Jose, CA (US);

Assignee:

Adobe Inc., San Jose, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/06 (2012.12); G06F 3/16 (2005.12); G06F 16/532 (2018.12); G06F 40/284 (2019.12); G06F 40/30 (2019.12); G06V 10/774 (2021.12); G10L 15/183 (2012.12); G10L 15/22 (2005.12);
U.S. Cl.
CPC ...
G10L 15/063 (2012.12); G06F 3/167 (2012.12); G06F 16/532 (2018.12); G06F 40/284 (2019.12); G06F 40/30 (2019.12); G06V 10/7747 (2021.12); G10L 15/183 (2012.12); G10L 15/22 (2012.12);
Abstract

An image search system uses a multi-modal model to determine relevance of images to a spoken query. The multi-modal model includes a spoken language model that extracts features from a spoken query and an image processing model that extracts features from an image. The multi-modal model determines a relevance score for the image and the spoken query based on the extracted features. The multi-modal model is trained using a curriculum approach that includes training the spoken language model using audio data. Subsequently, a training dataset comprising a plurality of spoken queries and one or more images associated with each spoken query is used to jointly train the spoken language model and an image processing model to provide a trained multi-modal model.
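
The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the relevance score is computed from the extracted features, so cosine similarity is used here purely as an assumption. The function names (`cosine_relevance`, `rank_images`), the file names, and the hand-made feature vectors are all hypothetical stand-ins for the outputs of the spoken language model and the image processing model. The curriculum training itself is not covered.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_relevance(query_features: Vector, image_features: Vector) -> float:
    """Assumed scoring rule (not from the patent): cosine similarity."""
    if len(query_features) != len(image_features):
        raise ValueError("feature vectors must have the same length")
    dot = sum(q * i for q, i in zip(query_features, image_features))
    q_norm = math.sqrt(sum(q * q for q in query_features))
    i_norm = math.sqrt(sum(i * i for i in image_features))
    if q_norm == 0.0 or i_norm == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as irrelevant
    return dot / (q_norm * i_norm)


def rank_images(
    query_features: Vector, image_table: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score every image against the query and sort by descending relevance."""
    scored = [
        (image_id, cosine_relevance(query_features, features))
        for image_id, features in image_table.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors; a real system would obtain these from the
# spoken language model (query) and the image processing model (images).
query = [1.0, 0.0, 1.0]
images = {
    "beach.jpg": [0.9, 0.1, 0.8],
    "invoice.png": [0.0, 1.0, 0.0],
}
ranking = rank_images(query, images)
```

In this sketch, `ranking` lists the image identifiers with their scores, most relevant first. A trained model would typically learn the feature extractors so that matching query and image pairs score higher than mismatched ones.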

