The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document in Adobe Acrobat (PDF) format, which can be downloaded or printed.
Patent No.:
Date of Patent: Jun. 17, 2025
Filed: Nov. 16, 2021
Applicant: Adobe Inc., San Jose, CA (US)
Inventors:
Jiuxiang Gu, Greenbelt City, MD (US);
Ani Nenkova Nenkova, Philadelphia, PA (US);
Nikolaos Barmpalios, Palo Alto, CA (US);
Vlad Ion Morariu, Potomac, MD (US);
Tong Sun, San Ramon, CA (US);
Rajiv Bhawanji Jain, Falls Church, VA (US);
Jason Wen Yong Kuen, Santa Clara, CA (US);
Handong Zhao, San Jose, CA (US);
Assignee: Adobe Inc., San Jose, CA (US)
Abstract
The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data, and a bounding box is generated for each sentence. For each sentence, a set of predicted features is generated by an encoder machine-learning model that performs cross-attention between a set of masked textual features for the sentence and a set of masked visual features for the sentence. The masked textual features are based on a masking function and the sentence; the masked visual features are based on the masking function and the corresponding bounding box. A document encoder model is pretrained based on the set of predicted features for each sentence and on pretraining tasks, which include masked sentence modeling, visual contrastive learning, or visual-language alignment.
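As an illustration of the cross-attention step described in the abstract, below is a minimal PyTorch sketch, not the patented implementation: it masks textual and visual features for a sentence and fuses them with bidirectional cross-attention to produce a predicted feature vector per sentence. All names (SentenceCrossAttentionEncoder, text_feats, visual_feats, mask_ratio) and the specific masking, pooling, and fusion choices are illustrative assumptions, not details taken from the patent.

# Illustrative sketch only; module and parameter names are assumptions.
import torch
import torch.nn as nn


class SentenceCrossAttentionEncoder(nn.Module):
    """Fuses masked textual and masked visual features with cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4, mask_ratio: float = 0.15):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))  # learned mask embedding
        self.text_to_visual = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visual_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(2 * dim, dim)

    def _mask(self, feats: torch.Tensor) -> torch.Tensor:
        # Randomly replace a fraction of feature vectors with the mask token
        # (a stand-in for the masking function named in the abstract).
        keep = torch.rand(feats.shape[:2], device=feats.device) > self.mask_ratio
        return torch.where(keep.unsqueeze(-1), feats, self.mask_token)

    def forward(self, text_feats: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (batch, num_tokens, dim)  token embeddings of the sentence
        # visual_feats: (batch, num_regions, dim) embeddings from the sentence's bounding box
        masked_text = self._mask(text_feats)
        masked_visual = self._mask(visual_feats)
        # Cross-attention in both directions between the two modalities.
        t2v, _ = self.text_to_visual(masked_text, masked_visual, masked_visual)
        v2t, _ = self.visual_to_text(masked_visual, masked_text, masked_text)
        # Pool each stream and fuse into one predicted feature vector per sentence.
        fused = torch.cat([t2v.mean(dim=1), v2t.mean(dim=1)], dim=-1)
        return self.out_proj(fused)


if __name__ == "__main__":
    encoder = SentenceCrossAttentionEncoder()
    text = torch.randn(2, 12, 256)   # 2 sentences, 12 tokens each
    visual = torch.randn(2, 4, 256)  # 4 visual patches from each bounding box
    predicted = encoder(text, visual)
    print(predicted.shape)           # torch.Size([2, 256])

In a full pretraining setup, the predicted features would feed the objectives listed in the abstract (masked sentence modeling, visual contrastive learning, or visual-language alignment); those loss functions are not sketched here.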