The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document in Adobe Acrobat (PDF) format, which can be downloaded or printed.
Patent No.:
Date of Patent: Jun. 17, 2025
Filed: Nov. 16, 2021
Applicant: Adobe Inc., San Jose, CA (US)
Inventors:
Jiuxiang Gu, Greenbelt City, MD (US);
Ani Nenkova Nenkova, Philadelphia, PA (US);
Nikolaos Barmpalios, Palo Alto, CA (US);
Vlad Ion Morariu, Potomac, MD (US);
Tong Sun, San Ramon, CA (US);
Rajiv Bhawanji Jain, Falls Church, VA (US);
Jason Wen Yong Kuen, Santa Clara, CA (US);
Handong Zhao, San Jose, CA (US);
Assignee: Adobe Inc., San Jose, CA (US)
Abstract
The technology described includes methods for pretraining a document encoder model based on multimodal self cross-attention. One method includes receiving image data that encodes a set of pretraining documents. A set of sentences is extracted from the image data, and a bounding box is generated for each sentence. For each sentence, a set of predicted features is generated by an encoder machine-learning model that performs cross-attention between a set of masked textual features for the sentence and a set of masked visual features for the sentence. The masked textual features are based on a masking function and the sentence; the masked visual features are based on the masking function and the corresponding bounding box. A document encoder model is pretrained based on the set of predicted features for each sentence and on pretraining tasks, which include masked sentence modeling, visual contrastive learning, or visual-language alignment.
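As an illustration of the cross-attention step described in the abstract, below is a minimal PyTorch sketch, not the patented implementation: it masks textual and visual features for a sentence and fuses them with bidirectional cross-attention to produce a predicted feature vector per sentence. All names (SentenceCrossAttentionEncoder, text_feats, visual_feats, mask_ratio) and the specific masking, pooling, and fusion choices are illustrative assumptions, not details taken from the patent.

# Illustrative sketch only; module and parameter names are assumptions.
import torch
import torch.nn as nn


class SentenceCrossAttentionEncoder(nn.Module):
    """Fuses masked textual and masked visual features with cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 4, mask_ratio: float = 0.15):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))  # learned mask embedding
        self.text_to_visual = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visual_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out_proj = nn.Linear(2 * dim, dim)

    def _mask(self, feats: torch.Tensor) -> torch.Tensor:
        # Randomly replace a fraction of feature vectors with the mask token
        # (a stand-in for the masking function named in the abstract).
        keep = torch.rand(feats.shape[:2], device=feats.device) > self.mask_ratio
        return torch.where(keep.unsqueeze(-1), feats, self.mask_token)

    def forward(self, text_feats: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (batch, num_tokens, dim)  token embeddings of the sentence
        # visual_feats: (batch, num_regions, dim) embeddings from the sentence's bounding box
        masked_text = self._mask(text_feats)
        masked_visual = self._mask(visual_feats)
        # Cross-attention in both directions between the two modalities.
        t2v, _ = self.text_to_visual(masked_text, masked_visual, masked_visual)
        v2t, _ = self.visual_to_text(masked_visual, masked_text, masked_text)
        # Pool each stream and fuse into one predicted feature vector per sentence.
        fused = torch.cat([t2v.mean(dim=1), v2t.mean(dim=1)], dim=-1)
        return self.out_proj(fused)


if __name__ == "__main__":
    encoder = SentenceCrossAttentionEncoder()
    text = torch.randn(2, 12, 256)   # 2 sentences, 12 tokens each
    visual = torch.randn(2, 4, 256)  # 4 visual patches from each bounding box
    predicted = encoder(text, visual)
    print(predicted.shape)           # torch.Size([2, 256])

In a full pretraining setup, the predicted features would feed the objectives listed in the abstract (masked sentence modeling, visual contrastive learning, or visual-language alignment); those loss functions are not sketched here.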