The patent badge is an abbreviated version of the USPTO patent document and contains a link to the full patent document.

G06V 20/40 (2022.01); G06F 40/279 (2020.01); G06F 40/284 (2020.01); G06V 10/26 (2022.01); G06V 10/74 (2022.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/80 (2022.01);

U.S. Cl.

CPC ...

G06V 20/41 (2022.01); G06F 40/279 (2020.01); G06F 40/284 (2020.01); G06V 10/26 (2022.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 10/776 (2022.01); G06V 10/806 (2022.01); G06V 20/46 (2022.01); G06V 20/47 (2022.01);

Abstract

Embodiments describe a method of video-text pre-learning to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align and prompt framework provides video and language pre-training that encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
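
The data flow in the abstract (independent encoding of frames and text, followed by a multi-modal encoder) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the dimensions, the random linear projections standing in for the transformer-based encoders, and the single cross-attention step standing in for the multi-modal encoder. Prompting entity modeling is not shown.

```python
import math
import random


def project(x, w):
    """Multiply a vector by a weight matrix stored as a list of rows."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def random_matrix(rows, cols, rng):
    """Random weights; a placeholder for trained encoder parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys):
    """Each query (text token) adds a softmax-weighted mix of the keys (frames)."""
    dim = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
        fused.append([qi + mi for qi, mi in zip(q, mixed)])  # residual connection
    return fused


def encode_pair(frames, tokens, w_video, w_text):
    """Encode each modality independently, then fuse them (hypothetical helper)."""
    video_emb = [project(f, w_video) for f in frames]  # stand-in video encoder
    text_emb = [project(t, w_text) for t in tokens]    # stand-in text encoder
    return cross_attend(text_emb, video_emb)           # stand-in multi-modal encoder


# Assumed toy sizes: 2 sparse frames, 5 text tokens, shared 3-d embedding space.
rng = random.Random(0)
FRAME_DIM, TOKEN_DIM, EMBED_DIM = 6, 4, 3
w_video = random_matrix(EMBED_DIM, FRAME_DIM, rng)
w_text = random_matrix(EMBED_DIM, TOKEN_DIM, rng)
frames = [[rng.random() for _ in range(FRAME_DIM)] for _ in range(2)]
tokens = [[rng.random() for _ in range(TOKEN_DIM)] for _ in range(5)]

fused = encode_pair(frames, tokens, w_video, w_text)
print(len(fused), len(fused[0]))  # one fused vector per text token: 5 3
```

The two projections never see each other's input, which mirrors the independent encoding step; only `cross_attend` mixes information across modalities.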
