The patent badge is an abbreviated version of the USPTO patent document and contains a link to the full patent document.

Int. Cl. G06V 10/80 (2022.01); G06F 40/40 (2020.01); G06V 10/764 (2022.01); G06V 10/77 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 10/86 (2022.01)

U.S. Cl. CPC: G06V 10/811 (2022.01); G06F 40/40 (2020.01); G06V 10/764 (2022.01); G06V 10/7715 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 10/86 (2022.01)

Abstract

Various embodiments classify one or more portions of an image by deriving an “intrinsic” modality. This intrinsic modality acts as a substitute for a “text” modality in a multi-modal network. In image processing, a text modality is typically natural-language text that describes one or more portions of an image. However, explicit natural-language text may not be available in one or more domains for training a multi-modal network. Accordingly, various embodiments described herein generate an intrinsic modality, which is also a description of one or more portions of an image, except that the description is not explicit natural-language text but a machine learning model representation. Some embodiments additionally leverage a visual modality obtained from a vision-only model or branch, which may learn domain characteristics that are not present in the multi-modal network. Some embodiments further fuse or integrate the intrinsic modality with the visual modality for better generalization.
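The abstract does not say how the intrinsic modality is scored or how the two modalities are fused. The sketch below illustrates one common pattern under stated assumptions, not the patented method: each class is represented by a learned vector (standing in for the text embedding a CLIP-style multi-modal network would normally use), a vision-only branch applies a linear head to its own features, and the two branches are combined by late fusion, a weighted average of their class probabilities. All names (`class_reprs`, `alpha`, `temperature`, `classify`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def intrinsic_logits(image_emb, class_reprs, temperature=0.07) -> List[float]:
    # ASSUMPTION: learned per-class vectors play the role of text embeddings;
    # the image is scored against each one by scaled cosine similarity.
    return [cosine(image_emb, r) / temperature for r in class_reprs]

def visual_logits(visual_feat, weights, biases) -> List[float]:
    # ASSUMPTION: the vision-only branch ends in a plain linear classifier.
    return [sum(w * x for w, x in zip(row, visual_feat)) + b
            for row, b in zip(weights, biases)]

def classify(image_emb, class_reprs, visual_feat, weights, biases,
             alpha: float = 0.5) -> Tuple[int, List[float]]:
    # ASSUMPTION: late fusion by a convex combination of the two
    # branches' probabilities; alpha weights the intrinsic branch.
    p_intrinsic = softmax(intrinsic_logits(image_emb, class_reprs))
    p_visual = softmax(visual_logits(visual_feat, weights, biases))
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(p_intrinsic, p_visual)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused

# Toy two-class example with made-up vectors.
label, probs = classify(
    image_emb=[1.0, 0.0],
    class_reprs=[[1.0, 0.0], [0.0, 1.0]],
    visual_feat=[0.2, 0.9],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0],
)
print(label, [round(p, 3) for p in probs])
```

Here the intrinsic branch strongly favors class 0 while the visual branch mildly favors class 1; with `alpha=0.5` the fused prediction is class 0, and setting `alpha=0.0` falls back to the vision-only branch. A learned fusion layer over concatenated features would be an equally plausible reading of "fuse or integrate".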
