The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Sep. 10, 2024

Filed:

Nov. 24, 2021

Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Gaurav Mittal, Redmond, WA (US);

Ye Yu, Redmond, WA (US);

Mei Chen, Bellevue, WA (US);

Jay Sanjay Patravali, Corvallis, OR (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06K 9/00 (2022.01); G06F 16/73 (2019.01); G06F 16/75 (2019.01); G06N 20/00 (2019.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01);
U.S. Cl.
CPC ...
G06V 10/7753 (2022.01); G06F 16/73 (2019.01); G06F 16/75 (2019.01); G06N 20/00 (2019.01); G06V 10/764 (2022.01); G06V 10/7747 (2022.01);
Abstract

The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes are generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
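As a rough illustration of the classification step the abstract describes, the sketch below lets per-segment action features attend over appearance features and then labels a query segment by comparing it with support segments. It is a minimal sketch under stated assumptions, not the patented method: the function names (`cross_attend`, `classify`), the toy two-dimensional features, and the data layout are all hypothetical, there are no learned parameters, and nearest-prototype matching by cosine similarity is swapped in where the abstract describes fine-tuning the cross-attention module on the support segments.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(action_feats, appearance_feats):
    """Scaled dot-product cross-attention (hypothetical helper).

    Each action vector acts as a query over the appearance vectors, which
    serve as both keys and values. The attended vectors are averaged into
    one embedding per video segment.
    """
    dim = len(appearance_feats[0])
    scale = math.sqrt(dim)
    attended = []
    for query_vec in action_feats:
        weights = softmax([dot(query_vec, key) / scale for key in appearance_feats])
        attended.append([
            sum(w * value[i] for w, value in zip(weights, appearance_feats))
            for i in range(dim)
        ])
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, support):
    """Assign the query to the class whose prototype embedding is closest.

    Assumption: prototype matching stands in for the fine-tuning step.
    `query` is an (action_feats, appearance_feats) pair; `support` maps a
    class label to a list of such pairs.
    """
    query_emb = cross_attend(*query)
    best_label, best_score = None, -math.inf
    for label, segments in support.items():
        embs = [cross_attend(action, appearance) for action, appearance in segments]
        prototype = [sum(e[i] for e in embs) / len(embs) for i in range(len(embs[0]))]
        score = cosine(query_emb, prototype)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): two classes, one support segment each.
support = {
    "wave": [([[1.0, 0.0]], [[1.0, 0.0], [0.9, 0.1]])],
    "jump": [([[0.0, 1.0]], [[0.0, 1.0], [0.1, 0.9]])],
}
query = ([[0.8, 0.2]], [[1.0, 0.0], [0.8, 0.2]])
predicted = classify(query, support)
print(predicted)
```

In a real system the feature vectors would come from the trained appearance and action encoders, and the attention step would carry learned projection weights; both are omitted here to keep the sketch self-contained.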

