The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 22, 2025

Filed:

Jun. 29, 2022
Applicant:

Amazon Technologies, Inc., Seattle, WA (US);

Inventors:

Wentao Zhu, Redmond, WA (US);

Mohamed Kamal Omar, Seattle, WA (US);

Han-Kai Hsu, Seattle, WA (US);

Xiaohang Sun, Bellevue, WA (US);

Ashutosh Sanan, Seattle, WA (US);

Assignee:

Amazon Technologies, Inc., Seattle, WA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/75 (2019.01); G06F 16/71 (2019.01); G06F 16/783 (2019.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
G06F 16/75 (2019.01); G06F 16/71 (2019.01); G06F 16/7834 (2019.01); G06N 3/08 (2013.01);
Abstract

Systems, methods, and computer-readable media are disclosed for multimodal indexing of video using machine learning. An example method may include receiving, by a video encoder of an audio-video transformer neural network comprising one or more computer processors coupled to memory, a first frame and a second frame associated with a first segment of a video. The example method may also include receiving, by an audio encoder of the audio-video transformer neural network, an audio spectrogram comprising first audio data associated with the first segment of the video. The example method may also include generating, by the video encoder, a first video embedding. The example method may also include generating, by the audio encoder, a first audio embedding. The example method may also include determining a fusion of the first video embedding and the first audio embedding using a multimodal bottleneck token. The example method may also include determining an output including the first video embedding and the first audio embedding. The example method may also include determining a classification of the first portion of the video based on the output.
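
The fusion step named in the abstract can be illustrated with a small sketch. This is not the patented implementation: it is a toy, pure-Python illustration of one way a shared bottleneck token could pass information between a video stream and an audio stream. Everything here is an assumption made for illustration, including the function names (`attend`, `bottleneck_fusion`, `classify`), the random stand-in tokens in place of real encoder outputs, the absence of learned attention projections, the averaging of the two bottleneck updates, and the concatenation used to form the output.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Single-head dot-product self-attention with no learned projections (assumption)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out


def mean_pool(tokens):
    """Average a list of token vectors into one embedding vector."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def bottleneck_fusion(video_tokens, audio_tokens, bottleneck):
    """One fusion step: each modality attends over its own tokens plus the shared bottleneck."""
    nv, na = len(video_tokens), len(audio_tokens)
    v_out = attend(video_tokens + bottleneck)
    a_out = attend(audio_tokens + bottleneck)
    # Assumption: merge the two modality-specific bottleneck updates by averaging.
    new_bottleneck = [
        [(x + y) / 2 for x, y in zip(bv, ba)]
        for bv, ba in zip(v_out[nv:], a_out[na:])
    ]
    return v_out[:nv], a_out[:na], new_bottleneck


def classify(video_tokens, audio_tokens, class_weights):
    """Pool each modality, concatenate the two embeddings, and apply a linear classifier."""
    output = mean_pool(video_tokens) + mean_pool(audio_tokens)  # list concatenation
    logits = [sum(w * x for w, x in zip(row, output)) for row in class_weights]
    return softmax(logits)


rng = random.Random(0)
DIM = 4          # hypothetical embedding width
NUM_CLASSES = 3  # hypothetical number of segment labels


def random_tokens(n):
    return [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n)]


video_tokens = random_tokens(2)  # stand-ins for two encoded frames of one segment
audio_tokens = random_tokens(3)  # stand-ins for encoded spectrogram patches
bottleneck = random_tokens(1)    # a single shared bottleneck token
class_weights = [[rng.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(NUM_CLASSES)]

video_tokens, audio_tokens, bottleneck = bottleneck_fusion(video_tokens, audio_tokens, bottleneck)
probs = classify(video_tokens, audio_tokens, class_weights)
```

In this sketch the two streams never attend to each other directly. Cross-modal information can only travel through the bottleneck token, which is the property the term "multimodal bottleneck token" suggests. A real system would stack several such steps and use trained weights.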

