The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jun. 17, 2025

Filed:
Sep. 15, 2022

Applicant:
Beijing Baidu Netcom Science Technology Co., Ltd., Beijing, CN;

Inventors:

Shuai Chen, Beijing, CN;

Qi Wang, Beijing, CN;

Hu Yang, Beijing, CN;

Feng He, Beijing, CN;

Zhifan Feng, Beijing, CN;

Chunguang Chai, Beijing, CN;

Yong Zhu, Beijing, CN;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06V 10/82 (2022.01); G06N 3/08 (2023.01); G06V 10/80 (2022.01);
U.S. Cl.
CPC ...
G06V 10/82 (2022.01); G06N 3/08 (2013.01); G06V 10/80 (2022.01);
Abstract

Disclosed are a method for processing multimodal data using a neural network, a device, and a medium, relating to the field of artificial intelligence and, in particular, to multimodal data processing, video classification, and deep learning. The neural network includes: an input subnetwork configured to receive the multimodal data to output respective first features of a plurality of modalities; a plurality of cross-modal feature subnetworks, each of which is configured to receive respective first features of two corresponding modalities to output a cross-modal feature corresponding to the two modalities; a plurality of cross-modal fusion subnetworks, each of which is configured to receive at least one cross-modal feature corresponding to a corresponding target modality and other modalities to output a second feature of the target modality; and an output subnetwork configured to receive respective second features of the plurality of modalities to output a processing result of the multimodal data.
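
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify layer types, so the random linear maps, the concatenation of pairs, the averaging inside each fusion step, the feature width `DIM`, and all class and function names below are assumptions chosen only to make the four stages visible. The sketch also assumes at least two modalities.

```python
import itertools
import random

random.seed(0)
DIM = 4  # hypothetical feature width; not taken from the patent


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained layer (assumption)."""
    return [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Matrix-vector product on plain lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class MultimodalNet:
    """Illustrative wiring of the four subnetwork types named in the abstract."""

    def __init__(self, input_dims, num_classes):
        self.modalities = sorted(input_dims)
        # Input subnetwork: one projection per modality -> first features.
        self.input_w = {m: linear(d, DIM) for m, d in input_dims.items()}
        # Cross-modal feature subnetworks: one per pair of modalities.
        self.pair_w = {
            pair: linear(2 * DIM, DIM)
            for pair in itertools.combinations(self.modalities, 2)
        }
        # Cross-modal fusion subnetworks: one per target modality.
        self.fusion_w = {m: linear(DIM, DIM) for m in self.modalities}
        # Output subnetwork: consumes all second features.
        self.output_w = linear(DIM * len(self.modalities), num_classes)

    def forward(self, data):
        first = {m: apply(self.input_w[m], data[m]) for m in self.modalities}
        # Each pair subnetwork sees the first features of its two modalities.
        cross = {
            (a, b): apply(w, first[a] + first[b])
            for (a, b), w in self.pair_w.items()
        }
        second = {}
        for m in self.modalities:
            # Gather every cross-modal feature that involves the target modality.
            related = [feat for pair, feat in cross.items() if m in pair]
            second[m] = apply(self.fusion_w[m], mean(related))
        concat = [x for m in self.modalities for x in second[m]]
        return apply(self.output_w, concat)


# Usage with three made-up modalities and input sizes.
net = MultimodalNet({"image": 6, "text": 5, "audio": 3}, num_classes=2)
scores = net.forward({"image": [0.1] * 6, "text": [0.2] * 5, "audio": [0.3] * 3})
print(len(scores), "output scores from", len(net.pair_w), "modality pairs")
```

With three modalities there are three pairwise subnetworks, and each fusion subnetwork receives the two cross-modal features that involve its target modality.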

