The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Oct. 03, 2023

Filed:
Feb. 23, 2021

Applicant:
Beijing Baidu Netcom Science Technology Co., Ltd., Beijing, CN

Inventors:
Yulin Li, Beijing, CN
Xiameng Qin, Beijing, CN
Ju Huang, Beijing, CN
Qunyi Xie, Beijing, CN
Junyu Han, Beijing, CN

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 16/00 (2019.01); G06F 16/36 (2019.01); G06F 40/279 (2020.01); G06F 18/25 (2023.01); G06V 10/764 (2022.01); G06V 10/80 (2022.01); G06V 10/82 (2022.01); G06V 10/44 (2022.01); G06V 10/426 (2022.01); G06N 3/02 (2006.01);
U.S. Cl.
CPC ...
G06F 16/367 (2019.01); G06F 18/253 (2023.01); G06F 40/279 (2020.01); G06V 10/426 (2022.01); G06V 10/454 (2022.01); G06V 10/764 (2022.01); G06V 10/811 (2022.01); G06V 10/82 (2022.01); G06N 3/02 (2013.01);
Abstract

A method for visual question answering, a computer device implementing the method and a medium for storing instructions on performing the method are provided. The method includes: acquiring an input image and an input question; constructing a visual graph based on the input image, wherein the visual graph comprises a first node feature and a first edge feature; constructing a question graph based on the input question, wherein the question graph comprises a second node feature and a second edge feature; performing a multimodal fusion on the visual graph and the question graph to obtain an updated visual graph and an updated question graph; determining a question feature based on the input question; determining a fusion feature based on the updated visual graph, the updated question graph and the question feature; and generating a predicted answer for the input image and the input question.
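
The steps listed in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the abstract does not say how the graphs are built or fused, so everything below is an assumption made for illustration. The sketch starts from pre-extracted feature vectors rather than a raw image and question, uses dot-product similarity as a stand-in edge feature, uses simple cross-attention as a stand-in for the multimodal fusion, and scores candidate answers against hypothetical answer embeddings. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_graph(node_features):
    # Assumption: edge feature = dot-product similarity between two nodes.
    edges = [[dot(a, b) for b in node_features] for a in node_features]
    return {"nodes": node_features, "edges": edges}


def cross_attend(target_nodes, source_nodes):
    # Assumed fusion step: each target node is updated by adding an
    # attention-weighted average of the nodes from the other modality.
    updated = []
    for t in target_nodes:
        weights = softmax([dot(t, s) for s in source_nodes])
        context = [
            sum(w * s[k] for w, s in zip(weights, source_nodes))
            for k in range(len(t))
        ]
        updated.append([a + c for a, c in zip(t, context)])
    return updated


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def predict_answer(region_features, word_features, answer_embeddings):
    # Visual graph and question graph (node and edge features).
    # The edge features are stored but not used by this toy fusion.
    visual_graph = build_graph(region_features)
    question_graph = build_graph(word_features)

    # Multimodal fusion: update each graph using the other one.
    updated_visual = cross_attend(visual_graph["nodes"], question_graph["nodes"])
    updated_question = cross_attend(question_graph["nodes"], visual_graph["nodes"])

    # Question feature taken directly from the input question.
    question_feature = mean_pool(word_features)

    # Fusion feature from the two updated graphs and the question feature.
    fusion = [
        v + q + f
        for v, q, f in zip(
            mean_pool(updated_visual), mean_pool(updated_question), question_feature
        )
    ]

    # Predicted answer: the candidate whose embedding scores highest.
    scores = {label: dot(fusion, emb) for label, emb in answer_embeddings.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two image-region feature vectors
    words = [[1.0, 0.0]]                # one question-word feature vector
    answers = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
    print(predict_answer(regions, words, answers))  # prints "yes"
```

In a real system the feature vectors would come from an image detector and a text encoder, and the fusion and answer-scoring steps would be learned rather than fixed.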

