The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
May 13, 2025

Filed:

Mar. 11, 2020

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Peter Zhong, Vermont, AU;

Antonio Jose Jimeno Yepes, Melbourne, AU;

Elaheh ShafieiBavani, Melbourne, AU;

Attorneys:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 16/93 (2019.01); G06F 40/205 (2020.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06N 3/084 (2023.01); G06N 5/025 (2023.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06N 20/20 (2019.01); G06V 10/82 (2022.01); G06V 30/10 (2022.01); G06V 30/19 (2022.01); G06V 30/413 (2022.01);
U.S. Cl.
CPC ...
G06F 16/93 (2019.01); G06F 40/205 (2020.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 3/084 (2013.01); G06N 5/025 (2013.01); G06N 20/20 (2019.01); G06V 10/82 (2022.01); G06V 30/19173 (2022.01); G06V 30/413 (2022.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 30/10 (2022.01);
Abstract

Embodiments of the invention describe a computer-implemented method of analyzing an electronic version of a document. The computer-implemented method can include an architecture of machine learning sub-models that performs the global task of translating unstructured and semi-structured inputs into numerical representations that can be recognized and manipulated by a content-analysis (CA) sub-model without relying on brute force analysis. Embodiments of the invention achieve these results by separating the global task into auxiliary tasks and assigning each sub-model to at least one of the auxiliary tasks. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., lists, tables, figures, text, etc. of a PDF document), extracting features of the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, wherein both the stages and the groupings are based on the complexity of the sub-model's assigned task.
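
The staged, grouped training described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the sub-model names, the integer complexity scores, and the training_stages helper are all hypothetical, chosen only to show how sub-models assigned to auxiliary tasks might be grouped into training stages ordered by task complexity.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SubModel:
    name: str        # hypothetical label for the sub-model
    task: str        # auxiliary task assigned to the sub-model
    complexity: int  # assumed score; lower values are trained in earlier stages


def training_stages(sub_models):
    """Group sub-model names into stages, ordered by task complexity.

    Sub-models with the same complexity score land in the same stage.
    """
    ordered = sorted(sub_models, key=lambda m: (m.complexity, m.name))
    return [
        [m.name for m in group]
        for _, group in groupby(ordered, key=lambda m: m.complexity)
    ]


# Hypothetical sub-models for the three auxiliary tasks named in the abstract:
# parsing into format types, feature extraction, and content analysis.
SUB_MODELS = [
    SubModel("content_analyzer", "content analysis on extracted features", 3),
    SubModel("text_parser", "parse text regions", 1),
    SubModel("feature_extractor", "extract features of the parsed document", 2),
    SubModel("table_parser", "parse table regions", 1),
]

if __name__ == "__main__":
    for stage_number, names in enumerate(training_stages(SUB_MODELS), start=1):
        print(f"stage {stage_number}: {', '.join(names)}")
```

Under these assumed scores, the two parsers share the first stage, the feature extractor is trained next, and the content-analysis sub-model is trained last.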

