The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
May. 13, 2025
Filed:
Mar. 11, 2020
International Business Machines Corporation, Armonk, NY (US);
Peter Zhong, Vermont, AU;
Antonio Jose Jimeno Yepes, Melbourne, AU;
Elaheh ShafieiBavani, Melbourne, AU;
International Business Machines Corporation, Armonk, NY (US);
Abstract
Embodiments of the invention describe a computer-implemented method of analyzing an electronic version of a document. The computer-implemented method can include an architecture of machine learning sub-models that performs the global task of translating unstructured and semi-structured inputs into numerical representations that can be recognized and manipulated by a content-analysis (CA) sub-model without relying on brute force analysis. Embodiments of the invention achieve these results by separating the global task into auxiliary tasks and assigning each sub-model to at least one of the auxiliary tasks. The auxiliary tasks can include parsing the unstructured or semi-structured inputs into format types (e.g., lists, tables, figures, text, etc. of a PDF document), extracting features of the parsed document, and performing a computer-based CA on the extracted features. The sub-models are trained in stages and in groups, wherein both the stages and the groupings are based on the complexity of the sub-model's assigned task.