The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Jan. 31, 2023

Filed:
Aug. 25, 2021

Applicant:
International Business Machines Corporation, Armonk, NY (US);

Inventors:

Takuya Goto, Tokyo, JP;

Tohru Hasegawa, Tokyo, JP;

Xiangning Liu, Tokyo, JP;

Asako Ono, Tokyo, JP;

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06F 7/02 (2006.01); G06F 16/00 (2019.01); G06N 5/02 (2006.01); G06F 16/93 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01);
U.S. Cl.
CPC ...
G06N 5/022 (2013.01); G06F 16/258 (2019.01); G06F 16/285 (2019.01); G06F 16/93 (2019.01);
Abstract

An approach is provided in which a method, system, and program create a plurality of page clusters in feature space from a plurality of feature vectors corresponding to a plurality of unstructured pages. The method, system, and program product assign one of a plurality of machine learning models to each one of the plurality of page clusters based on a relationship in the feature space between the plurality of page clusters and a plurality of training clusters corresponding to the plurality of machine learning models. The method, system, and program product identify one of the plurality of page clusters that corresponds to a selected one of the plurality of unstructured pages, and transform the selected unstructured page into a structured page using a selected one of the plurality of machine learning models assigned to the identified page cluster.
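The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, a distance measure, or any concrete models, so everything specific here is an assumption made for illustration: a tiny k-means for building page clusters, Euclidean distance between centroids as the "relationship in the feature space", and two hypothetical models (`invoice_model`, `report_model`) represented as plain functions.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def nearest(point: Vector, centers: List[Vector]) -> int:
    """Index of the center closest to `point` (Euclidean distance, an assumption)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[List[float]]:
    """Tiny k-means; the first k points seed the centers so the run is deterministic."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def assign_models(page_centers: List[Vector],
                  training_centers: Dict[str, Vector]) -> List[str]:
    """Give each page cluster the model whose training cluster centroid is closest."""
    names = list(training_centers)
    refs = [training_centers[name] for name in names]
    return [names[nearest(center, refs)] for center in page_centers]


def transform_page(page_vec: Vector, page_text: str,
                   page_centers: List[Vector], cluster_models: List[str],
                   models: Dict[str, Callable[[str], dict]]) -> dict:
    """Find the page's cluster, then apply the model assigned to that cluster."""
    cluster = nearest(page_vec, page_centers)
    return models[cluster_models[cluster]](page_text)


# Hypothetical feature vectors for four unstructured pages.
page_vectors = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
page_centers = kmeans(page_vectors, k=2)

# Hypothetical training-cluster centroids, one per model.
training_centers = {"invoice_model": (0.0, 0.0), "report_model": (5.0, 5.0)}
cluster_models = assign_models(page_centers, training_centers)

# Stand-in "models" that wrap the page text in a structured record.
models = {
    "invoice_model": lambda text: {"type": "invoice", "body": text},
    "report_model": lambda text: {"type": "report", "body": text},
}

result = transform_page((4.8, 5.3), "Quarterly summary",
                        page_centers, cluster_models, models)
print(result)
```

In this sketch the model choice is made once per cluster rather than once per page, which mirrors the abstract's ordering: clusters are matched to training clusters first, and an individual page only needs to be mapped to its cluster.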