The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent: Jul. 09, 2024
Filed: Jun. 15, 2022
Applicant: Wipro Limited, Bangalore, IN
Inventors:
Assignee: Wipro Limited, Bangalore, IN
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/958 (2019.01); G06F 16/84 (2019.01); G06F 16/93 (2019.01); G06F 40/143 (2020.01); G06F 40/221 (2020.01); G06N 20/00 (2019.01); G06V 30/18 (2022.01); G06V 30/41 (2022.01); G06V 30/416 (2022.01);
U.S. Cl.
CPC ...
G06F 16/986 (2019.01); G06F 16/84 (2019.01); G06F 16/93 (2019.01); G06F 40/143 (2020.01); G06F 40/221 (2020.01); G06N 20/00 (2019.01); G06V 30/18 (2022.01); G06V 30/41 (2022.01); G06V 30/416 (2022.01);
Abstract

Disclosed herein are a method and a system for extracting information from an input document comprising multi-format information. In an embodiment, a Hypertext Markup Language (HTML) document corresponding to the input document is created by analyzing the input document, which comprises documents of multiple data formats. Further, the HTML document is realigned based on the number of columns in each page of the HTML document. Furthermore, a document identifier (ID) associated with each of the documents is determined in the realigned HTML document by classifying the information in each of the document pages using a pretrained Machine Learning (ML) model. Subsequently, a hierarchy configuration file corresponding to the realigned HTML document is generated based on the document IDs. Finally, information associated with each document ID is extracted from the hierarchy configuration file by orchestrating one or more data extractors that extract data attributes from the hierarchy configuration file.
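
The last three steps of the abstract (classifying pages into document IDs, building a hierarchy keyed by document ID, and orchestrating per-document extractors) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the document IDs, the regular expressions, and in particular the keyword lookup that stands in for the pretrained ML model. The HTML conversion and column realignment steps are skipped, so each page is assumed to be plain text already.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-in for the pretrained ML model: a simple keyword lookup.
KEYWORDS: Dict[str, str] = {
    "invoice": "INVOICE",
    "purchase order": "PURCHASE_ORDER",
}


def classify_page(page_text: str) -> str:
    """Assign a document ID to one page (keyword match, not a real model)."""
    lowered = page_text.lower()
    for keyword, doc_id in KEYWORDS.items():
        if keyword in lowered:
            return doc_id
    return "UNKNOWN"


def build_hierarchy(pages: List[str]) -> Dict[str, List[str]]:
    """Group page texts under the document ID assigned to each page."""
    hierarchy: Dict[str, List[str]] = {}
    for page in pages:
        hierarchy.setdefault(classify_page(page), []).append(page)
    return hierarchy


# Hypothetical extractors: each maps the text of one document to attributes.
def extract_invoice(text: str) -> Dict[str, str]:
    match = re.search(r"Total:\s*([\d.]+)", text)
    return {"total": match.group(1)} if match else {}


def extract_purchase_order(text: str) -> Dict[str, str]:
    match = re.search(r"PO Number:\s*(\w+)", text)
    return {"po_number": match.group(1)} if match else {}


EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "INVOICE": extract_invoice,
    "PURCHASE_ORDER": extract_purchase_order,
}


def extract_all(pages: List[str]) -> Dict[str, Dict[str, str]]:
    """Run the extractor registered for each document ID found in the pages."""
    results: Dict[str, Dict[str, str]] = {}
    for doc_id, doc_pages in build_hierarchy(pages).items():
        extractor = EXTRACTORS.get(doc_id)
        if extractor is not None:  # pages with no registered extractor are skipped
            results[doc_id] = extractor("\n".join(doc_pages))
    return results


if __name__ == "__main__":
    sample_pages = [
        "Invoice No. 17\nTotal: 120.50",
        "Purchase Order\nPO Number: A42",
        "Cover letter",
    ]
    print(extract_all(sample_pages))
```

Keeping the extractors in a registry keyed by document ID means a new document type only needs a new classifier label and a new registry entry; whether the patented system is organized this way is not stated in the abstract.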

