The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
May. 19, 2020
Filed:
Nov. 23, 2016
Google Inc., Mountain View, CA (US);
Ying Sheng, Sunnyvale, CA (US);
Yifeng Lu, Mountain View, CA (US);
Jing Xie, San Jose, CA (US);
Jie Yang, Sunnyvale, CA (US);
Luis Garcia Pueyo, Mountain View, CA (US);
Jinan Lou, Cupertino, CA (US);
James Wendt, Los Angeles, CA (US);
GOOGLE LLC, Mountain View, CA (US);
Abstract
Techniques are described herein for automatically generating data extraction templates for structured documents (e.g., B2C emails, invoices, bills, invitations, etc.), and for assigning classifications to those data extraction templates to streamline data extraction from subsequent structured documents. In various implementations, a data extraction template generated from a cluster of structured documents that share fixed content may be identified. Features of the cluster of structured documents may be applied as input to extraction machine learning model(s) trained to provide location(s) of transient field(s) in structured documents, to determine location(s) of transient field(s) in the cluster of structured documents. An association between the data extraction template and the determined transient field location(s) may be stored. Based on the association, data point(s) may be extracted from a given structured document of a user that shares fixed content with the cluster of structured documents. The extracted data point(s) may be surfaced to the user.