The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document in PDF (Adobe Acrobat) format.
Patent No.:
Date of Patent: Feb. 07, 2017
Filed: Aug. 27, 2014
Applicant: Google Inc., Mountain View, CA (US);
Inventors: Luis Garcia Pueyo, San Francisco, CA (US);
Vanja Josifovski, Los Gatos, CA (US);
Amitabh Saikia, Mountain View, CA (US);
Jie Yang, Santa Clara, CA (US);
Mike Bendersky, Sunnyvale, CA (US);
Srinidhi Viswanatha, Bangalore, IN;
Marc-Allen Cartright, Palo Alto, CA (US);
Assignee: Google Inc., Mountain View, CA (US);
Abstract
Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient when a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.
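The abstract describes a pipeline: cluster the messages, collect structural paths, mark a path as transient based on how often its text recurs across the cluster, and keep the transient paths as an extraction template. The Python sketch below illustrates that idea only, under assumptions that are not taken from the patent: messages are well-formed HTML without void elements such as `<br>`; the clustering key (sender plus subject with digits masked) is hypothetical; a path is treated as transient when its most frequent text appears in at most half of the messages containing it (the `max_share` threshold is an assumed criterion); semantic typing and confidentiality designation are left out. All function names and the sample data are illustrative.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class PathExtractor(HTMLParser):
    """Collects (structural_path, text) pairs such as ('html[1]/body[1]/p[2]', 'UA100').

    Assumes well-formed HTML without void elements like <br> (hypothetical simplification).
    """

    def __init__(self):
        super().__init__()
        self.stack = []                  # open tags as 'tag[sibling_index]'
        self.child_counts = [Counter()]  # per-depth count of child tags seen so far
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] += 1
        self.stack.append(f"{tag}[{counts[tag]}]")
        self.child_counts.append(Counter())

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.pairs.append(("/".join(self.stack), text))


def extract_paths(html):
    """Map each structural path to the text found directly under it."""
    parser = PathExtractor()
    parser.feed(html)
    parser.close()
    merged = defaultdict(list)
    for path, text in parser.pairs:
        merged[path].append(text)
    return {path: " ".join(texts) for path, texts in merged.items()}


def cluster(emails):
    """Group messages by a hypothetical similarity key: sender + digit-masked subject."""
    groups = defaultdict(list)
    for email in emails:
        key = (email["from"], re.sub(r"\d+", "#", email["subject"]))
        groups[key].append(email)
    return dict(groups)


def build_template(members, max_share=0.5):
    """Return the set of transient paths for one cluster.

    A path is transient when its most frequent text occurs in at most
    `max_share` of the messages containing that path (assumed criterion);
    otherwise it is treated as fixed boilerplate.
    """
    texts_by_path = defaultdict(list)
    for email in members:
        for path, text in extract_paths(email["html"]).items():
            texts_by_path[path].append(text)
    transient = set()
    for path, texts in texts_by_path.items():
        top_count = Counter(texts).most_common(1)[0][1]
        if top_count / len(texts) <= max_share:
            transient.add(path)
    return transient


def apply_template(template, email):
    """Extract the text at each transient path of a later message in the same cluster."""
    paths = extract_paths(email["html"])
    return {path: paths[path] for path in sorted(template) if path in paths}


def make_email(flight, date):
    """Hypothetical sample message from a single sender."""
    html = ("<html><body><p>Your flight is confirmed.</p>"
            f"<p>{flight}</p><p>{date}</p></body></html>")
    return {"from": "noreply@airline.example", "subject": f"Booking {flight}", "html": html}


if __name__ == "__main__":
    corpus = [make_email("UA100", "2014-08-27"),
              make_email("UA200", "2014-09-01"),
              make_email("UA300", "2014-10-15")]
    for key, members in cluster(corpus).items():
        template = build_template(members)
        print(key, apply_template(template, make_email("UA400", "2015-01-02")))
```

Run as a script, the three sample messages form one cluster. The path `html[1]/body[1]/p[1]` is excluded because "Your flight is confirmed." recurs in every message, so applying the template to the new message yields `{'html[1]/body[1]/p[2]': 'UA400', 'html[1]/body[1]/p[3]': '2015-01-02'}`.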