The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Aug. 12, 2014
Filed:
May 15, 2009
Applicants:
Justin Boyan, Providence, RI (US);
Glenn McDonald, Cambridge, MA (US);
Margaret Benthall, Cambridge, MA (US);
Ray Molnar, Duxbury, MA (US);
Inventors:
Justin Boyan, Providence, RI (US);
Glenn McDonald, Cambridge, MA (US);
Margaret Benthall, Cambridge, MA (US);
Ray Molnar, Duxbury, MA (US);
Assignee:
Google Inc., Mountain View, CA (US);
Abstract
Methods and systems to model and acquire data from a variety of data and information sources, to integrate the data into a structured database, and to manage the continuing reintegration of updated data from those sources over time. For any given domain, a variety of individual information and data sources that contain information relevant to the schema can be identified. Data elements associated with a schema may be identified in a training source, such as by user tagging. A formal grammar may be induced appropriate to the schema and layout of the training source. A Hidden Markov Model (HMM) corresponding to the grammar may learn where in the sources the elements can be found. The system can automatically mutate its schema into a grammar matching the structure of the source documents. By following an inverse transformation sequence, data that is parsed by the mutated grammar can be fit back into the original grammar structure, matching the original data schema defined through domain modeling. Features disclosed herein may be implemented with respect to web-scraping and data acquisition, and to represent data in support of data-editing and data-merging tasks. A schema may be defined with respect to a graph-based domain model.
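The abstract's idea of mutating a schema to match a source layout, then following the inverse transformation sequence to fit parsed data back into the original schema, can be illustrated with a toy sketch. This is not the patented method: the schema, the `rename` and `reorder` mutations, the pipe-delimited row format, and all function names below are hypothetical and chosen only for illustration. Grammar induction and the HMM are omitted entirely.

```python
from typing import Callable, Dict, List, Tuple

Fields = List[str]
Record = Dict[str, str]
# Hypothetical representation: each mutation pairs a forward step on the
# field list with an inverse step on a parsed record.
Mutation = Tuple[Callable[[Fields], Fields], Callable[[Record], Record]]

# Hypothetical domain schema (not taken from the patent).
SCHEMA: Fields = ["title", "artist", "year"]


def rename(old: str, new: str) -> Mutation:
    """Rename one field; the inverse maps the new name back to the old one."""
    def forward(fields: Fields) -> Fields:
        return [new if f == old else f for f in fields]

    def inverse(record: Record) -> Record:
        return {(old if k == new else k): v for k, v in record.items()}

    return forward, inverse


def reorder(new_order: List[int]) -> Mutation:
    """Permute fields; the inverse restores the pre-permutation key order."""
    def forward(fields: Fields) -> Fields:
        return [fields[i] for i in new_order]

    def inverse(record: Record) -> Record:
        keys = list(record)
        original: List[str] = [""] * len(keys)
        for pos, i in enumerate(new_order):
            original[i] = keys[pos]
        return {k: record[k] for k in original}

    return forward, inverse


def mutate(schema: Fields, mutations: List[Mutation]) -> Fields:
    """Apply each forward step in order to get the source-shaped field list."""
    fields = list(schema)
    for forward, _ in mutations:
        fields = forward(fields)
    return fields


def parse_row(row: str, fields: Fields) -> Record:
    """Parse a pipe-delimited row using the mutated field order."""
    values = [part.strip() for part in row.split("|")]
    return dict(zip(fields, values))


def restore(record: Record, mutations: List[Mutation]) -> Record:
    """Apply the inverse steps in reverse order to recover the schema shape."""
    for _, inverse in reversed(mutations):
        record = inverse(record)
    return record


mutations = [rename("artist", "performer"), reorder([2, 1, 0])]
source_fields = mutate(SCHEMA, mutations)
parsed = parse_row("1959 | Miles Davis | Kind of Blue", source_fields)
restored = restore(parsed, mutations)
print(restored)
```

Here the row is parsed in the source's own layout (year, performer, title), and the restored record has the keys and order of the original schema. Undoing the mutations in reverse order is what makes the round trip work: the reorder is undone first, then the rename.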