The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Aug. 25, 2020
Filed:
Jan. 26, 2017
Applicant:
Evernote Corporation, Redwood City, CA (US);
Inventors:
Eugene Livshitz, San Mateo, CA (US);
Alexander Pashintsev, Cupertino, CA (US);
Boris Gorbatov, Sunnyvale, CA (US);
Assignee:
EVERNOTE CORPORATION, Redwood City, CA (US);
Abstract
Selecting data from a source text corpus for training a semantic data analysis system includes selecting an item of the text corpus, validating the item, extracting at least one section of the item, determining a length of each of the at least one section of the item, and subdividing each of the sections having a length greater than a predetermined amount into a plurality of fragments that are deemed to be similar. The predetermined amount may be approximately twice a size of a fragment. A fragment may have approximately 100 words or between 40 and 60 words. Fragments from different items may be deemed to be dissimilar. Sections having a length less than the predetermined amount may be ignored. Validating the item may include parsing editorial notes and other accompanying data. The source text corpus may be Wikipedia. The item may be an article.
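The selection procedure in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names `split_section` and `build_pairs`, the 50-word default fragment size, the dropping of a trailing partial fragment, and the treatment of a section exactly at the threshold are all assumptions made for illustration. Item validation (parsing editorial notes and other accompanying data) is assumed to have happened before these functions are called.

```python
from itertools import combinations

FRAGMENT_WORDS = 50  # assumed; the abstract mentions ~100 or 40-60 words


def split_section(text, fragment_words=FRAGMENT_WORDS):
    """Split one section into equal-size word fragments.

    Returns [] when the section is not longer than the threshold
    (assumed here to be exactly twice the fragment size).
    """
    words = text.split()
    threshold = 2 * fragment_words
    if len(words) <= threshold:
        # Short sections are ignored. Treating length == threshold as
        # "short" is an assumption; the abstract leaves that case open.
        return []
    count = len(words) // fragment_words
    # Assumption: a trailing partial fragment is dropped.
    return [
        " ".join(words[i * fragment_words:(i + 1) * fragment_words])
        for i in range(count)
    ]


def build_pairs(items, fragment_words=FRAGMENT_WORDS):
    """Build training pairs from already-validated items.

    items maps an item title to a list of section texts.
    Returns (similar, dissimilar) lists of fragment pairs.
    """
    similar = []
    fragments_by_item = {}
    for title, sections in items.items():
        fragments_by_item[title] = []
        for section in sections:
            fragments = split_section(section, fragment_words)
            # Fragments of the same section are deemed similar.
            similar.extend(combinations(fragments, 2))
            fragments_by_item[title].extend(fragments)
    # Fragments from different items are deemed dissimilar.
    dissimilar = [
        (a, b)
        for (_, left), (_, right) in combinations(fragments_by_item.items(), 2)
        for a in left
        for b in right
    ]
    return similar, dissimilar
```

Calling `build_pairs({"Article A": [...], "Article B": [...]})` with lists of section strings returns the similar and dissimilar pairs that a semantic analysis system could then be trained on. Enumerating every cross-item pair grows quadratically with corpus size, so a real pipeline would presumably sample the dissimilar pairs instead.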