The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jul. 18, 2017
Filed:
Apr. 10, 2015
Applicant:
International Business Machines Corporation, Armonk, NY (US);
Inventors:
Barton W. Emanuel, Manassas, VA (US);
Ahmed M. A. Nassar, Katy, TX (US);
Sarbajit K. Rakshit, Kolkata, IN;
Craig M. Trim, Sylmar, CA (US);
Albert T. Wong, Hacienda Heights, CA (US);
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US);
Abstract
To recombine incorrectly separated tokens in NLP, a determination is made whether a token from an ordered set of tokens is present in a dictionary related to a corpus from which the ordered set is extracted. When the token is not present in the dictionary, and when a compounding threshold has not been reached, the token is agglutinated with a next adjacent token in the ordered set to form the compound token. The compounding threshold limits a number of tokens that can be agglutinated to form a compound token. A determination is made whether the compound token is present in the dictionary. A weight is assigned to the compound token when the compound token is present in the dictionary and a confidence rating of the compound token is computed as a function of the weight. The compound token and the confidence rating are used in NLP of the corpus.
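The flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The function name `recombine_tokens`, the plain set used as the dictionary, joining tokens without a separator, the weight formula (1 divided by the number of merged tokens), and using the weight directly as the confidence rating are all assumptions made for illustration. The abstract does not specify how the weight or the confidence is computed.

```python
from typing import List, Optional, Set, Tuple


def recombine_tokens(
    tokens: List[str],
    dictionary: Set[str],
    compounding_threshold: int = 3,
) -> List[Tuple[str, Optional[float]]]:
    """Return (token, confidence) pairs.

    Confidence is None for tokens that were left unchanged.
    """
    result: List[Tuple[str, Optional[float]]] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in dictionary:
            # Token is known to the corpus dictionary: keep it as-is.
            result.append((token, None))
            i += 1
            continue

        compound = token
        used = 1  # number of tokens agglutinated so far
        match: Optional[Tuple[str, float]] = None
        # Agglutinate with the next adjacent token until the compounding
        # threshold is reached or the ordered set runs out.
        while used < compounding_threshold and i + used < len(tokens):
            compound += tokens[i + used]
            used += 1
            if compound in dictionary:
                # Assumption: fewer merged pieces means a higher weight.
                weight = 1.0 / used
                # Assumption: confidence is the identity function of weight.
                confidence = weight
                match = (compound, confidence)
                break

        if match is not None:
            result.append(match)
            i += used  # skip every token consumed by the compound
        else:
            # No dictionary match within the threshold: keep the token.
            result.append((token, None))
            i += 1
    return result
```

For example, with the tokens `["the", "data", "base", "is", "on", "line"]` and a dictionary containing `the`, `is`, `database`, and `online`, the sketch merges `data` + `base` and `on` + `line` into compound tokens, each with a confidence of 0.5 under the assumed weighting.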