The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 18, 2017

Filed:

Apr. 10, 2015

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Barton W. Emanuel, Manassas, VA (US);

Ahmed M. A. Nassar, Katy, TX (US);

Sarbajit K. Rakshit, Kolkata, IN;

Craig M. Trim, Sylmar, CA (US);

Albert T. Wong, Hacienda Heights, CA (US);

Attorneys:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/20 (2006.01); G06F 17/28 (2006.01); G06F 17/27 (2006.01);
U.S. Cl.
CPC ...
G06F 17/2705 (2013.01); G06F 17/2735 (2013.01);
Abstract

To recombine incorrectly separated tokens in NLP, a determination is made whether a token from an ordered set of tokens is present in a dictionary related to a corpus from which the ordered set is extracted. When the token is not present in the dictionary, and when a compounding threshold has not been reached, the token is agglutinated with a next adjacent token in the ordered set to form the compound token. The compounding threshold limits a number of tokens that can be agglutinated to form a compound token. A determination is made whether the compound token is present in the dictionary. A weight is assigned to the compound token when the compound token is present in the dictionary and a confidence rating of the compound token is computed as a function of the weight. The compound token and the confidence rating are used in NLP of the corpus.
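
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how tokens are joined, how the weight is chosen, or which function maps weight to confidence, so the sketch makes these assumptions: adjacent tokens are concatenated without a separator, the weight is the number of tokens merged, and the confidence is the reciprocal of the weight. The names `recombine_tokens` and `compounding_threshold`, and the confidence values of 1.0 for dictionary tokens and 0.0 for unresolved tokens, are hypothetical.

```python
from typing import List, Set, Tuple


def recombine_tokens(
    tokens: List[str],
    dictionary: Set[str],
    compounding_threshold: int = 3,
) -> List[Tuple[str, float]]:
    """Return (token, confidence) pairs, merging adjacent tokens when possible.

    compounding_threshold is the maximum number of tokens that may be
    agglutinated into one compound token.
    """
    result: List[Tuple[str, float]] = []
    i = 0
    while i < len(tokens):
        token = tokens[i]

        # Token already known to the dictionary: keep it as-is.
        # (Assumption: full confidence of 1.0 for dictionary tokens.)
        if token in dictionary:
            result.append((token, 1.0))
            i += 1
            continue

        # Agglutinate with the next adjacent tokens until the compound is
        # found in the dictionary or the compounding threshold is reached.
        compound = token
        used = 1  # number of tokens merged so far
        found = False
        while used < compounding_threshold and i + used < len(tokens):
            compound += tokens[i + used]  # assumption: join with no separator
            used += 1
            if compound in dictionary:
                found = True
                break

        if found:
            weight = used              # assumption: weight = tokens merged
            confidence = 1.0 / weight  # assumption: confidence = 1 / weight
            result.append((compound, confidence))
            i += used
        else:
            # Assumption: an unresolved token is emitted unchanged with 0.0.
            result.append((token, 0.0))
            i += 1
    return result
```

For example, with the dictionary `{"the", "database", "is", "online"}` and the tokens `["the", "data", "base", "is", "on", "line"]`, the sketch merges "data" + "base" and "on" + "line" into compound tokens, each with a confidence of 0.5 under the assumed weighting. Lowering `compounding_threshold` limits how many adjacent tokens a single compound may absorb.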

