The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Mar. 08, 2016
Filed:
Oct. 30, 2015
Applicant:
SAS Institute Inc., Cary, NC (US);
Inventors:
Ning Jin, Morrisville, NC (US);
James Allen Cox, Cary, NC (US);
Assignee:
SAS Institute Inc., Cary, NC (US);
Abstract
Electronic communications can be normalized using feature sets. For example, an electronic representation of a noncanonical communication can be received, and multiple candidate canonical versions of the noncanonical communication can be determined. A first feature set representative of the noncanonical communication can be determined by splitting the noncanonical communication into at least one n-gram and at least one k-skip-n-gram. Multiple comparison feature sets can be determined by splitting multiple terms in training data into respective comparison feature sets. Multiple Jaccard index values can be determined using the first feature set and the multiple comparison feature sets. A subset of the multiple terms in the training data in which an associated Jaccard index value exceeds a threshold can be selected. The subset of the multiple terms can be included in the multiple candidate canonical versions. A normalized version of the noncanonical communication can be selected from the multiple candidate canonical versions.
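The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the character-level features, the choice of n = 2 and k = 1, the threshold value, and the sample vocabulary are all assumptions made for illustration. The abstract also does not say how the final normalized version is chosen from the candidates, so the sketch simply takes the candidate with the highest Jaccard index.

```python
from itertools import combinations


def char_ngrams(text, n):
    """Return the set of contiguous character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def char_skip_ngrams(text, n, k):
    """Return character k-skip-n-grams: n characters kept in order,
    with at most k characters skipped in total."""
    grams = set()
    for start in range(len(text)):
        # Characters that may follow the first one without exceeding k skips.
        window = text[start + 1:start + n + k]
        for rest in combinations(range(len(window)), n - 1):
            grams.add(text[start] + "".join(window[i] for i in rest))
    return grams


def feature_set(text, n=2, k=1):
    """Combine n-grams and k-skip-n-grams; tags keep the two kinds distinct."""
    ngram_features = {("n", g) for g in char_ngrams(text, n)}
    skip_features = {("s", g) for g in char_skip_ngrams(text, n, k)}
    return ngram_features | skip_features


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|, defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidates(noncanonical, vocabulary, threshold):
    """Return (term, score) pairs whose Jaccard index exceeds the threshold,
    best score first."""
    target = feature_set(noncanonical)
    scored = [(term, jaccard(target, feature_set(term))) for term in vocabulary]
    kept = [(term, score) for term, score in scored if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def normalize(noncanonical, vocabulary, threshold=0.1):
    """Assumed selection rule: pick the highest-scoring candidate, or None."""
    ranked = candidates(noncanonical, vocabulary, threshold)
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    vocabulary = ["tomorrow", "banana", "window"]  # hypothetical training terms
    print(candidates("tmrw", vocabulary, threshold=0.1))
    print(normalize("tmrw", vocabulary))
```

The skip-grams are what let an abbreviation such as "tmrw" overlap with "tomorrow" at all: the two words share no contiguous character bigrams, but they do share bigrams formed by skipping one character. In practice the threshold would need tuning against real training data.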