The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge links to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jun. 13, 2023
Filed:
Mar. 27, 2020
Applicant:
Daash Intelligence, Inc., Miami, FL (US)
Inventors:
Robert J. Munro, San Francisco, CA (US);
Rob Voigt, Palo Alto, CA (US);
Schuyler D. Erle, San Francisco, CA (US);
Brendan D. Callahan, Philadelphia, PA (US);
Gary C. King, Los Altos, CA (US);
Jessica D. Long, San Francisco, CA (US);
Jason Brenier, Oakland, CA (US);
Tripti Saxena, Cupertino, CA (US);
Stefan Krawczyk, Menlo Park, CA (US)
Assignee:
Daash Intelligence, Inc., Miami, FL (US)
Abstract
Systems, methods, and apparatuses are presented for a novel natural language tokenizer and tagger. In some embodiments, a method for tokenizing text for natural language processing comprises: generating, from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents; receiving a set of rules comprising rules that identify character/letter sequences as valid tokens; transforming one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood; receiving a document to be processed; dividing the document to be processed into tokens based on the set of statistical models and the set of rules, wherein the statistical models are applied where the rules fail to unambiguously tokenize the document; and outputting the divided tokens for natural language processing.
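The abstract describes the hybrid rules-plus-statistics tokenizer only in prose. The Python sketch below shows one possible way the claimed steps could fit together. It is an illustration, not the patented implementation. Several assumptions are not taken from the patent. The "character/letter sequences" are treated as whitespace-delimited chunks of the document pool, and their likelihood is plain relative frequency. Rules are regular expressions, and a promoted entry becomes a literal-match rule. "Fail to unambiguously tokenize" is read as the rules proposing more than one candidate token, or none, at a position. The names `build_model`, `promote`, `tokenize`, and `BASE_RULES`, and the 0.2 promotion threshold, are all hypothetical.

```python
import re
from collections import Counter


def build_model(docs):
    """Statistical model (assumed form): relative frequency of each
    whitespace-delimited character sequence in the document pool."""
    counts = Counter(chunk for doc in docs for chunk in doc.split())
    total = sum(counts.values()) or 1
    return {seq: n / total for seq, n in counts.items()}


def promote(model, rules, threshold):
    """Turn high-likelihood multi-character entries into literal rules.
    The negative lookahead stops a promoted 'The' from matching inside 'There'."""
    new = [re.escape(seq) + r"(?!\w)"
           for seq, p in model.items() if p >= threshold and len(seq) > 1]
    return rules + [r for r in new if r not in rules]


def tokenize(text, rules, model):
    """Apply rules first; consult the model only when rules are ambiguous or silent."""
    compiled = [re.compile(r) for r in rules]
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Every distinct non-empty match that any rule proposes at position i.
        cands = set()
        for rx in compiled:
            m = rx.match(text, i)
            if m and m.group():
                cands.add(m.group())
        if not cands:
            # No rule fires: use substrings the model has seen, else a single char.
            cands = {text[i:j] for j in range(i + 1, len(text) + 1)
                     if text[i:j] in model} or {text[i]}
        if len(cands) == 1:
            tok = next(iter(cands))  # rules agree: unambiguous
        else:
            # Ambiguous: prefer the likelier sequence, then the longer one,
            # with a lexical tie-break so the result is deterministic.
            tok = max(cands, key=lambda s: (model.get(s, 0.0), len(s), s))
        tokens.append(tok)
        i += len(tok)
    return tokens


if __name__ == "__main__":
    BASE_RULES = [r"[A-Za-z]+", r"\d+", r"[^\w\s]"]  # words, numbers, punctuation
    docs = ["The U.S. economy grew.", "The U.S. market fell."]
    model = build_model(docs)
    rules = promote(model, BASE_RULES, threshold=0.2)
    print(tokenize("The U.S. bank rose.", BASE_RULES, model))
    # ['The', 'U', '.', 'S', '.', 'bank', 'rose', '.']
    print(tokenize("The U.S. bank rose.", rules, model))
    # ['The', 'U.S.', 'bank', 'rose', '.']
```

With the base rules alone, "U.S." is split into four tokens. After promotion, the frequent sequence "U.S." becomes a rule, so at that position the rules propose both "U" and "U.S.". The model then resolves the ambiguity in favor of the sequence it has actually seen in the pool.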