The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
May 24, 2016
Filed:
Feb. 02, 2015
Applicant:
LinkedIn Corporation, Mountain View, CA (US);
Inventors:
Bing Zhao, Sunnyvale, CA (US);
Ethan Zhang, San Jose, CA (US);
Assignee:
LinkedIn Corporation, Mountain View, CA (US);
Abstract
Techniques for training a tokenizer (or word segmenter) are provided. In one technique, a tokenizer tokenizes a token string to identify individual tokens or words. A language model is generated based on the identified tokens or words. A vocabulary about an entity, such as a person or company, is identified. The vocabulary may be online data that refers to the entity, such as a news article or a profile page of a member of a social network. Some of the tokens in the vocabulary may be weighted higher than others. The language model accepts the weighted vocabulary as input and generates pseudo sentences. Alternatively, regular expressions are used to generate the pseudo sentences. The pseudo sentences are used to train the tokenizer.
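The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the bigram model, the scoring rule (bigram count multiplied by a vocabulary weight), and the function names build_bigram_model, generate_pseudo_sentence, and to_training_example are all assumptions chosen for illustration. The sketch builds a small language model from already-tokenized text, samples pseudo sentences that favor weighted entity vocabulary, and converts each pseudo sentence into an unsegmented string with gold boundary offsets, which is one plausible form of training data for a word segmenter.

```python
import random
from collections import defaultdict


def build_bigram_model(token_lists):
    """Count bigram transitions, using <s> and </s> as sentence boundaries."""
    model = defaultdict(lambda: defaultdict(int))
    for tokens in token_lists:
        prev = "<s>"
        for tok in tokens + ["</s>"]:
            model[prev][tok] += 1
            prev = tok
    return model


def generate_pseudo_sentence(model, vocab_weights, rng, max_len=10):
    """Sample a token sequence, boosting tokens in the weighted vocabulary."""
    sentence, prev = [], "<s>"
    while len(sentence) < max_len:
        candidates = model.get(prev)
        if not candidates:
            break
        tokens = list(candidates)
        # Assumed scoring rule: bigram count times vocabulary weight (default 1.0).
        scores = [candidates[t] * vocab_weights.get(t, 1.0) for t in tokens]
        nxt = rng.choices(tokens, weights=scores, k=1)[0]
        if nxt == "</s>":
            break
        sentence.append(nxt)
        prev = nxt
    return sentence


def to_training_example(tokens):
    """Join tokens without spaces and record the end offset of each token."""
    boundaries, pos = [], 0
    for tok in tokens:
        pos += len(tok)
        boundaries.append(pos)
    return "".join(tokens), boundaries


# Toy tokenized corpus and a hypothetical weighted vocabulary about an entity.
corpus = [["linkedin", "hires", "engineers"], ["engineers", "join", "linkedin"]]
model = build_bigram_model(corpus)
weights = {"linkedin": 3.0}
rng = random.Random(0)
pseudo = [generate_pseudo_sentence(model, weights, rng) for _ in range(5)]
examples = [to_training_example(s) for s in pseudo if s]
```

Each entry in examples pairs an unsegmented string with the offsets where tokens end, so a segmenter could be trained to predict those offsets. The abstract also mentions regular expressions as an alternative way to generate pseudo sentences; that variant is not shown here.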