The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Feb. 19, 2013

Filed:

Apr. 19, 2007

Applicants:

Xin Liu, San Jose, CA (US);

Stewart Yang, Sunnyvale, CA (US);

Inventors:

Xin Liu, San Jose, CA (US);

Stewart Yang, Sunnyvale, CA (US);

Assignee:

Google Inc., Mountain View, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/28 (2006.01);
U.S. Cl.
CPC ...
Abstract

Methods, systems and apparatus, including computer program products, for identifying properties of an electronic document. In one aspect, a sequence of bytes representing text in a document is received. A plurality of byte-n-grams are identified from the bytes. For multiple encodings, a respective likelihood of each byte-n-gram occurring in each of the respective multiple encodings is identified. A respective encoding score for each of the multiple encodings is determined. A most likely encoding of the document is identified based on a highest encoding score among the encoding scores. In another aspect, a sequence of characters, having an encoding, are identified in a document. The sequence is segmented into features, each corresponding to two or more characters. A respective score for each of multiple languages is determined based on the features and a respective language model. A language of the document is identified based on the scores.
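
The scoring idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the helper names (`ngrams`, `train`, `log_score`, `most_likely`), the use of bigrams, the add-one smoothing, the summed log-likelihood as the "score", and the tiny training samples. It only shows the general shape of picking the candidate with the highest n-gram score, first over byte n-grams for encodings and then over character n-grams for languages.

```python
import math
from collections import Counter

def ngrams(seq, n=2):
    """Return the overlapping n-grams of a bytes or str sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def train(samples, n=2):
    """Build a toy model: n-gram counts over a few training samples."""
    counts = Counter()
    for sample in samples:
        counts.update(ngrams(sample, n))
    return counts

def log_score(seq, model, n=2):
    """Sum of add-one-smoothed log-likelihoods of the n-grams in seq."""
    total = sum(model.values())
    vocab = len(model) + 1  # assumption: one extra slot for unseen n-grams
    return sum(math.log((model[g] + 1) / (total + vocab)) for g in ngrams(seq, n))

def most_likely(seq, models, n=2):
    """Return the label with the highest score, plus all scores."""
    scores = {label: log_score(seq, model, n) for label, model in models.items()}
    return max(scores, key=scores.get), scores

# Encoding detection: one byte-bigram model per candidate encoding.
# The training sentence is a made-up sample, not data from the patent.
RU_TEXT = "привет мир, это пример текста"
encoding_models = {enc: train([RU_TEXT.encode(enc)]) for enc in ("utf-8", "cp1251")}
document = "это текст".encode("cp1251")
best_encoding, encoding_scores = most_likely(document, encoding_models)

# Language identification: one character-bigram model per candidate language.
language_models = {
    "en": train(["the quick brown fox jumps over the lazy dog and the cat"]),
    "de": train(["der schnelle braune fuchs springt über den faulen hund und die katze"]),
}
best_language, language_scores = most_likely("the dog and the fox", language_models)

print(best_encoding, best_language)
```

A real system would train on far larger corpora and would likely use a more careful smoothing scheme; the same `log_score` function serves both cases here only because Python slices `bytes` and `str` the same way.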

