The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jan. 21, 2014

Filed:

Oct. 14, 2010
Applicants:

Kang Li, Sammamish, WA (US);

Stephen Allen Kloder, Seattle, WA (US);

Ian George Johnson, Sammamish, WA (US);

Siarhei Alonichau, Bothell, WA (US);

Inventors:

Kang Li, Sammamish, WA (US);

Stephen Allen Kloder, Seattle, WA (US);

Ian George Johnson, Sammamish, WA (US);

Siarhei Alonichau, Bothell, WA (US);

Assignee:

Microsoft Corporation, Redmond, WA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/20 (2006.01); G06F 17/27 (2006.01); G10L 15/00 (2013.01);
U.S. Cl.
CPC ...
Abstract

Methods, systems, and media are provided for identifying languages in multilingual text. A document is decoded into a universal representative coding for easier tag manipulation, then broken into plain-text content sections. The sections are identified and assigned a weight, wherein more informative sections are given a higher weight and less informative sections are given a lesser weight. A language likelihood score is determined for each word, phrase, or character n-gram in a section. The language likelihood scores within a section are combined for each language. The combined section scores are then summed together to obtain a total document score for each language. This results in a document score for each language, which can be ranked to determine the primary language for the document.
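The scoring steps in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `WORD_SCORES` table, the section weights, and the function names `score_section` and `rank_languages` are all hypothetical. It assumes word-level likelihoods only, and that both the within-section combination and the document total are plain weighted sums.

```python
from collections import defaultdict

# Hypothetical per-language word likelihoods (toy values, not from the patent).
WORD_SCORES = {
    "en": {"the": 0.9, "language": 0.8, "house": 0.7},
    "de": {"das": 0.9, "sprache": 0.8, "haus": 0.7},
}


def score_section(text):
    """Combine per-word likelihood scores for each language within one section."""
    scores = defaultdict(float)
    for word in text.lower().split():
        for lang, table in WORD_SCORES.items():
            # Unknown words contribute nothing to a language's score.
            scores[lang] += table.get(word, 0.0)
    return scores


def rank_languages(sections):
    """Sum weighted section scores per language and rank the totals.

    `sections` is a list of (plain_text, weight) pairs; a more informative
    section (assumed here to be, say, a title) carries a higher weight.
    """
    totals = defaultdict(float)
    for text, weight in sections:
        for lang, score in score_section(text).items():
            totals[lang] += weight * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A heavily weighted German section and a lightly weighted English one.
sections = [("Das Haus", 2.0), ("the language the house", 0.5)]
ranking = rank_languages(sections)
primary_language = ranking[0][0]
```

In this toy input the short German section outweighs the longer English one because of its higher weight, so the first entry of `ranking` is taken as the primary language of the document.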

