The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in PDF format).

Date of Patent:
Mar. 03, 2020

Filed:
May 10, 2018
Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Jason Riesa, San Francisco, CA (US);

Daniel Gillick, Oakland, CA (US);

Yuan Zhang, Santa Clara, CA (US);

Anton Bakalov, Jersey City, NJ (US);

Jason Baldridge, Mountain View, CA (US);

David Weiss, Mountain View, CA (US);

Assignee:

Google LLC, Mountain View, CA (US);

Attorneys:
Primary Examiner:
Int. Cl.
G06F 17/27 (2006.01); G06N 7/00 (2006.01);
U.S. Cl.
CPC ...
G06F 17/277 (2013.01); G06F 17/275 (2013.01); G06F 17/2785 (2013.01); G06N 7/005 (2013.01);
Abstract

A method for identifying codemixed text includes receiving codemixed text and segmenting the codemixed text into a plurality of tokens. Each token includes at least one character and is delineated from any adjacent tokens by a space. For each token of the codemixed text, the method also includes extracting features from the token and predicting a probability distribution over possible languages for the token using a language identifier model configured to receive the extracted features from the token as feature inputs. The method also includes assigning a language to each token of the codemixed text by executing a greedy search on the probability distribution over the possible languages predicted for each respective token.
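The steps named in the abstract (space-delimited segmentation, per-token feature extraction, a per-token probability distribution over languages, and a greedy assignment) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the toy word lists, the character n-gram features, the overlap-based scoring that stands in for a trained language identifier model, and the reading of "greedy search" as an independent per-token argmax. The function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy word lists (assumption); a real system would use a trained model.
TOY_LEXICON = {
    "en": ["the", "is", "very", "good", "movie"],
    "es": ["la", "es", "muy", "buena", "película"],
}


def segment(text):
    """Split text into tokens delineated by spaces, dropping empty strings."""
    return [tok for tok in text.split(" ") if tok]


def extract_features(token):
    """Count character 1- to 3-grams of the lowercased, boundary-padded token."""
    padded = f"^{token.lower()}$"
    feats = Counter()
    for n in (1, 2, 3):
        for i in range(len(padded) - n + 1):
            feats[padded[i:i + n]] += 1
    return feats


# Per-language n-gram profiles built from the toy word lists (assumption).
PROFILES = {
    lang: set().union(*(extract_features(word).keys() for word in words))
    for lang, words in TOY_LEXICON.items()
}


def predict_distribution(features):
    """Stand-in model: softmax over n-gram overlap with each language profile."""
    scores = {
        lang: sum(count for feat, count in features.items() if feat in profile)
        for lang, profile in PROFILES.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lang: math.exp(score - top) for lang, score in scores.items()}
    total = sum(exps.values())
    return {lang: value / total for lang, value in exps.items()}


def identify(text):
    """Greedily assign each token its most probable language (per-token argmax)."""
    result = []
    for token in segment(text):
        dist = predict_distribution(extract_features(token))
        result.append((token, max(dist, key=dist.get)))
    return result


if __name__ == "__main__":
    for token, lang in identify("the movie es muy buena"):
        print(token, lang)
```

The per-token argmax is the simplest greedy strategy consistent with the abstract; the full patent document may constrain the search further, for example across neighboring tokens, which this sketch does not attempt.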

