The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Mar. 04, 2025

Filed:

Feb. 17, 2022

Applicant:

Adobe Inc., San Jose, CA (US);

Inventors:

Cesa Salaam, Upper Marlboro, MD (US);

Seunghyun Yoon, San Jose, CA (US);

Trung Huu Bui, San Jose, CA (US);

Franck Dernoncourt, San Jose, CA (US);

Assignee:

Adobe Inc., San Jose, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/22 (2006.01); G06F 40/47 (2020.01); G06F 40/58 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
G06F 40/58 (2020.01); G06F 40/47 (2020.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01);
Abstract

Techniques for training a language model for code switching content are disclosed. Such techniques include, in some embodiments, generating a dataset, which includes identifying one or more portions within textual content in a first language, the identified one or more portions each including one or more of offensive content or non-offensive content; translating the identified one or more salient portions to a second language; and reintegrating the translated one or more portions into the textual content to generate code-switched textual content. In some cases, the textual content in the first language includes offensive content and non-offensive content, the identified one or more portions include the offensive content, and the translated one or more portions include a translated version of the offensive content. In some embodiments, the code-switched textual content is at least part of a synthetic dataset usable to train a language model, such as a multilingual classification model.
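The three steps named in the abstract (identify portions, translate them, reintegrate them) can be illustrated with a minimal Python sketch. This is not the patented method: the lexicon lookup stands in for whatever identification step the patent actually uses, the dictionary stands in for a machine translation system, and every name here (`identify_portions`, `code_switch`, `TOY_LEXICON`, `TOY_EN_TO_ES`, `toy_translate`, `make_example`) is hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the text


def identify_portions(text: str, lexicon: Set[str]) -> List[Span]:
    """Return spans of words that appear in a (hypothetical) lexicon."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"\w+", text)
        if m.group().lower() in lexicon
    ]


def code_switch(text: str, spans: List[Span],
                translate: Callable[[str], str]) -> str:
    """Translate each span and splice it back into the original text."""
    pieces: List[str] = []
    cursor = 0
    for start, end in sorted(spans):
        pieces.append(text[cursor:start])          # untouched first-language text
        pieces.append(translate(text[start:end]))  # translated portion
        cursor = end
    pieces.append(text[cursor:])                   # trailing first-language text
    return "".join(pieces)


# Toy stand-ins (assumptions): a real pipeline would likely use a trained
# tagger or classifier for identification and an MT system for translation.
TOY_LEXICON: Set[str] = {"stupid", "idiot"}
TOY_EN_TO_ES: Dict[str, str] = {"stupid": "estúpido", "idiot": "idiota"}


def toy_translate(fragment: str) -> str:
    """Look the fragment up in the toy dictionary; leave it as-is if absent."""
    return TOY_EN_TO_ES.get(fragment.lower(), fragment)


def make_example(text: str, label: int) -> dict:
    """Build one synthetic record: code-switched text plus its original label."""
    spans = identify_portions(text, TOY_LEXICON)
    return {"text": code_switch(text, spans, toy_translate), "label": label}
```

Calling `make_example("You are a stupid idiot.", 1)` yields a record whose text mixes both languages while keeping the original label, which is the kind of row a synthetic dataset for a multilingual classifier might contain. Text with no lexicon hits passes through unchanged.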

