For the Inventor, By the Inventor

The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 12026468 B1

Date of Patent:

Jul. 02, 2024

Filed:

Oct. 28, 2021

Out-of-domain data augmentation for natural language processing

Applicant:

Oracle International Corporation, Redwood Shores, CA (US);

Inventors:

Elias Luqman Jalaluddin, Seattle, WA (US);

Vishal Vishnoi, Redwood City, CA (US);

Thanh Long Duong, Seabrook, AU;

Mark Edward Johnson, Castle Grove, AU;

Poorya Zaremoodi, Melbourne, AU;

Gautam Singaraju, Dublin, CA (US);

Ying Xu, Albion, AU;

Vladislav Blinov, Melbourne, AU;

Yu-Heng Hong, Carlton, AU;

Assignee:

Oracle International Corporation, Redwood Shores, CA (US);

Attorney:

Kilpatrick Townsend & Stockton LLP

Primary Examiner:

Pierre Louis Desir

Assistant Examiner:

Daniel W Chung

Int. Cl.

CPC ...

G06F 40/289 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2023.01); H04L 51/02 (2022.01);

U.S. Cl.

CPC ...

G06F 40/289 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2013.01); H04L 51/02 (2013.01);

Abstract

Techniques for out-of-domain data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training a machine-learning model to identify one or more intents for one or more utterances, and augmenting the training set of utterances with out-of-domain (OOD) examples. The augmenting includes: generating a data set of OOD examples, filtering out OOD examples from the data set of OOD examples, determining a difficulty value for each OOD example remaining within the filtered data set of the OOD examples, and generating augmented batches of utterances comprising utterances from the training set of utterances and utterances from the filtered data set of the OOD based on the difficulty value for each OOD. Thereafter, the machine-learning model is trained using the augmented batches of utterances in accordance with a curriculum training protocol.
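The steps named in the abstract (filter the OOD examples, score each remaining example for difficulty, build augmented batches ordered by that difficulty) can be illustrated with a small Python sketch. It is not the patented method: the abstract does not say how filtering or difficulty are computed, so the token-overlap similarity, the `max_difficulty` threshold, the `OOD_LABEL` name, and the easy-to-hard batch ordering are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]      # (utterance, intent label)
OOD_LABEL = "out_of_domain"    # hypothetical label name, not from the patent


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def difficulty(ood: str, in_domain: List[str]) -> float:
    """Assumed difficulty: similarity to the closest in-domain utterance."""
    return max(token_overlap(ood, u) for u in in_domain)


def filter_ood(candidates: List[str], in_domain: List[str],
               max_difficulty: float = 0.8) -> List[str]:
    """Drop candidates that look too much like in-domain utterances."""
    return [c for c in candidates if difficulty(c, in_domain) < max_difficulty]


def augmented_batches(train: List[Example], ood: List[str],
                      batch_size: int = 4, ood_per_batch: int = 1,
                      seed: int = 0) -> List[List[Example]]:
    """Mix in-domain and OOD utterances, feeding easier OOD examples first."""
    n_in = batch_size - ood_per_batch
    if n_in <= 0:
        raise ValueError("batch_size must exceed ood_per_batch")
    texts = [u for u, _ in train]
    # Curriculum ordering (assumed): lowest difficulty first.
    ranked = sorted(ood, key=lambda c: difficulty(c, texts))
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    batches = []
    for b, i in enumerate(range(0, len(shuffled), n_in)):
        batch = shuffled[i:i + n_in]
        start = b * ood_per_batch
        batch += [(c, OOD_LABEL) for c in ranked[start:start + ood_per_batch]]
        batches.append(batch)
    return batches
```

A training loop would then iterate over the returned batches in order, so the model sees the OOD examples least similar to the in-domain data before the more confusable ones.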
