The patent badge is an abbreviated version of the USPTO patent document and contains a link to the full patent document.

G06F 40/30 (2020.01); G06N 7/00 (2006.01); G06N 20/00 (2019.01); G06F 40/284 (2020.01); G10L 15/18 (2013.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01);

U.S. Cl.

CPC ...

G06F 40/30 (2020.01); G06F 40/284 (2020.01); G06N 7/005 (2013.01); G06N 20/00 (2019.01); G10L 15/1822 (2013.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/1815 (2013.01); G10L 2015/0633 (2013.01); G10L 2015/0636 (2013.01);

Abstract

Systems and methods for synthesizing training data for multi-intent utterance segmentation include identifying a first corpus of utterances comprising a plurality of distinct single-intent in-domain utterances; identifying a second corpus of utterances comprising a plurality of distinct single-intent out-of-domain utterances; identifying a third corpus comprising a plurality of distinct conjunction terms; forming a multi-intent training corpus comprising synthetic multi-intent utterances, wherein forming each distinct multi-intent utterance includes: selecting a first distinct in-domain utterance from the first corpus of utterances; probabilistically selecting one of a first out-of-domain utterance from the second corpus and a second in-domain utterance from the first corpus; probabilistically selecting or not selecting a distinct conjunction term from the third corpus; and forming a synthetic multi-intent utterance including appending the first in-domain utterance with one of the first out-of-domain utterance from the second corpus of utterances and the second in-domain utterance from the first corpus of utterances.
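The synthesis procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the probability parameters `p_out_of_domain` and `p_conjunction`, their default values, and the choice to join the parts with spaces are all assumptions made for this example. The abstract also does not say whether the second in-domain utterance must differ from the first; the sketch assumes it should when possible.

```python
import random


def synthesize_multi_intent(in_domain, out_of_domain, conjunctions,
                            p_out_of_domain=0.3, p_conjunction=0.5, rng=None):
    """Build one synthetic multi-intent utterance (hypothetical helper)."""
    rng = rng or random.Random()

    # Step 1: select a first in-domain utterance from the first corpus.
    first = rng.choice(in_domain)

    # Step 2: probabilistically pick either an out-of-domain utterance
    # (second corpus) or a second in-domain utterance (first corpus).
    if rng.random() < p_out_of_domain:
        second = rng.choice(out_of_domain)
    else:
        # Assumption: prefer a second utterance that differs from the first.
        candidates = [u for u in in_domain if u != first] or in_domain
        second = rng.choice(candidates)

    # Step 3: probabilistically select, or skip, a conjunction term
    # from the third corpus.
    parts = [first]
    if rng.random() < p_conjunction:
        parts.append(rng.choice(conjunctions))

    # Step 4: append the second utterance to the first.
    parts.append(second)
    return " ".join(parts)


def build_corpus(in_domain, out_of_domain, conjunctions, size, seed=0, **kwargs):
    """Form a multi-intent training corpus of `size` synthetic utterances."""
    rng = random.Random(seed)
    return [
        synthesize_multi_intent(in_domain, out_of_domain, conjunctions,
                                rng=rng, **kwargs)
        for _ in range(size)
    ]
```

For example, calling `build_corpus(["check my balance", "transfer funds"], ["what's the weather"], ["and then"], size=5)` would yield five strings, each starting with an in-domain utterance, optionally followed by the conjunction, and ending with either an out-of-domain or a second in-domain utterance. A segmentation model would presumably also need the boundary position between the two parts as a label, which this sketch does not record.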
