The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Apr. 02, 2019

Filed:

Dec. 14, 2016

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Dimitrios B. Dimitriadis, White Plains, NY (US);

David C. Haws, New York, NY (US);

Michael Picheny, White Plains, NY (US);

George Saon, Stamford, CT (US);

Samuel Thomas, Elmsford, NY (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 21/00 (2013.01); G10L 15/00 (2013.01); G10L 15/04 (2013.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01); G10L 17/18 (2013.01);
U.S. Cl.
CPC ...
G10L 15/04 (2013.01); G10L 17/18 (2013.01); G10L 25/30 (2013.01); G10L 25/78 (2013.01);
Abstract

Speaker diarization is performed on audio data including speech by a first speaker, speech by a second speaker, and silence. The speaker diarization includes segmenting the audio data using a long short-term memory (LSTM) recurrent neural network (RNN) to identify change points of the audio data that divide the audio data into segments. The speaker diarization includes assigning a label selected from a group of labels to each segment of the audio data using the LSTM RNN. The group of labels includes labels corresponding to the first speaker, the second speaker, and the silence. Each change point is a transition from one of the first speaker, the second speaker, and the silence to a different one of the first speaker, the second speaker, and the silence. Speech recognition can be performed on the segments that each correspond to one of the first speaker and the second speaker.
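
The relationship between labels, change points, and segments described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the LSTM RNN itself is not implemented, and the sketch assumes (this is not stated in the abstract) that the network has already produced one label per audio frame. The label strings and the function names `segments_from_frame_labels` and `speech_segments` are hypothetical, chosen only for illustration.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Hypothetical label names for the three classes named in the abstract.
LABELS = ("speaker_1", "speaker_2", "silence")

# A segment is (start_frame, end_frame_exclusive, label).
Segment = Tuple[int, int, str]


def segments_from_frame_labels(
    frame_labels: Sequence[str],
) -> Tuple[List[int], List[Segment]]:
    """Derive change points and labeled segments from per-frame labels.

    Assumption: frame_labels stands in for the output of the LSTM RNN,
    one label per frame. A change point is the index of the first frame
    whose label differs from the label of the previous frame.
    """
    change_points: List[int] = []
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(frame_labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        length = sum(1 for _ in run)
        if start > 0:
            # Every run after the first begins at a change point.
            change_points.append(start)
        segments.append((start, start + length, label))
        start += length
    return change_points, segments


def speech_segments(segments: Sequence[Segment]) -> List[Segment]:
    """Keep only segments attributed to a speaker, i.e. drop silence."""
    return [seg for seg in segments if seg[2] != "silence"]


# Example: seven frames of made-up labels.
frames = [
    "silence", "silence",
    "speaker_1", "speaker_1", "speaker_1",
    "speaker_2",
    "silence",
]
change_points, segments = segments_from_frame_labels(frames)
# change_points == [2, 5, 6]
# speech_segments(segments) == [(2, 5, "speaker_1"), (5, 6, "speaker_2")]
```

In this sketch, the segments returned by `speech_segments` are the ones that would be passed on to speech recognition, since each corresponds to exactly one of the two speakers.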

