The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
May 14, 2024

Filed:

Dec. 31, 2021

Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Naoyuki Kanda, Bellevue, WA (US);

Takuya Yoshioka, Bellevue, WA (US);

Zhuo Chen, Bellevue, WA (US);

Jinyu Li, Sammamish, WA (US);

Yashesh Gaur, Redmond, WA (US);

Zhong Meng, Mercer Island, WA (US);

Xiaofei Wang, Bellevue, WA (US);

Xiong Xiao, Bothell, WA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/06 (2013.01); G10L 15/26 (2006.01); G10L 17/04 (2013.01);
U.S. Cl.
CPC ...
G10L 17/04 (2013.01); G10L 15/06 (2013.01); G10L 15/26 (2013.01);
Abstract

The disclosure herein describes using a transcript generation model for generating a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained and a set of frame embeddings are generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols are generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols are transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
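
The sorting step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a fixed number of channels (two by default), a hypothetical `<cc>` token standing in for the CC symbol, and that each CC symbol simply switches to the next channel; the function name `split_by_channel_change` is likewise invented for illustration.

```python
from typing import List

CC = "<cc>"  # hypothetical marker standing in for a channel change (CC) symbol


def split_by_channel_change(tokens: List[str], num_channels: int = 2) -> List[str]:
    """Sort a serialized word/CC stream into one transcript line per channel."""
    lines: List[List[str]] = [[] for _ in range(num_channels)]
    channel = 0  # assumption: the stream always starts on the first channel
    for token in tokens:
        if token == CC:
            # Assumption: each CC symbol moves to the next channel, wrapping around.
            channel = (channel + 1) % num_channels
        else:
            lines[channel].append(token)
    return [" ".join(words) for words in lines]


if __name__ == "__main__":
    # Two speakers overlapping: "hello how are you" and "hi fine".
    stream = ["hello", "how", CC, "hi", CC, "are", "you", CC, "fine"]
    for line in split_by_channel_change(stream):
        print(line)
```

With two channels, a CC symbol acts as a toggle, so the words between consecutive CC symbols stay together on the same transcript line.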
