The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Mar. 04, 2025

Filed:
Apr. 04, 2022

Applicant:
Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Takuya Yoshioka, Bellevue, WA (US);

Andreas Stolcke, Berkeley, CA (US);

Zhuo Chen, Woodinville, WA (US);

Dimitrios Basile Dimitriadis, Bellevue, WA (US);

Nanshan Zeng, Bellevue, WA (US);

Lijuan Qin, Seattle, WA (US);

William Isaac Hinthorn, Seattle, WA (US);

Xuedong Huang, Bellevue, WA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/26 (2006.01); G10L 15/08 (2006.01); G10L 19/018 (2013.01);
U.S. Cl.
CPC ...
G10L 15/26 (2013.01); G10L 15/08 (2013.01); G10L 19/018 (2013.01);
Abstract

A computer-implemented method processes audio streams recorded during a meeting by a plurality of distributed devices. Operations include performing speech recognition on each audio stream by a corresponding speech recognition system to generate utterance-level posterior probabilities as hypotheses for each audio stream, aligning the hypotheses and formatting them as word confusion networks with associated word-level posterior probabilities, performing speaker recognition on each audio stream by a speaker identification algorithm that generates a stream of speaker-attributed word hypotheses, formatting speaker hypotheses with associated speaker label posterior probabilities and speaker-attributed hypotheses for each audio stream as a speaker confusion network, aligning the word and speaker confusion networks from all audio streams to each other to merge the posterior probabilities and align word and speaker labels, and creating a best speaker-attributed word transcript by selecting the sequence of word and speaker labels with the highest posterior probabilities.
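
The final two operations in the abstract (merging posteriors across aligned confusion networks, then selecting the highest-posterior labels) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes the networks from every stream are already aligned slot by slot, that each slot is a plain dictionary of label to posterior, that streams are weighted equally, and that selection is a per-slot argmax. The names `merge_slot`, `best_transcript`, and the `<eps>` empty-slot marker are hypothetical and do not come from the patent.

```python
from collections import defaultdict

EPS = "<eps>"  # hypothetical marker for an empty slot (no word spoken)


def merge_slot(slots, weights=None):
    """Combine per-stream posteriors for one aligned slot by weighted averaging."""
    if weights is None:
        # Assumption: every stream is trusted equally.
        weights = [1.0 / len(slots)] * len(slots)
    merged = defaultdict(float)
    for slot, weight in zip(slots, weights):
        for label, posterior in slot.items():
            merged[label] += weight * posterior
    return dict(merged)


def best_transcript(word_networks, speaker_networks):
    """Pick the highest-posterior word and speaker label in each aligned slot.

    Each network is a list of slots, one network per audio stream, and all
    networks are assumed to have the same number of slots.
    """
    transcript = []
    for word_slots, speaker_slots in zip(zip(*word_networks), zip(*speaker_networks)):
        words = merge_slot(word_slots)
        speakers = merge_slot(speaker_slots)
        word = max(words, key=words.get)
        speaker = max(speakers, key=speakers.get)
        if word != EPS:  # skip slots where "no word" wins
            transcript.append((speaker, word))
    return transcript


# Made-up posteriors for two streams and two aligned slots.
word_networks = [
    [{"hello": 0.6, "yellow": 0.4}, {"world": 0.9, EPS: 0.1}],
    [{"hello": 0.7, "hollow": 0.3}, {"world": 0.5, "word": 0.5}],
]
speaker_networks = [
    [{"spk1": 0.8, "spk2": 0.2}, {"spk1": 0.6, "spk2": 0.4}],
    [{"spk1": 0.55, "spk2": 0.45}, {"spk1": 0.7, "spk2": 0.3}],
]

print(best_transcript(word_networks, speaker_networks))
# [('spk1', 'hello'), ('spk1', 'world')]
```

Averaging the posteriors lets a confident stream outvote an uncertain one: in the second slot, the second stream is split evenly between "world" and "word", but the merged distribution still favors "world". A per-slot argmax is a simplification of selecting the best label sequence, and the alignment step itself is outside the scope of this sketch.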

