The patent badge is an abbreviated version of the USPTO patent document and contains a link to the full patent document.

Int. Cl.

G10L 15/00 (2013.01); G10L 15/25 (2013.01); G06K 9/62 (2022.01); G06N 3/08 (2006.01); G06T 7/00 (2017.01); G10L 15/02 (2006.01); G10L 15/16 (2006.01); G10L 15/197 (2013.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); G06V 40/16 (2022.01);

U.S. Cl.

CPC ...

G10L 15/25 (2013.01); G06K 9/6217 (2013.01); G06N 3/08 (2013.01); G06T 7/0002 (2013.01); G06V 20/40 (2022.01); G06V 40/171 (2022.01); G06V 40/20 (2022.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 15/197 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30168 (2013.01); G06T 2207/30201 (2013.01); G10L 2015/025 (2013.01);

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.
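
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The abstract does not specify layer sizes, the type of time-aggregation layer, or the decoding procedure, so the grayscale `(T, H, W)` video layout, the neighbour-averaging stand-in for time aggregation, the blank token `"_"`, the greedy collapse step, and the toy `LEXICON` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary: index 0 is an assumed "blank" token.
PHONEMES = ["_", "HH", "AY", "B", "IY"]
# Hypothetical pronunciation dictionary mapping phoneme tuples to words.
LEXICON = {("HH", "AY"): "hi", ("B", "IY"): "bee"}


def conv3d_valid(video, kernels):
    """Volumetric convolution without padding.

    video: (T, H, W); kernels: (C, kt, kh, kw) -> output (C, T', H', W').
    """
    windows = np.lib.stride_tricks.sliding_window_view(video, kernels.shape[1:])
    return np.einsum("thwijk,cijk->cthw", windows, kernels)


def time_aggregate(features, radius=1):
    """Stand-in time aggregation: average each step with its neighbours.

    features: (T, C). A real system would likely use a learned layer here.
    """
    steps = features.shape[0]
    out = np.empty_like(features)
    for t in range(steps):
        lo, hi = max(0, t - radius), min(steps, t + radius + 1)
        out[t] = features[lo:hi].mean(axis=0)
    return out


def phoneme_scores(video, kernels, proj):
    """Return one score per phoneme for each output position: (T', V)."""
    feats = np.maximum(conv3d_valid(video, kernels), 0.0)  # ReLU
    per_step = feats.mean(axis=(2, 3)).T                   # pool space -> (T', C)
    logits = time_aggregate(per_step) @ proj               # (T', V)
    logits -= logits.max(axis=1, keepdims=True)            # stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def greedy_phonemes(scores):
    """Pick the best token per position, merge repeats, drop blanks."""
    result, prev = [], -1
    for idx in scores.argmax(axis=1):
        if idx != prev and idx != 0:
            result.append(PHONEMES[idx])
        prev = idx
    return result


def phonemes_to_words(phonemes):
    """Longest-match lookup of phoneme runs in the toy lexicon."""
    words, i = [], 0
    while i < len(phonemes):
        for n in range(len(phonemes) - i, 0, -1):
            word = LEXICON.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here; skip this phoneme
    return words
```

With untrained random weights the predicted phonemes are meaningless. The sketch only shows how a video tensor becomes per-position phoneme scores, and how those scores could then be mapped to words.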
