The patent badge is an abbreviated version of the USPTO patent document. It contains a link to the full patent document.

The patent badge covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The full patent document is provided in Adobe Acrobat (PDF) format.

Date of Patent:
Jul. 12, 2022

Filed:
May 20, 2019

Applicant:
Deepmind Technologies Limited, London, GB;

Inventors:
Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/00 (2013.01); G10L 15/25 (2013.01); G06K 9/62 (2022.01); G06N 3/08 (2006.01); G06T 7/00 (2017.01); G10L 15/02 (2006.01); G10L 15/16 (2006.01); G10L 15/197 (2013.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); G06V 40/16 (2022.01);
U.S. Cl.
CPC ...
G10L 15/25 (2013.01); G06K 9/6217 (2013.01); G06N 3/08 (2013.01); G06T 7/0002 (2013.01); G06V 20/40 (2022.01); G06V 40/171 (2022.01); G06V 40/20 (2022.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 15/197 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/30168 (2013.01); G06T 2207/30201 (2013.01); G10L 2015/025 (2013.01);
Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing visual speech recognition. In one aspect, a method comprises receiving a video comprising a plurality of video frames, wherein each video frame depicts a pair of lips; processing the video using a visual speech recognition neural network to generate, for each output position in an output sequence, a respective output score for each token in a vocabulary of possible tokens, wherein the visual speech recognition neural network comprises one or more volumetric convolutional neural network layers and one or more time-aggregation neural network layers; wherein the vocabulary of possible tokens comprises a plurality of phonemes; and determining a sequence of words expressed by the pair of lips depicted in the video using the output scores.
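The data flow in the abstract (video frames, then volumetric convolution, then time aggregation, then one score per token at each output position) can be illustrated with a toy sketch. This is not the patented network: the vocabulary `VOCAB`, the single-channel frames, the exponential running average used as the time-aggregation step, and all weights are hypothetical stand-ins chosen only to make the shapes concrete. The step from phoneme scores to words is left out.

```python
import random

# Hypothetical toy phoneme vocabulary; the abstract does not list the real one.
VOCAB = ["AH", "B", "M", "P", "sil"]


def volumetric_conv(video, kernel):
    """Valid 3D cross-correlation over (time, height, width), followed by ReLU."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(max(acc, 0.0))
            plane.append(row)
        out.append(plane)
    return out


def spatial_mean(feature_maps):
    """Collapse each time step's feature map to one number."""
    return [
        sum(sum(row) for row in plane) / (len(plane) * len(plane[0]))
        for plane in feature_maps
    ]


def time_aggregate(features, decay=0.5):
    """Assumed stand-in for a time-aggregation layer: exponential running average."""
    state, out = 0.0, []
    for f in features:
        state = decay * state + (1.0 - decay) * f
        out.append(state)
    return out


def token_scores(aggregated, weights, biases):
    """One score per vocabulary token at each output position."""
    return [[w * a + b for w, b in zip(weights, biases)] for a in aggregated]


def greedy_tokens(scores):
    """Pick the highest-scoring token at each output position."""
    return [VOCAB[max(range(len(row)), key=row.__getitem__)] for row in scores]


rng = random.Random(0)
# 6 grayscale frames of a 4x4 lip crop (random pixels, for shape checking only).
video = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(6)]
# One 3x2x2 kernel spanning three consecutive frames.
kernel = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)] for _ in range(3)]
weights = [rng.uniform(-1, 1) for _ in VOCAB]
biases = [rng.uniform(-1, 1) for _ in VOCAB]

features = spatial_mean(volumetric_conv(video, kernel))
scores = token_scores(time_aggregate(features), weights, biases)
tokens = greedy_tokens(scores)
print(len(scores), len(scores[0]), tokens)
```

With 6 frames and a kernel that spans 3 frames, the sketch yields 4 output positions, each carrying 5 token scores. The greedy pick per position is only one simple way to read the scores; the abstract does not say how the word sequence is derived from them.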

