The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 22, 2025

Filed:
Mar. 08, 2022

Applicant:
Sony Group Corporation, Tokyo, JP;

Inventors:
Shiwei Jin, San Diego, CA (US);
Jong Hwa Lee, San Diego, CA (US);
Matthew Wnuk, San Diego, CA (US);
Francisco Costela, San Diego, CA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/25 (2013.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); G10L 25/30 (2013.01);
U.S. Cl.
CPC ...
G10L 15/25 (2013.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/49 (2022.01); G06V 40/20 (2022.01); G10L 25/30 (2013.01);
Abstract

An electronic apparatus and method for visual speech recognition based on connectionist temporal classification (CTC) loss are disclosed. The electronic apparatus receives a video that includes human speakers and generates a prediction corresponding to lip movements of the human speakers. The prediction is generated by applying a Deep Neural Network (DNN) to the video, and the DNN is trained using a CTC loss function. The electronic apparatus detects, based on the prediction, word boundaries in a sequence of characters that correspond to the lip movements and divides the video into a sequence of video clips based on the detection. Each video clip corresponds to a word spoken by the human speakers. The electronic apparatus generates a sequence of word predictions by processing the sequence of video clips and generates a sentence or a phrase based on the generated sequence of word predictions.
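
The boundary-detection step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the DNN output has already been reduced to one label per video frame, and that the label set includes a CTC blank and a space symbol marking word boundaries. The names `BLANK`, `SPACE`, `ctc_greedy_collapse`, and `split_into_word_clips` are hypothetical and do not come from the patent.

```python
from itertools import groupby

BLANK = "_"  # assumed CTC blank symbol (hypothetical choice)
SPACE = " "  # assumed word-boundary symbol (hypothetical choice)


def ctc_greedy_collapse(frame_labels):
    """Standard CTC greedy decoding: merge repeated labels, then drop blanks."""
    return "".join(label for label, _ in groupby(frame_labels) if label != BLANK)


def split_into_word_clips(frame_labels):
    """Return (start_frame, end_frame_exclusive, word) for each detected word.

    A word spans from its first character frame to its last character frame;
    frames labelled SPACE close the current word.
    """
    clips = []
    start = None   # first frame of the word being built
    last = None    # last character frame seen in that word
    chars = []     # collapsed characters of that word
    prev = None    # label of the previous frame, used for CTC merging
    for i, label in enumerate(frame_labels):
        if label == SPACE:
            if start is not None:
                clips.append((start, last + 1, "".join(chars)))
            start, chars = None, []
        elif label != BLANK:
            if start is None:
                start = i
            if label != prev:  # a repeat only counts after a blank
                chars.append(label)
            last = i
        prev = label
    if start is not None:  # flush the final word
        clips.append((start, last + 1, "".join(chars)))
    return clips


if __name__ == "__main__":
    # Toy per-frame labels standing in for the argmax of a DNN output.
    frames = list("hh_eel_lo_  wwo_rld_")
    print(ctc_greedy_collapse(frames))
    print(split_into_word_clips(frames))
```

Each returned frame range could then be used to cut the video into per-word clips for a second recognition pass, which is the role the abstract assigns to the sequence of video clips.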

