The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Apr. 23, 2024

Filed:
Sep. 10, 2021

Applicant:
Institute of Automation, Chinese Academy of Sciences, Beijing, CN;

Inventors:
Jianhua Tao, Beijing, CN;
Cong Cai, Beijing, CN;
Bin Liu, Beijing, CN;
Mingyue Niu, Beijing, CN;
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
A61B 5/16 (2006.01); A61B 5/00 (2006.01); G06F 18/25 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01); G06T 7/00 (2017.01); G06V 10/80 (2022.01); G06V 20/40 (2022.01); G10L 25/30 (2013.01); G10L 25/57 (2013.01); G10L 25/63 (2013.01); G10L 25/66 (2013.01);
U.S. Cl.
CPC ...
A61B 5/165 (2013.01); A61B 5/4803 (2013.01); A61B 5/7275 (2013.01); G06F 18/253 (2023.01); G06N 3/08 (2013.01); G06T 7/0012 (2013.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G10L 25/30 (2013.01); G10L 25/57 (2013.01); G10L 25/63 (2013.01); G10L 25/66 (2013.01); G06T 2207/10016 (2013.01);
Abstract

Disclosed is an automatic depression detection method using audio-video, including: acquiring original data containing two modalities of long-term audio file and long-term video file from an audio-video file; dividing the long-term audio file into several audio segments, and meanwhile dividing the long-term video file into a plurality of video segments; inputting each audio segment/each video segment into an audio feature extraction network/a video feature extraction network to obtain in-depth audio features/in-depth video features; calculating the in-depth audio features and the in-depth video features by using multi-head attention mechanism so as to obtain attention audio features and attention video features; aggregating the attention audio features and the attention video features into audio-video features; and inputting the audio-video features into a decision network to predict a depression level of an individual in the audio-video file.
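The steps named in the abstract can be sketched roughly as follows. This is only an illustrative toy in plain Python, not the patented implementation: the function names are hypothetical, 1-D lists of numbers stand in for audio samples and video frames, simple per-segment statistics stand in for the audio and video feature extraction networks, the multi-head attention is assumed to be self-attention within each modality and has no learned projections, the aggregation is assumed to be mean pooling followed by concatenation, and the "decision network" is a placeholder average.

```python
import math

def split_into_segments(samples, segment_len):
    """Divide a long sequence into consecutive segments (the last may be shorter)."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

def toy_features(segment):
    """Stand-in for a feature extraction network: four summary statistics."""
    mean = sum(segment) / len(segment)
    var = sum((x - mean) ** 2 for x in segment) / len(segment)
    return [mean, math.sqrt(var), min(segment), max(segment)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_self_attention(features, num_heads=2):
    """Scaled dot-product self-attention across segments.

    Assumption: each head simply sees a slice of the feature vector;
    a real model would apply learned query/key/value projections.
    """
    dim = len(features[0])
    head_dim = dim // num_heads
    attended = [[] for _ in features]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        heads = [f[lo:hi] for f in features]
        for i, query in enumerate(heads):
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
                for key in heads
            ]
            weights = softmax(scores)
            # Weighted sum of all segments' values for this head.
            out = [sum(w * v[d] for w, v in zip(weights, heads)) for d in range(head_dim)]
            attended[i].extend(out)
    return attended

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def predict_depression_level(audio, video, audio_seg_len, video_seg_len):
    """Run the toy pipeline: segment, extract, attend, aggregate, decide."""
    audio_feats = [toy_features(s) for s in split_into_segments(audio, audio_seg_len)]
    video_feats = [toy_features(s) for s in split_into_segments(video, video_seg_len)]
    audio_att = multi_head_self_attention(audio_feats)
    video_att = multi_head_self_attention(video_feats)
    # Assumed aggregation: pool each modality, then concatenate.
    fused = mean_pool(audio_att) + mean_pool(video_att)
    # Placeholder for the decision network: an unweighted average.
    return sum(fused) / len(fused)

# Example call on made-up data.
level = predict_depression_level(
    audio=[0.1 * i for i in range(20)],
    video=[float(i % 5) for i in range(12)],
    audio_seg_len=5,
    video_seg_len=4,
)
```

The returned number has no clinical meaning here; the sketch only shows how data might flow from segments to per-modality attention features to a single fused vector and a scalar prediction.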

