The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document in PDF format.

Date of Patent: Jul. 05, 2022

Filed: Aug. 16, 2017

Applicant: Peking University Shenzhen Graduate School, Shenzhen, CN

Inventors:
Wenmin Wang, Shenzhen, CN;
Zhihao Li, Shenzhen, CN;
Ronggang Wang, Shenzhen, CN;
Ge Li, Shenzhen, CN;
Shengfu Dong, Shenzhen, CN;
Zhenyu Wang, Shenzhen, CN;
Ying Li, Shenzhen, CN;
Hui Zhao, Shenzhen, CN;
Wen Gao, Shenzhen, CN
Attorney:
Primary Examiner:
Int. Cl.:
G06K 9/62 (2022.01); G06N 3/04 (2006.01); G06F 17/15 (2006.01); G06F 17/18 (2006.01); G06N 3/08 (2006.01)

U.S. Cl.:
CPC G06N 3/0472 (2013.01); G06F 17/15 (2013.01); G06F 17/18 (2013.01); G06K 9/6267 (2013.01); G06N 3/08 (2013.01)
Abstract

A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition technologies. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's constraints on input size, speeds up training and detection, and improves the performance of video action classification and temporal localization. The disclosed CNN includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer, and a fully connected layer. Its outputs include a category classification output layer and a temporal localization output layer. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input directly at once, improving efficiency. Moreover, the network is trained using video clips of the same frequency without increasing intra-category differences, which reduces the network's learning burden and yields faster model convergence and better detection.
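The key property of a temporal-spatial pyramid pooling layer is that it maps a feature volume of arbitrary duration and spatial size to a fixed-length vector, which is what lets the network accept whole videos of varying length. The sketch below illustrates that idea with NumPy; the pyramid levels, max-pooling choice, and cubic bin layout are illustrative assumptions, not the patent's exact configuration.

```python
import numpy as np

def temporal_spatial_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool a (C, T, H, W) feature volume into a fixed-length vector.

    For each pyramid level n, the volume is divided into n x n x n bins
    along (T, H, W) and each bin is max-pooled per channel, so the output
    length is C * sum(n**3 for n in levels) regardless of T, H, W.
    Assumes T, H, W >= max(levels) so that every bin is non-empty.
    (Illustrative sketch, not the patent's exact layer.)
    """
    C, T, H, W = features.shape
    pooled = []
    for n in levels:
        # Integer bin edges along each axis; uneven splits are fine.
        t_edges = np.linspace(0, T, n + 1, dtype=int)
        h_edges = np.linspace(0, H, n + 1, dtype=int)
        w_edges = np.linspace(0, W, n + 1, dtype=int)
        for ti in range(n):
            for hi in range(n):
                for wi in range(n):
                    bin_ = features[:,
                                    t_edges[ti]:t_edges[ti + 1],
                                    h_edges[hi]:h_edges[hi + 1],
                                    w_edges[wi]:w_edges[wi + 1]]
                    # Max over the bin's (T, H, W) extent -> one value per channel.
                    pooled.append(bin_.max(axis=(1, 2, 3)))
    return np.concatenate(pooled)
```

Because the output length depends only on the channel count and the pyramid levels, two videos of different durations produce equally sized vectors, and the fully connected layer that follows needs no per-input resizing.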

