Human Action Recognition Using Pyramid Histograms of Oriented Gradients and Collaborative Multi-task Learning


ㆍ Authors: Gao, Zan; Zhang, Hua; Liu, An-An; Xue, Yan-Bing; Xu, Guang-Ping
ㆍ Journal: KSII Transactions on Internet and Information Systems (TIIS)
ㆍ Volume/Issue: 2014, Vol. 8, No. 2, pp. 483-503 (21 pages)
ㆍ Publisher: Korean Society for Internet Information
ㆍ File information: Periodical | ENG | PDF text
ㆍ Subject area: Other

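The full text of this paper is provided free of charge through a paper-linking arrangement with the Korea Institute of Science and Technology Information (KISTI).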

Abstract (other language)

This paper proposes a human action recognition method based on pyramid histograms of oriented gradients and collaborative multi-task learning. First, we accumulate global activities and construct a motion history image (MHI) for the RGB and depth channels separately, encoding the dynamics of an action in each modality. Different action descriptors are then extracted from the depth and RGB MHIs to represent the global textural and structural characteristics of the actions. Specifically, the average value in hierarchical blocks, GIST, and pyramid histograms of oriented gradients (PHOG) descriptors are employed to represent human motion. To demonstrate the superiority of the proposed method, we evaluate the descriptors with KNN, SVM with linear and RBF kernels, SRC, and CRC models on the DHA dataset, a well-known dataset for human action recognition. Large-scale experimental results show that our descriptors are robust, stable, and efficient, and that they outperform state-of-the-art methods. In addition, we further investigate the descriptors by combining them on the DHA dataset, and observe that the combined descriptors perform much better than any single descriptor. With multimodal features, we also propose a collaborative multi-task learning method for model learning and inference based on transfer learning theory. The main contributions lie in four aspects: 1) the proposed encoding scheme can filter out the stationary part of the human body and reduce noise interference; 2) different kinds of features and models are assessed, and neighboring gradient information and pyramid layers prove very helpful for representing these actions; 3) the proposed model can fuse features from different modalities regardless of the sensor type, the value range, and the dimensionality of the features; 4) the latent common knowledge among different modalities can be discovered by transfer learning to boost performance.
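
The encoding step described in the abstract can be illustrated with a minimal sketch. It assumes the classic motion history image update (moving pixels are set to a duration tau, all other pixels decay by one per frame) and a simplified pyramid of gradient-orientation histograms that omits the edge-detection step PHOG normally includes; the abstract does not give the paper's exact formulation. The function names, the parameters `tau`, `diff_threshold`, `levels`, and `bins`, and the synthetic clip are illustrative assumptions, not the authors' code.

```python
import numpy as np


def motion_history_image(frames, tau=None, diff_threshold=0.1):
    """Build an MHI from a (T, H, W) stack of RGB-gray or depth frames in [0, 1].

    Assumed update rule: pixels whose frame difference exceeds
    diff_threshold are set to tau, all others decay by 1 toward 0.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if tau is None:
        tau = frames.shape[0] - 1  # assumption: history spans the whole clip
    mhi = np.zeros(frames.shape[1:], dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        moving = np.abs(curr - prev) > diff_threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / max(tau, 1)  # rescale so the most recent motion is 1.0


def pyramid_hog(image, levels=3, bins=8):
    """Simplified pyramid histogram of oriented gradients (no edge detection).

    Level l splits the image into 2**l x 2**l cells; each cell contributes a
    magnitude-weighted histogram of unsigned gradient orientations.
    """
    image = np.asarray(image, dtype=np.float64)
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned, in [0, pi]
    bin_index = np.minimum((orientation / np.pi * bins).astype(int), bins - 1)

    height, width = image.shape
    histograms = []
    for level in range(levels):
        cells = 2 ** level
        row_edges = np.linspace(0, height, cells + 1).astype(int)
        col_edges = np.linspace(0, width, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                rows = slice(row_edges[r], row_edges[r + 1])
                cols = slice(col_edges[c], col_edges[c + 1])
                histograms.append(
                    np.bincount(
                        bin_index[rows, cols].ravel(),
                        weights=magnitude[rows, cols].ravel(),
                        minlength=bins,
                    )
                )
    descriptor = np.concatenate(histograms)
    total = descriptor.sum()
    return descriptor / total if total > 0 else descriptor


# Synthetic clip (hypothetical data): a bright square moving to the right.
clip = np.zeros((5, 16, 16))
for t in range(5):
    clip[t, 4:8, 2 + 2 * t:6 + 2 * t] = 1.0

mhi = motion_history_image(clip)
descriptor = pyramid_hog(mhi, levels=3, bins=8)
```

With three levels and eight bins the descriptor has 8 × (1 + 4 + 16) = 168 entries. Pixels that never change stay at zero in the MHI, which is the sense in which such an encoding filters out the stationary part of the body. In a two-modality setting, the same two functions would be applied to the RGB and depth streams separately before any fusion step.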

Keywords

Action recognition, collaborative multi-task learning, PHOG, depth
