First Person Action Recognition

Introduction

Project developed for the Machine Learning & Deep Learning master's course at Politecnico di Torino.

The goal of the project was to understand the general approaches used to tackle video action recognition and the challenges that arise when dealing with first-person (egocentric) data.

Development

The aim of the project was to implement a trainable deep neural network model for egocentric activity recognition. First, the EgoRNN model was reimplemented to reproduce the original experiments of paper [1], which involve multi-stage training. The model adopts a two-stream architecture: a spatial attention mechanism encourages the network to attend to regions containing objects relevant to the activity under consideration, while an optical-flow stream extracts motion features.
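
The sketch below illustrates the attention idea: a class-activation map (CAM) computed from the backbone's last convolutional features is softmax-normalized and used to re-weight those features before they reach the recurrent head. This is a minimal illustration assuming a PyTorch ResNet-34 backbone; all class and variable names here are hypothetical and do not reflect the repository's exact implementation.

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SpatialAttentionBackbone(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        backbone = models.resnet34(weights=None)
        # Keep only the convolutional layers (drop avgpool and fc).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.fc = nn.Linear(backbone.fc.in_features, num_classes)

    def forward(self, x):
        feat = self.features(x)                        # (B, C, H, W)
        logits = self.fc(feat.mean(dim=(2, 3)))        # global-average pooling + fc
        # Class activation map for each sample's most confident class.
        w = self.fc.weight[logits.argmax(dim=1)]       # (B, C)
        cam = (w[:, :, None, None] * feat).sum(dim=1)  # (B, H, W)
        attn = F.softmax(cam.flatten(1), dim=1).view_as(cam)
        # Re-weight the spatial features so downstream (e.g. recurrent)
        # layers focus on object-relevant regions.
        return feat * attn.unsqueeze(1), logits
```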

To better capture spatio-temporal correlations between the two streams, the model was then transformed into a multi-task, single-stream neural network, forced to learn joint features across the two domains by a self-supervised auxiliary motion-segmentation block.
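
A minimal sketch of this multi-task setup follows: a single RGB stream feeds both the activity classifier and an auxiliary motion-segmentation head whose per-pixel targets are derived from optical flow, so the auxiliary supervision is "free". The head shapes, the loss weight `alpha`, and all names below are illustrative assumptions, not the repository's exact design.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHead(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.classifier = nn.Linear(in_channels, num_classes)
        # Per-cell binary prediction: does this spatial location move?
        self.motion_seg = nn.Conv2d(in_channels, 2, kernel_size=1)

    def forward(self, feat):                      # feat: (B, C, H, W)
        logits = self.classifier(feat.mean(dim=(2, 3)))
        motion = self.motion_seg(feat)            # (B, 2, H, W)
        return logits, motion

def multitask_loss(logits, motion, labels, motion_maps, alpha=1.0):
    # Joint objective: supervised classification plus the self-supervised
    # motion task; motion_maps are (B, H, W) binary maps thresholded
    # from optical flow.
    return (F.cross_entropy(logits, labels)
            + alpha * F.cross_entropy(motion, motion_maps))
```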

The network was further enhanced with an additional self-supervised task (formulated both as a regression and as a classification problem) that drives the network to better learn temporal correlations between frames.
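
As an illustration, such a temporal pretext task could look like the sketch below: the classification flavour predicts whether the input clip was temporally shuffled, while the regression flavour predicts each frame's normalized position in time. Feature sizes and names are hypothetical assumptions, not the repository's exact design.

```python
import torch.nn as nn

class TemporalOrderHead(nn.Module):
    def __init__(self, feat_dim, num_frames):
        super().__init__()
        # Classification flavour: binary "ordered vs. shuffled" prediction
        # from concatenated per-frame features (trained with cross-entropy).
        self.order_cls = nn.Linear(feat_dim * num_frames, 2)
        # Regression flavour: predict each frame's normalized temporal
        # position in [0, 1] (trained with e.g. an MSE loss).
        self.order_reg = nn.Linear(feat_dim, 1)

    def forward(self, frame_feats):            # frame_feats: (B, T, D)
        b, t, d = frame_feats.shape
        cls_logits = self.order_cls(frame_feats.reshape(b, t * d))
        positions = self.order_reg(frame_feats).squeeze(-1)  # (B, T)
        return cls_logits, positions
```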

Multiple experiments were carried out to assess the effectiveness of the adopted techniques.
