Human Behavior Modeling in Long Videos: Drowsiness Detection and Action Segmentation
Abstract
"In this thesis, we focus on two instances of human behavior modeling in long untrimmed videos: drowsiness detection and action segmentation.
In the first section, we focus on drowsiness detection. Specifically, we introduce a large, public, real-life dataset and a baseline temporal model that classifies drowsiness into three stages: alert, low vigilant, and drowsy. In the second section, we study action segmentation in instructional videos under weak supervision. To save annotation time and cost, weakly supervised methods are trained using only video-level action sequences, whereas fully supervised methods are trained using frame-level labels. We study weakly supervised action segmentation from multiple aspects. First, we present a duration model that predicts the remaining duration of an ongoing action in order to iteratively align a given sequence of actions to an input video. Second, we propose a hierarchical approach to segmentation, in which top-level tasks are predicted to constrain lower-level atomic actions. Third, we introduce the first weakly supervised online action segmentation model, which segments streaming videos online at test time using dynamic programming, and we show its advantages over a greedy sliding-window approach. Finally, we present a multi-view training strategy that exploits frame-wise correspondence between multiple views as supervision for training on weakly labeled instructional videos. Experimental results on multiple public datasets show the efficacy of our algorithms."
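To make the alignment problem concrete, the following is a minimal sketch of how an ordered action sequence can be aligned to the frames of a video with dynamic programming. It is a generic Viterbi-style alignment, not the thesis's duration model or its online algorithm. The function name `align_transcript`, the per-frame log-probability input `log_probs`, and the rule that every action occupies at least one frame are assumptions made for illustration.

```python
import math


def align_transcript(log_probs, transcript):
    """Align an ordered action sequence to video frames (illustrative sketch).

    log_probs[t][c] is an assumed per-frame log-probability of class c at
    frame t; transcript is the ordered list of action class ids. Every
    action is assumed to cover at least one frame.
    Returns (frame_labels, total_score).
    """
    num_frames, num_actions = len(log_probs), len(transcript)
    if num_actions == 0 or num_frames < num_actions:
        raise ValueError("need at least one frame per action")

    neg_inf = -math.inf
    # score[t][n]: best score for frames 0..t with frame t assigned to
    # the n-th transcript entry.
    score = [[neg_inf] * num_actions for _ in range(num_frames)]
    # advanced[t][n]: True if frame t is the first frame of entry n.
    advanced = [[False] * num_actions for _ in range(num_frames)]

    score[0][0] = log_probs[0][transcript[0]]
    for t in range(1, num_frames):
        for n in range(num_actions):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else neg_inf
            best = max(stay, move)
            if best == neg_inf:
                continue  # entry n cannot be reached by frame t
            advanced[t][n] = move > stay
            score[t][n] = best + log_probs[t][transcript[n]]

    # Backtrace from the last frame, which must belong to the last action.
    labels = [0] * num_frames
    n = num_actions - 1
    for t in range(num_frames - 1, -1, -1):
        labels[t] = transcript[n]
        if advanced[t][n]:
            n -= 1
    return labels, score[num_frames - 1][num_actions - 1]
```

Because the recursion at frame t reads only the scores from frame t-1, the forward pass can in principle be updated as frames arrive. A greedy per-frame decision, by contrast, cannot revise an earlier boundary once later evidence is seen.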