Wednesday, Aug. 31
- 08:00
- Registration
- 08:30 — 09:00
- Posters set-up
- 09:00 — 09:20
- Welcome
- 09:20 — 10:20
- Keynote lecture
Nasir Memon
- 10:20 — 10:40
- Coffee Break
- 10:40 — 12:00
- Oral Session 1: Pose, gait and faces
-
Combined Estimation of Location and Body Pose in Surveillance Video
Abstract
In surveillance videos, cues such as head or body pose provide important information for analyzing people's behavior and interactions. In this paper we propose an approach that jointly estimates body location and body pose in monocular surveillance video. Our approach is based on tracks derived by multi-object tracking. First, body pose classification is conducted using a sparse representation technique on each frame of the tracks, generating (noisy) observations of body pose. Then, both location and body pose in 3D space are estimated jointly in a particle filtering framework by utilizing a soft coupling of body pose with the movement. The experiments show that the proposed system successfully tracks body position and pose simultaneously in many scenarios. The output of the system can be used to perform further analysis on behaviors and interactions.
Gaze and Body Pose Estimation from a Distance
Abstract
We present a comprehensive approach to estimating location, body pose and gaze direction of multiple individuals in unconstrained environments. The approach combines person detections from fixed cameras with directional face detections obtained from actively controlled pan-tilt-zoom (PTZ) cameras. The main contribution of this work is to estimate both body pose and gaze direction independently from motion direction using a combination of sequential Monte Carlo Filtering and MCMC sampling. There are numerous benefits in obtaining body pose and gaze angle information. In surveillance it allows the system to track what people are looking at, can optimize the control of active cameras for biometric face capture, and can provide better interaction metrics between pairs of people. The availability of gaze and face detection information also improves localization and data association for tracking in crowded environments. The performance of the system will be demonstrated on data captured at a real-time surveillance site.
Pairwise Shape Configuration-based PSA for Gait Recognition under Small Viewing Angle Change
Abstract
Two main components of Procrustes Shape Analysis (PSA) are adopted and adapted specifically to address gait recognition under small viewing angle change: 1) Procrustes Mean Shape (PMS) for gait signature description; 2) Procrustes Distance (PD) for similarity measurement. Pairwise Shape Configuration (PSC) is proposed as a shape descriptor in place of the existing Centroid Shape Configuration (CSC) in conventional PSA. PSC can better tolerate shape deviation caused by viewing angle change than CSC. A small variation of viewing angle has a large impact only on global gait appearance. Because local spatio-temporal motion is not strongly affected, PSC, which effectively embeds local shape information, can generate a robust view-invariant gait feature. To enhance gait recognition performance, a novel boundary re-sampling process is proposed. It provides only the necessary re-sampled points to the PSC description. At the same time, it efficiently solves the problems of boundary point correspondence, boundary normalization and boundary smoothness. This re-sampling process adopts prior knowledge of body pose structure. Comprehensive experiments are carried out on the CASIA gait database. Compared with other existing methods in the literature, the proposed method is shown to significantly improve the performance of gait recognition under small viewing angle change without the additional requirements of supervised learning, known viewing angle or a multi-camera system.
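As an illustration of the similarity measure named in this abstract, the sketch below computes a Procrustes distance between two planar shapes whose boundary points are stored as complex numbers. It uses one common formulation, d = 1 - |u*v|^2 / (|u|^2 |v|^2) on centred configurations, which shape-statistics texts usually call the squared full Procrustes distance. The function names are hypothetical, and the sketch does not implement the pairwise shape configuration or the boundary re-sampling proposed in the paper.

```python
def centred(shape):
    """Subtract the centroid so the comparison ignores translation."""
    centroid = sum(shape) / len(shape)
    return [z - centroid for z in shape]


def procrustes_distance(shape_a, shape_b):
    """Procrustes distance between two equally sampled 2-D shapes.

    Each shape is a list of complex numbers (x + 1j * y). The value is 0
    when the shapes differ only by translation, rotation and scale.
    """
    u, v = centred(shape_a), centred(shape_b)
    inner = sum(a.conjugate() * b for a, b in zip(u, v))
    norm_u = sum(abs(a) ** 2 for a in u)
    norm_v = sum(abs(b) ** 2 for b in v)
    return 1.0 - abs(inner) ** 2 / (norm_u * norm_v)
```

For example, a unit square and a copy of it that has been rotated, scaled and shifted give a distance of (numerically) zero, while a square and a 2-by-1 rectangle do not.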
Learning to Recognize Faces from Videos and Weakly Related Information Cues
Abstract
Videos are often associated with additional information that could be valuable for interpretation of their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we propose a new semi-supervised multiple instance learning algorithm, where the contribution is twofold. First, we can transfer information on labeled bags of instances, thus enabling us to weaken the prerequisite of knowing the label for each instance. Second, we can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset.
- 12:00 — 13:30
- Lunch
- 13:30 — 14:30
- Oral Session 2: Object detection
-
Data Driven Frequency Mapping for Computationally Scalable Object Detection
Abstract
Nonlinear kernel Support Vector Machines achieve better generalization, yet their training and evaluation speeds are prohibitively slow for real-time object detection tasks where the number of data points in training and the number of hypotheses to be tested in evaluation are in the order of millions. To accelerate the training and particularly testing of such nonlinear kernel machines, we map the input data onto a low-dimensional spectral (Fourier) feature space using a cosine transform, design a kernel that approximates the classification objective in a supervised setting, and apply a fast linear classifier instead of the conventional radial basis functions. We present a data driven hypotheses generation technique and a LogitBoost feature selection. Our experimental results demonstrate excellent computational improvements up to 140 while maintaining high classification accuracy for large-scale object detection that is a fundamental task in surveillance applications.
Modeling of Temporarily Static Objects for Robust Abandoned Object Detection in Urban Surveillance
Abstract
We propose a robust approach to abandoned object detection for urban surveillance with thousands of cameras. For such large-scale monitoring based on intelligent video analysis, it is critical that a system be designed with careful control of false alarms. Our approach is based on proactive modeling of temporally static objects (TSO) such as cars stopping at a red light and still pedestrians in the street. We develop a finite state machine to track the entire life cycle of TSOs from creation to termination. The information on the static objects provided by the state machine in turn allows adaptive updating of the background model at region level without using any sophisticated object classification. We demonstrate that our approach significantly mitigates the problematic issue of false alarms caused by pedestrians in city surveillance, using both a small publicly available data set and a large data set collected from various realistic urban scenarios.
A New Multi-Lateral Filter for Real-Time Depth Enhancement
Abstract
We present an adaptive multi-lateral filter for real-time low-resolution depth map enhancement. Despite the great advantages of Time-of-Flight cameras in 3-D sensing, there are two main drawbacks that restrict their use in a wide range of applications; namely, their fairly low spatial resolution, compared to other 3-D sensing systems, and the high noise level within the depth measurements. We therefore propose a new data fusion method based upon a bilateral filter. The proposed filter is an extension of the pixel weighted average strategy for depth sensor data fusion. It includes a new factor that allows 2-D or 3-D data to be adaptively considered as guidance information. Consequently, unwanted artefacts such as texture copying are almost entirely eliminated, outperforming alternative depth enhancement filters. In addition, our algorithm can be effectively and efficiently implemented for real-time applications.
- 14:30 — 15:00
- Poster Session A: Teasers
- 15:00 — 15:20
- Coffee Break (+ posters)
- 15:20 — 16:30
- Poster Session A
-
Crowd analysis
-
Stereoscopic Viewing Facilitates the Perception of Crowds
Abstract
In this study the perception of crowds was investigated in an urban environment. The images of crowds were viewed non-stereoscopically and stereoscopically with an HMD (head-mounted display). The task of the participants was to count the number of persons in the crowds. The results clearly indicate that stereoscopic viewing enhances perception of crowds. The counting task was determined to be easiest with stereoscopic viewing, its error rate was significantly smaller and it was significantly preferred to non-stereoscopic viewing. The viewing methods did not differ statistically with respect to the completion time.
Robust People Counting in Video Surveillance: Dataset and System
Abstract
As an important application in civilian surveillance, pedestrian counting is challenging due to occlusion and cluttered backgrounds. In this paper, we present an efficient people counting system based on regression and template matching. This method can effectively overcome the shortcomings of pedestrian detection and tracking-based methods and feature regression-based methods. At the same time, we also introduce a challenging and practical public dataset named CASIA Pedestrian Counting Dataset. It contains richly annotated video and images captured from daily surveillance scenes. Experimental results on the proposed dataset show that our counting system is robust and accurate.
Crowd Flow Estimation Using Multiple Visual Features For Scenes With Changing Crowd Densities
Abstract
Crowd estimation and monitoring is an important surveillance task. We address the problem of estimating the "flow," that is the number of persons passing a designated region in a unit time. We designate an area of the scene as a virtual trip wire and accumulate the total number of foreground pixels (in the trip wire) over a chosen time period. We show that cumulative pixel count is related to the number of persons passing through the trip-wire by a scale factor. This scale factor is highly sensitive to the "crowdedness" (levels of crowd density) of the scene which creates different levels of occlusion of the individuals walking/passing through the trip-wire. We use texture features to determine the crowdedness and choose the most appropriate scaling factor. Our method does not require detection and tracking of individuals and is robust to scene dynamics, background subtraction errors, and different crowd levels.
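The relation described in this abstract, a cumulative foreground pixel count divided by a crowdedness-dependent scale factor, can be sketched as follows. This is a minimal illustration only: the scale factors, the crowdedness labels and the function name are assumptions made for the example, and the texture-based crowdedness classifier is left out.

```python
# Hypothetical scale factors (foreground pixels per person) for each
# crowdedness level. The values are illustrative, not taken from the paper.
PIXELS_PER_PERSON = {"low": 1200.0, "medium": 900.0, "high": 600.0}


def estimate_flow(foreground_counts, crowdedness):
    """Estimate how many people crossed the trip wire in a time window.

    foreground_counts: per-frame foreground pixel counts inside the trip wire.
    crowdedness: key into PIXELS_PER_PERSON, assumed to come from a
                 separate texture-based classifier.
    """
    cumulative = sum(foreground_counts)
    return cumulative / PIXELS_PER_PERSON[crowdedness]
```

With these assumed factors, the same cumulative count yields a higher estimate in a dense scene, because occlusion means each person contributes fewer visible foreground pixels.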
Real-time person counting by propagating networks flows
Abstract
In this paper we present a system that tracks multiple persons by detection in real-time. We introduce a measure for similarity of detections which segments significant information from background clutter by using statistical information obtained during the learning phase of the detector. In order to track multiple persons we map the detections into flow networks utilizing this measure. Continuous real-time processing of video streams is accomplished by analyzing only small chunks of detections consecutively using different networks. By propagating the result of one network into the subsequent one, a temporally consistent association is achieved. The system was evaluated using a standard video sequence with a crowded scene and our own dataset containing very long sequences. The results demonstrate that the system performs comparably to other systems while meeting real-time requirements.
Complementary Background Models for the Detection of Static and Moving Objects in Crowded Environments
Abstract
In this paper we propose the use of complementary background models for the detection of static and moving objects in crowded video sequences. One model is devoted to accurately detect motion, while the other aims to achieve a representation of the empty scene. The differences in foreground detection of the complementary models are used to identify new static regions. A subsequent analysis of the detected regions is used to ascertain if an object was placed in or removed from the scene. Static objects are prevented from being incorporated into the empty scene model. Removed objects are rapidly dropped from both models. In this way, we build a very precise model of the empty scene and improve the foreground segmentation results of a single background model. The system was validated with several public datasets, showing many advantages over state-of-the-art static objects and foreground detectors.
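The idea of comparing the foreground masks of two complementary models can be sketched with two per-pixel reference images: one that adapts quickly (so stationary objects are soon absorbed) and one that stays close to the empty scene. Pixels flagged only by the slowly adapting model are candidates for new static regions. Running averages, the learning rates, the threshold and all names below are assumptions chosen for illustration; the paper's actual models and its placed/removed analysis are not reproduced.

```python
def update(model, frame, rate):
    """Blend the current frame into a background model (running average)."""
    return [(1 - rate) * m + rate * p for m, p in zip(model, frame)]


def foreground(model, frame, threshold=30):
    """Mark pixels that differ from the model by more than the threshold."""
    return [abs(p - m) > threshold for m, p in zip(model, frame)]


def static_candidates(fast_model, slow_model, frame, threshold=30):
    """Pixels that are foreground for the slow model but not the fast one."""
    fast_fg = foreground(fast_model, frame, threshold)
    slow_fg = foreground(slow_model, frame, threshold)
    return [s and not f for f, s in zip(fast_fg, slow_fg)]
```

In use, `update` would be called every frame with a large rate for the fast model and a small (or selectively frozen) rate for the empty-scene model.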
-
Audio surveillance
-
Multiple Acoustic Sources Localization Using Incident Signal Power Comparison
Abstract
We present a novel approach to locate multiple acoustic sources in far-field environments, in order to solve an interesting problem in different application domains, such as audio surveillance systems and soundscape analysis frameworks. This approach aims at finding a solution to the ambiguities in Direction Of Arrival (DOA) combination caused by simultaneous multiple sources. The algorithm is based on two steps: the separation of the sources by means of beamforming techniques and the comparison of the Incident Signal Power (ISP) spectrum by means of a spectral distance measure. We implemented a prototype, composed of two linear arrays, that has been successfully tested in a real noisy environment.
Tracking sound sources by means of HMM
Abstract
Video-based surveillance systems may benefit from integration with microphone arrays for the localization of sound events. Applying sound localization techniques to the surveillance of large areas requires addressing some open issues, such as the non-uniform resolution of microphone-based localization systems. This paper presents a new method for tracking moving sound events based on a Hidden Markov Model (HMM), which exploits a priori information derived from medium and long-term observations of the monitored area. The results obtained with simulated trajectories show that the HMM-based tracker is able to significantly reduce the localization error. Applications can be found in surveillance systems for large areas, such as squares, streets, or parking lots.
Hierarchical approach for abnormal acoustic event classification in an elevator
Abstract
In this paper, we propose a hierarchical method to detect and classify abnormal acoustic events occurring in an elevator environment. The Gaussian Mixture Model (GMM) based event classifier essentially employs two types of acoustic features: Mel Frequency Cepstral Coefficient (MFCC) and Timbre. We explore the effectiveness of various combinations of the two features in terms of classification performance. In addition, we design a hierarchical approach for realizing acoustic event classification and compare it with a single-level approach. An experiment verifies that the classification performance is improved when the proposed hierarchical approach is applied. In particular, for detection of abnormal situations, we employ a maximum likelihood estimation approach for acoustic event recognition in the first step, and then in the second step we determine the abnormal contexts by using the ratio of abnormal events to cumulative events during a certain period. For performance evaluation, we employ a database collected in an actual elevator under several scenarios. In the experiments, our proposed method demonstrates a 91% correct detection rate and a 2.5% error detection rate for abnormal context.
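The second step described in this abstract, deciding on an abnormal context from the ratio of abnormal events to all events in a period, can be sketched as a sliding-window ratio test. The window length, the threshold and the function name are assumptions for illustration; the first-stage GMM classifier is represented only by its boolean output.

```python
from collections import deque


def abnormal_context(labels, window=10, ratio_threshold=0.3):
    """Yield one decision per event: is the recent abnormal ratio too high?

    labels: iterable of booleans, True when the first-stage classifier
            labelled the event as abnormal.
    window, ratio_threshold: illustrative values, not from the paper.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` labels
    for is_abnormal in labels:
        recent.append(is_abnormal)
        yield sum(recent) / len(recent) > ratio_threshold
```

A single misclassified event then does not raise an alarm on its own; the context is flagged only when abnormal events accumulate within the window.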
-
Activity monitoring
-
Mono versus multi-view tracking-based model for automatic scene activity modeling and anomaly detection
Abstract
In this paper, we present a novel method that automatically discovers recurrent activities occurring in a video scene and identifies the temporal relations between these activities. It can be used in either a mono-view or a multi-view context (for example, to discover the different flows of pedestrians inside a subway station and identify the rules that govern these flows). The proposed method is based on particle-based trajectories, analyzed through a cascade of HMM and HDP-HMM models. We evaluate our model on a scene activity recognition task on a subway dataset using both mono-view and multi-view analysis. Finally, we show that our model can also detect abnormal events on the fly and in real time (by identifying activities or relations that do not fit the usual, learnt ones).
Discrimination of abandoned and stolen object based on active contours
Abstract
In this paper we propose an approach based on active contours to discriminate previously detected static foreground regions between abandoned and stolen. Firstly, the contour of the stationary foreground object is extracted. Then, an active contour adjustment is performed on the current and the background frames. Finally, similarities between the initial contour and the two adjustments are studied to decide whether the object is abandoned or stolen. Three different methods have been tested for the active contour adjustment. Experimental results over a heterogeneous dataset show that the proposed method outperforms state-of-art approaches and provides a robust solution against non-accurate data (i.e., foreground static objects wrongly segmented) that is common in complex scenarios.
Action recognition using tri-view constraints
Abstract
Two-view methods have been well developed to identify human actions. However, in a case where the corresponding imaged points cannot induce distinguished measures, the performance of the methods deteriorates. For this reason, we propose a new view-invariant measure for human action recognition by enforcing tri-view constraints. We apply our approach to video synchronization by imposing both the similarity ratio and the consistency in the trifocal tensor over entire video sequences. In tests on both synthetic and real data, our method achieves higher tolerance to noise, as well as higher identification accuracy, than the traditional two-view method. Experimental results demonstrate that our approach can identify human pose transitions despite dynamic timelines, different viewpoints, and unknown camera parameters.
Detection of Abnormal Behaviour in a Surveillance Environment Using Control Charts
Abstract
This paper introduces a new approach to unsupervised detection of abnormal sequences of images in video surveillance data. We leverage an online object detection method and statistical process control techniques in order to identify suspicious sequences of events. Our method assumes a training phase in which the spatial distribution of objects is learned, followed by a chart-based tracking process. We evaluate the performance of our method on a standard dataset and have implemented a publicly available open-source prototype.
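To make the statistical process control idea concrete, the sketch below uses a Shewhart-style control chart: limits are learned from a training phase and later observations outside them are flagged. The abstract does not say which chart type or monitored statistic the authors use, so the chart, the monitored value (for example, an object count per frame), the factor `k` and the function names are all assumptions.

```python
from statistics import mean, pstdev


def control_limits(training, k=3.0):
    """Learn lower/upper control limits as mean -/+ k standard deviations."""
    centre = mean(training)
    spread = pstdev(training)
    return centre - k * spread, centre + k * spread


def out_of_control(samples, limits):
    """Return the indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]
```

For instance, limits learned from counts that hover around 10 would flag a later frame with a count of 3 or 14 while leaving ordinary fluctuations alone.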
Modeling of Moving Object Trajectory by Spatio-temporal Learning for Abnormal Behavior Detection
Abstract
This paper proposes a trajectory analysis method that handles the spatio-temporal properties of a trajectory. Rather than using similarity measures between two trajectories, our model analyzes the overall path of a trajectory. The spatial property is learned as semantic regions (e.g. go straight, turn left, turn right) that are clustered effectively using a topic model. The temporal order of observations on a trajectory is taken into account using an HMM for detecting global anomalies. Experimental results show that the modeling of semantic regions and the detection of unusual trajectories are successful even in complex scenes.
Abnormal Events Detection using Unsupervised One-Class SVM - Application to Audio Surveillance and Evaluation
Abstract
This paper proposes an unsupervised method for real-time detection of abnormal events in the context of audio surveillance. Based on training a One-Class Support Vector Machine (OC-SVM) to model the distribution of the normality (ambience), we propose to construct sets of decision functions. This modification allows controlling the trade-off between false-alarm and miss probabilities without modifying the trained OC-SVM that best captures the ambience boundaries, or its hyperparameters. Then we present an adaptive online scheme of temporal integration of the decision function output in order to increase performance and robustness. We also introduce a framework to generate databases based on real signals for the evaluation of audio surveillance systems. Finally, we present the performance obtained on the generated database.
-
Tracking
-
An Improved Mean Shift Tracker with Fast Failure Recovery Strategy after Complete Occlusion
Abstract
The effectiveness of the conventional Mean Shift tracking algorithm diminishes for fast moving targets and complete occlusion. In this paper an improved Mean Shift algorithm comprising a fast failure recovery strategy that aims to deal with randomly moving targets and complete occlusion as encountered in crowded scenes is presented. Experimental results show that after complete occlusion or target loss, the new algorithm can effectively recover and continue to successfully track targets in complex scenarios.
Extended Feature-based Object Tracking in Presence of Data Association Uncertainty
Abstract
This paper proposes an algorithm for extended object tracking using sparse shape points. The described technique is based on the Rao-Blackwellized Particle Filter. In particular, two different data association techniques that take into consideration clutter and missed detections are coupled and tested in order to provide a comparison of their performance for the problem of extended object tracking.
Appearance tracking by transduction in surveillance scenarios
Abstract
We propose a formulation of the people tracking problem as a Transductive Learning (TL) problem. TL is an effective semi-supervised learning technique by which many classification problems have been recently reinterpreted as learning labels from incomplete datasets. In our proposal the joint exploitation of spectral graph theory and Riemannian manifold learning tools leads to the formulation of a robust approach for appearance-based tracking in video surveillance scenarios. The key advantage of the presented method is a continuously updated model of the tracked target, used in the TL process, that allows the target's visual appearance to be learned online and consequently improves the tracker accuracy. Experiments on public datasets show an encouraging advancement over alternative state-of-the-art techniques.
Real Time Color Based Particle Filtering for Object Tracking with Dual Cache Architecture
Abstract
The particle filtering framework is widely used in tracking applications. In surveillance systems, it is often combined with color information to achieve visual object tracking. However, the resource usage of this framework, including memory bandwidth and operation cycles, is intensive enough to make low-cost real-time object tracking unattainable. In this paper, an efficient architecture for color-based particle filtering tracking with a dual cache is proposed. It utilizes a frame cache to reduce the memory bandwidth of loading frame data, and a histogram cache to reduce operation cycles when constructing the color histogram. The experimental results show that this architecture can improve the performance to real time on a high system specification.
-
Active and multiple camera systems
-
Multi-tasking Smart Cameras for Intelligent Video Surveillance Systems
Abstract
We demonstrate a video surveillance system, comprising passive and active pan/tilt/zoom (PTZ) cameras, that intelligently responds to scene complexity, automatically capturing higher resolution video when there are fewer people in the scene and capturing lower resolution video as the number of pedestrians present in the scene increases. To this end, we have developed behavior-based controllers for passive and active cameras, enabling these cameras to carry out multiple observation tasks simultaneously. The research presented herein is a step towards video surveillance systems, consisting of a heterogeneous set of sensors, that provide persistent coverage of large spaces, while optimizing surveillance data collection by tuning the sensing parameters of individual sensors (in a distributed manner) in response to scene activity.
Continuous Recovery for Real Time Pan Tilt Zoom Localization and Mapping
Abstract
We propose a method for real-time recovery from tracking failure in monocular localization and mapping with a Pan Tilt Zoom (PTZ) camera. The method automatically detects and seamlessly recovers from tracking failure while preserving map integrity. By extending recent advances in PTZ localization and mapping, the system can quickly and continuously resume tracking after failures by determining the best way to task two different localization modalities. The tradeoff involved when choosing between the two modalities is captured by maximizing the information expected to be extracted from the scene map. This is especially helpful in four main viewing conditions: blurred frames, weakly textured scenes, an out-of-date map, and occlusions due to sensor quantization or moving objects. Extensive tests show that the resulting system is able to recover from several different failures while zooming in on weakly textured scenes, all in real time.
A Unified Rectification Method for Single Viewpoint Multi-Camera System
Abstract
Stereo matching and 3D reconstruction have been studied for decades as a fundamental problem in the field of computer vision. In recent years, stereo matching and 3D reconstruction with a large field of view, especially using panoramic images and omnidirectional vision, have received increasing attention. As a pre-step for dense stereo matching, methods have been proposed to rectify different kinds of omnidirectional stereo image pairs. However, no one has yet described a rectification method applied to multi-camera omnidirectional systems. In this work, we propose a rectification algorithm based on a spherical camera model for rectifying omnidirectional stereo pairs, especially well suited to multi-camera omnidirectional systems as long as a spherical camera model can be applied. We describe the geometrical framework of the algorithm and implement it. We also present experimental results on real stereo image pairs captured by Ladybug3. As the experimental results show, the effect of rectification is promising.
Efficiently Secure Image Transmission against Tampering in Wireless Visual Sensor Networks
Abstract
Wireless visual sensor networks for surveillance can dissipate their limited resources to process irrelevant images maliciously injected by compromised nodes. In particular, since nodes compromised by tampering reveal their security keys to encrypt messages, traditional data authentication techniques cannot identify false data deceivably encrypted by such stolen keys. To address this problem, this paper first presents a surveillance-fitted network model in which our false-data-sensitive protocol works efficiently. The protocol allows malicious messages to travel only one hop by authentically and semantically testing every packet and every image from wireless cameras vulnerable to tampering. We additionally suggest a dynamic key scheme which lets wireless sensors employ a different key in each communication at reasonable resource consumption, in order to reduce the possibility of key disclosure itself. Two lemmas and three comparison results verify how well our three approaches outperform their alternatives in resiliency to compromised nodes and in memory, computation and communication overheads.
- 16:30 — 17:50
- Oral Session 3: Object tracking and re-identification
-
Multiple-shot Human Re-Identification by Mean Riemannian Covariance Grid
Abstract
Human re-identification is defined as a requirement to determine whether a given individual has already appeared over a network of cameras. This problem is made particularly hard by significant appearance changes across different camera views. In order to re-identify people, a human signature should handle differences in illumination, pose and camera parameters. We propose a new appearance model combining information from multiple images to obtain a highly discriminative human signature, called Mean Riemannian Covariance Grid (MRCG). The method is evaluated and compared with the state of the art using benchmark video sequences from the ETHZ and the i-LIDS datasets. We demonstrate that the proposed approach outperforms state-of-the-art methods. Finally, the results of our approach are shown on two other more pertinent datasets.
Multiple view, multiple target tracking with principal axis-based data association
Abstract
We present a novel method for multi-object tracking that tracks targets both in the video streams and in a reference ground frame. This makes it possible to remove ambiguities created by occlusions in one view. Our system improves a recently proposed collaborative scheme and makes it handle multiple targets. We use a fast, simple solution for data association in the ground plane based on principal axes and a partly joint probabilistic model with MCMC sampling to ensure that tracked targets are kept separated whenever groups of targets appear. Results are presented on several popular databases of multi-camera, multi-target videos.
Formulation, Detection and Application of Occlusion States (Oc-7) in the Context of Multiple Object Tracking
Abstract
Occlusion is often thought of as a challenge for visual algorithms, especially tracking. Existing literature, however, has identified a number of occlusion categories in the context of tracking in an ad hoc manner. We propose a systematic approach to formulate a set of occlusion cases by considering the spatial relations among object support(s) (projections on the image plane) with the detected foreground blob(s), to show that only 7 occlusion states are possible. We designate the resulting qualitative formalism as Oc-7, and show how these occlusion states can be detected and used effectively for the task of multi-object tracking under occlusion of various types. The object support is decomposed into overlapping patches which are tracked independently on the occurrence of occlusions. As a demonstration of the application of these occlusion states, we propose a reasoning scheme for selective tracker execution and object feature updates to track multiple objects in complex environments.
View-invariant Person Re-identification with an Implicit Shape Model
Abstract
In this paper, we approach the task of appearance-based person re-identification for scenarios where no biometric features can be used. For that, we build on a person re-identification approach that uses the Implicit Shape Model (ISM) and SIFT features for re-identification. This approach builds identity models of persons during tracking and employs these models for re-identification. We apply this re-identification, which was until now only evaluated in the infrared spectrum, to data acquired in the visible spectrum. Furthermore, we evaluate view independence of the re-identification approach and introduce methods that extend view invariance. Specifically, we (i) propose a method for online view-determination of a tracked person, (ii) use the online view-determination to generate view-specific identity models of persons which increase model distinctiveness in re-identification, and (iii) introduce a method to convert identity models between views to increase view independence.





