I Introduction
In surveillance applications, the ability to track a certain person or object over a long period using security cameras is highly desirable. Typically, security personnel label a suspicious person in the video. The tracking system then tracks this person in the videos for minutes or hours without further human input.
One situation that visual tracking systems for surveillance applications frequently face is a sudden appearance change of the being-tracked person or object. For example, the suspicious person may take off his/her jacket, pull up his/her hood, or abandon some luggage in order to fool the surveillance system. Such sudden appearance changes are suspicious activities and should usually be reported to the security personnel in real time.
Unfortunately, it is usually difficult for computer vision algorithms to distinguish between a sudden appearance change and an occlusion. In many surveillance applications, the scenes are crowded with people, so occlusions happen very frequently. Without the ability to distinguish sudden appearance changes from occlusions, visual tracking systems may generate a large number of false alarms.
Detecting occlusions correctly is also important for enhancing the reliability of visual tracking algorithms. It is well known that visual tracking faces a so-called “high-adaptability-to-drifting-resistance tradeoff” problem. The problem concerns how much we should update our models of the being-tracked person or object on the fly. If we do not update the models much, then there is a high risk of tracking loss when the appearance of the being-tracked person changes rapidly. If we always update the models very rapidly, then in the case of occlusions the models may be tuned to the occluders. These wrongly tuned models may result in so-called drifting. That is, the tracking algorithm may start to track the occluders instead. In Fig. 2 of Section IV, we show an example of this drifting phenomenon.
In this paper, we propose a new occlusion and appearance change detection method. The proposed real-time visual tracking system uses multiple surveillance cameras. Initially, the security personnel provide one bounding box of the suspicious person for each video frame sequence. The visual tracking system then tracks the whereabouts of the suspicious person in real time. If a sudden appearance change of the suspicious person is detected, the visual tracking system raises an alarm signal immediately.
Our method uses both generative and discriminative models for the video frame streams. For each camera, one discriminative model is maintained for discriminating the image patches that contain the being-tracked person from those that do not. Similarly, one generative model is maintained for each camera. In this paper, we use the recently proposed compressive sensing and naive Bayes based classifier of [1] as the discriminative model. We use linear subspace models as the generative models. That is, we assume that the image patches containing the being-tracked person from several adjacent video frames are all vectors within a certain affine subspace.
A central component of our method is a hidden Markovian model for the prediction errors of the generative models. Whenever a new video frame is received, the new image patch containing the being-tracked person is predicted from the previous such image patches, and the prediction error is recorded. The hidden Markovian model contains a visible part and a hidden part. The visible part contains the observed prediction errors. The hidden part contains two kinds of binary random variables: for each camera and each time instant, one variable denotes whether an occlusion has occurred at that camera, and for each time instant, one variable denotes whether a sudden appearance change of the being-tracked person has occurred. We assume parametric probability distributions for the hidden Markovian model. The probabilities of these hidden variables are estimated from the observed prediction errors by sequential Bayesian estimation. An alarm signal may be raised if we detect a high probability that a sudden appearance change has occurred. The estimated probabilities are also used for adjusting the learning rates of the discriminative and generative models. Intuitively, an appearance change of the being-tracked object results in significant prediction errors at all cameras at the same time, whereas an occlusion usually results in significant prediction errors at only a few cameras. The two kinds of probabilities can thus be estimated accordingly.
Please note that the above hidden Markovian statistical model is the centerpiece of the proposed occlusion and sudden appearance change detection method. The statistical model also works well with other discriminative and generative models.
There is a large literature on visual tracking algorithms, such as [2], [3], [4]. We do not intend to provide a thorough survey of general visual tracking algorithms here. Approaches that use multiple cameras for visual tracking have become attractive in recent years, due to the availability of large quantities of low-cost commodity cameras. Several previous works discuss visual tracking using multiple cameras. In [5], [6], approaches are discussed in which the responsibility for tracking may be passed from one camera to another. In [7], a statistical estimate of the location of the being-tracked person or object is obtained independently from each camera. The independent estimates are then fused into a joint location estimate. In [8], video frames from multiple cameras are projected onto a reference frame using homography transforms, such that the signals corresponding to the being-tracked person or object add constructively.
The rest of the paper is organized as follows. In Section II, we discuss the proposed visual tracking system and the hidden Markovian model. In Section III, we present the sequential Bayesian estimation method for the hidden Markovian model. Experimental results of the proposed method are provided in Section IV. Finally, some concluding remarks are presented in Section V.
II Visual Tracking System and Hidden Markovian Model
A block diagram of the proposed visual tracking system is shown in Fig. 1. The system uses multiple cameras (only two are shown in the figure). The system starts tracking a suspicious person after the security personnel provide one bounding box of the suspicious person for each camera. For each camera, there is a real-time compressive tracking subsystem as in [1] (shown as classifiers in Fig. 1). These subsystems work almost independently, except that their learning parameters are controlled by the central controller.
The learning parameters control how much each real-time compressive tracking subsystem updates its discriminative model after receiving each new video frame. As the being-tracked person changes pose, orientation, etc., the appearance of the person may change smoothly. Thus, each compressive tracking subsystem may update its discriminative model according to each newly observed video frame. The learning parameter is a real number in a bounded range. At one end of this range, the compressive tracking subsystem does not update its model. Otherwise, the subsystem updates the discriminative model using a combination of the past observed video frames and the newly observed video frame.
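The role of the learning parameter can be illustrated with a minimal sketch. It assumes that the discriminative model stores Gaussian parameters per feature and that the parameter, here given the hypothetical name `lam`, lies in [0, 1], with `lam = 1` meaning no update. The function name and this convention are assumptions made for illustration and are not taken from the text above.

```python
import math


def blend_gaussian(mu_old, sigma_old, mu_new, sigma_new, lam):
    """Blend stored Gaussian parameters with newly estimated ones.

    lam is the assumed learning parameter in [0, 1]:
    lam = 1 keeps the stored model unchanged, lam = 0 replaces it.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mu = lam * mu_old + (1.0 - lam) * mu_new
    # Variance of the two-component mixture with weights lam and 1 - lam.
    var = (lam * sigma_old ** 2
           + (1.0 - lam) * sigma_new ** 2
           + lam * (1.0 - lam) * (mu_old - mu_new) ** 2)
    return mu, math.sqrt(var)
```

Under this convention, a value of `lam` close to 1 favors drifting resistance, while a smaller value favors adaptability.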
Our proposed method adjusts the learning parameters based on the probability of a sudden appearance change and the probabilities of occlusions. Assume that the system uses several security cameras and that, at each time instant, one video frame is received from each camera. Each tracking subsystem then finds one image patch that contains the being-tracked person. We represent this patch as a column vector obtained by stacking the pixels in the image patch.
For each camera, we use one generative model (shown as predictors in Fig. 1). Each predictor maintains an estimate of an affine subspace such that the image-patch vectors observed at that camera in the past roughly lie within it. We then define the prediction error as the distance from the newly observed image-patch vector to this affine subspace. Note that efficient algorithms exist for computing the affine subspace, such as the one in [9].
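The prediction error described above can be sketched as follows. This is a simplified illustration, not the sequential algorithm of [9]: the subspace basis is obtained by Gram-Schmidt orthogonalization of the centered past patches rather than by a Karhunen-Loeve analysis, and the names `fit_affine_subspace` and `prediction_error` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_affine_subspace(patches, dim, tol=1e-10):
    """Return (mean, orthonormal basis) of an affine subspace near the patches.

    Simplification: the basis comes from Gram-Schmidt on the centered
    patches taken in order, not from a principal-component analysis.
    """
    n = len(patches)
    mean = [sum(col) / n for col in zip(*patches)]
    basis = []
    for p in patches:
        r = [a - m for a, m in zip(p, mean)]
        # Remove the components already spanned by the current basis.
        for b in basis:
            c = dot(r, b)
            r = [a - c * x for a, x in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([a / norm for a in r])
        if len(basis) == dim:
            break
    return mean, basis


def prediction_error(patch, mean, basis):
    """Euclidean distance from the patch vector to the affine subspace."""
    r = [a - m for a, m in zip(patch, mean)]
    for b in basis:
        c = dot(r, b)
        r = [a - c * x for a, x in zip(r, b)]
    return math.sqrt(dot(r, r))
```

A patch that lies in the subspace has an error near zero, while an occluded or changed patch is expected to lie far from it.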
We may then estimate the probability of a sudden appearance change and the probabilities of occlusions from the prediction errors, based on the following hidden Markovian model. We assume that the probability density function of each prediction error depends on two binary indicators: one indicating that a sudden appearance change has occurred at the current time, and one indicating that an occlusion has occurred at the corresponding camera at the current time. If neither event has occurred, then the prediction error is exponentially distributed. Otherwise, the prediction error is uniformly distributed over a bounded interval. We further assume that the prediction errors at two different cameras are statistically independent, conditioned on the appearance change indicator and the corresponding occlusion indicators. We assume that the occlusion indicator of each camera is a Markovian random process. Similarly, we assume that the appearance change indicator is a Markovian random process.
We may then use Bayesian decision methods to detect occlusions at each camera and a sudden appearance change of the being-tracked person or object by computing the posterior probabilities of these indicators. Here, the conditioning is on the collection of prediction errors observed so far, and the occlusion indicators of all cameras are treated jointly. We show in Section III that these probabilities can be calculated recursively in a very efficient way.
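The conditional density of a single prediction error can be sketched as below. The exponential rate `rate`, the upper end `max_error` of the uniform range, and the choice of zero as its lower end are hypothetical values chosen for illustration. The text above only fixes the two distribution families.

```python
import math


def error_likelihood(error, changed, occluded, rate=1.0, max_error=10.0):
    """Density of one camera's prediction error given the hidden indicators.

    changed:  True if a sudden appearance change has occurred.
    occluded: True if this camera is occluded.
    rate and max_error are assumed model parameters.
    """
    if error < 0.0:
        return 0.0  # a distance is never negative
    if not changed and not occluded:
        # Normal tracking: small errors are much more likely than large ones.
        return rate * math.exp(-rate * error)
    # Occlusion or appearance change: any error in the range is equally likely.
    return 1.0 / max_error if error <= max_error else 0.0
```

With these parameters, a large error is far better explained by the uniform branch than by the exponential one, which is what lets the estimator attribute large errors to an occlusion or an appearance change.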
The proposed method raises an alarm signal whenever the probability of a sudden appearance change exceeds a certain threshold. The method may also adjust the learning parameters of the compressive tracking subsystems according to the estimated probabilities. For example, we may set the learning parameter to a fixed value whenever the corresponding probability is greater than a given threshold.
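One possible form of this decision rule is sketched below. The thresholds, the learning parameter values, and the convention that a learning parameter of 1 suspends model updates are all assumptions made for illustration. Freezing a camera's model while it is probably occluded is one plausible way to avoid drifting, not necessarily the rule used in the experiments.

```python
def control_step(p_change, p_occl, alarm_threshold=0.9, occl_threshold=0.5,
                 normal_lam=0.85, frozen_lam=1.0):
    """Decide on an alarm and on one learning parameter per camera.

    p_change: estimated probability of a sudden appearance change.
    p_occl:   list of estimated occlusion probabilities, one per camera.
    All keyword arguments are hypothetical tuning values.
    """
    raise_alarm = p_change > alarm_threshold
    # Suspend model updates at cameras that are probably occluded.
    lams = [frozen_lam if p > occl_threshold else normal_lam for p in p_occl]
    return raise_alarm, lams
```

For instance, `control_step(0.95, [0.7, 0.01])` raises the alarm and freezes only the first camera's model.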
III Recursive Bayesian Estimation
In this section, we derive a recursive formula, Eq. 1, for calculating the posterior probabilities at the current time from those at the previous time, where step (a) follows from the Markovian properties of the statistical model.
(1) 
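A recursion of this kind can be sketched as a standard forward filtering step over the joint hidden state, which consists of the appearance change indicator followed by one occlusion indicator per camera. The sketch is a generic illustration under assumed parameters, not a reproduction of Eq. 1. The transition probabilities `p_change`, `p_occl`, `p_stay`, the likelihood parameters, and all function names are hypothetical. It enumerates all joint states, so it is practical only for a small number of cameras.

```python
import math
from itertools import product


def likelihood(e, a, o, rate=1.0, max_error=10.0):
    """Density of a non-negative prediction error e given indicators a and o."""
    if a == 0 and o == 0:
        return rate * math.exp(-rate * e)
    return 1.0 / max_error if 0.0 <= e <= max_error else 0.0


def switch_prob(prev, cur, p_on, p_stay):
    """Two-state Markov chain: probability of cur given prev."""
    p_one = p_stay if prev == 1 else p_on
    return p_one if cur == 1 else 1.0 - p_one


def filter_step(belief, errors, p_change=0.01, p_occl=0.05, p_stay=0.8):
    """One recursive Bayesian update of the joint posterior.

    belief maps each hidden state (a, o_1, ..., o_N) to its posterior
    probability given all earlier prediction errors.
    """
    n = len(errors)
    new_belief = {}
    for s in product((0, 1), repeat=n + 1):
        # Prediction: sum over previous states weighted by the transitions.
        prior = 0.0
        for s_prev, p_prev in belief.items():
            trans = switch_prob(s_prev[0], s[0], p_change, p_stay)
            for i in range(1, n + 1):
                trans *= switch_prob(s_prev[i], s[i], p_occl, p_stay)
            prior += trans * p_prev
        # Correction: errors are conditionally independent across cameras.
        like = 1.0
        for i, e in enumerate(errors):
            like *= likelihood(e, s[0], s[i + 1])
        new_belief[s] = prior * like
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}


def marginals(belief):
    """Return (P(appearance change), [P(occlusion at camera i)])."""
    n = len(next(iter(belief))) - 1
    p_a = sum(p for s, p in belief.items() if s[0] == 1)
    p_o = [sum(p for s, p in belief.items() if s[i + 1] == 1) for i in range(n)]
    return p_a, p_o
```

Starting from a belief concentrated on the all-zero state for three cameras, large errors at all cameras push the appearance change probability above 0.9 under these parameters, whereas a large error at a single camera mainly raises that camera's occlusion probability.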
IV Experimental Results
We use one of the PETS 2007 (Tenth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance) datasets (available from http://www.cvg.reading.ac.uk/PETS2007/data.html). The dataset consists of 12000 video frames from 4 cameras, with 3000 video frames per camera. A suspicious person enters the scene at roughly frame 500, then drops his backpack and leaves it behind at roughly frame 850.
We observe that the real-time compressive tracking system of the prior art [1], which uses a fixed learning parameter, does exhibit the drifting problem, as shown in Fig. 2. We show in Fig. 3 that this drifting problem is avoided by the algorithm proposed in this paper. The occlusion event around frame 577 is detected very clearly by our algorithm, with a high corresponding probability.
The proposed algorithm also detects the unusual behavior of the being-tracked person at frame 851, again with a high corresponding probability. The proposed algorithm is able to track the whereabouts of the suspicious person, as shown in Figs. 4 and 5, where each figure shows the tracking results at one camera.
In all the above experimental results, the tracking results are shown as red bounding boxes, and the frame indices are labelled in all the images.
Figs. 2 to 5: tracking result images, six panels (a) to (f) per figure.
V Conclusion
This paper discusses a visual tracking algorithm that detects sudden appearance changes and occlusions. The experimental results show that the proposed algorithm can reliably detect sudden appearance change and occlusion events. Such reliable estimates can also be used to avoid the drifting problem.
Acknowledgment
This research was originally submitted to Xinova, LLC by the author in response to a Request for Invention. It is among several submissions that Xinova has chosen to make available to the wider community. The author wishes to thank Xinova, LLC for their funding support of this research. More information about Xinova, LLC is available at www.xinova.com.
References
 [1] K. Zhang, L. Zhang, and M.-H. Yang, “Real-time compressive tracking,” in 12th European Conference on Computer Vision (ECCV), October 2012.
 [2] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, 2003.
 [3] S. Avidan, “Support vector tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 8, 2004.
 [4] S. Avidan, “Ensemble tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 2, 2007.
 [5] M. Quaritsch, M. Kreuzthaler, B. Rinner, H. Bischof, and B. Strobl, “Autonomous multicamera tracking on embedded smart cameras,” Journal on Embedded Systems, January 2007.
 [6] Q. Cai and J. Aggarwal, “Tracking human motion in structured environments using a distributed-camera system,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 12, 1999.

 [7] M. Bhuyan, B. Lovell, and A. Bigdeli, “Tracking with multiple cameras for video surveillance,” in 9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications, December 2007.
 [8] R. Eshel and Y. Moses, “Homography based multiple camera detection and tracking of people in a dense crowd,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2008.
 [9] A. Levy and M. Lindenbaum, “Sequential Karhunen-Loeve basis extraction and its applications to images,” IEEE Transactions on Image Processing, vol. 9, no. 8, 2000.