Complex traffic control systems are equipped with a range of cameras for traffic surveillance and road traffic measurements. At many sites, different cameras cover the same observation areas but deliver streams of different quality to the system, typically compressed video for surveillance and raw video for vehicle detection. Eliminating duplicate cameras, especially the high-quality devices, is desirable for improving system performance. Vehicle detectors based on image processing are sensitive to the quality of the input video streams. The paper presents the results of tests on using lossy data compression to deliver video streams to vehicle detectors for traffic control. The limit of data loss that still ensures correct vehicle detection is determined. The recommendations can be used to optimise traffic vision systems.
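The abstract does not disclose the test harness, but the general procedure it describes can be sketched as follows: re-encode raw frames at decreasing quality levels and check at which point an image-based detector starts losing vehicles relative to the uncompressed reference. The sketch below is a minimal illustration under assumptions of my own; the toy thresholding detector, the JPEG codec, the agreement metric and the quality step are placeholders, not the detector or compression scheme used in the paper.

```python
# Minimal sketch (assumed, not the paper's harness): probe the data-loss limit
# by re-encoding frames as JPEG at decreasing quality and comparing detector
# output against detections on the uncompressed frames.
import cv2
import numpy as np


def detect_vehicles(frame: np.ndarray) -> list:
    """Toy stand-in detector: dark blobs found by Otsu thresholding.
    Replace with the actual image-processing vehicle detector."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]


def detection_agreement(frame: np.ndarray, quality: int) -> float:
    """Fraction of reference detections preserved after lossy compression
    (a crude count-based proxy; IoU matching would be more rigorous)."""
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    if not ok:
        return 0.0
    degraded = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    ref = detect_vehicles(frame)
    got = detect_vehicles(degraded)
    return len(got) / max(len(ref), 1)


def find_quality_limit(frames, min_agreement=0.95):
    """Lowest JPEG quality setting that keeps average agreement above threshold."""
    for q in range(90, 0, -10):
        score = np.mean([detection_agreement(f, q) for f in frames])
        if score < min_agreement:
            return q + 10  # previous, still-acceptable step
    return 10
```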
A lightweight neural network-based approach to two-person interaction classification in sparse image sequences, based on pre-detection of human skeletons in video frames, is proposed. The idea is to use an ensemble of “weak” pose classifiers, where every classifier is trained on a different time-phase of the same set of actions. Thus, unlike in typical ensemble classifiers, the expertise of the “weak” classifiers is distributed over time rather than over the feature domain. Every classifier is trained independently to classify time-indexed snapshots of a visual action, while the overall classification result is a weighted combination of their results. The training data require no extra labeling effort, as the individual frames are automatically assigned time indices. The use of pose classifiers for video classification is key to achieving a lightweight solution, as it limits the motion-based feature space in the deep encoding stage. Another important element is the exploitation of the semantics of the skeleton data, which turns the input data into reliable and powerful feature vectors. In other words, we avoid spending ANN resources on learning feature-related information that can already be extracted analytically from the skeleton data. An algorithm for merging-elimination and normalization of skeleton joints is developed. Our method is trained and tested on the interaction subset of the well-known NTU-RGB+D dataset, although only 2D skeleton information, typical of video analysis, is used. The test results show that our method performs comparably to some of the best STM- and CNN-based classifiers reported so far for this dataset when they process sparse frame sequences, as we do. The recently proposed multistream Graph CNNs have shown superior results, but only when processing dense frame sequences. Considering the dominant processing time and resources needed for skeleton estimation in every frame of the sequence, the key to real-time interaction recognition is to limit the number of processed frames.
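The following sketch illustrates the described scheme only in outline: K “weak” pose classifiers, each trained on the skeleton snapshot at its own time-phase, combined by a weighted sum of class probabilities. The classifier architecture, the COCO-style joint indices, the torso-based normalization and the uniform weights are illustrative assumptions, not the authors’ configuration; the paper’s merging-elimination step and two-person feature handling are simplified away.

```python
# Minimal sketch (assumptions noted above) of a time-phase ensemble of pose classifiers.
import numpy as np
from sklearn.neural_network import MLPClassifier


def normalize_skeleton(joints: np.ndarray) -> np.ndarray:
    """Center 2D joints on the mid-hip and scale by torso length
    (a simplified analogue of the joint normalization step)."""
    mid_hip = joints[[11, 12]].mean(axis=0)                      # COCO hip indices (assumption)
    torso = np.linalg.norm(joints[[5, 6]].mean(axis=0) - mid_hip) + 1e-6
    return ((joints - mid_hip) / torso).ravel()


class TimePhaseEnsemble:
    def __init__(self, n_phases: int, weights=None):
        # One independent "weak" classifier per time-phase of the action.
        self.clfs = [MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
                     for _ in range(n_phases)]
        self.weights = weights if weights is not None else [1.0] * n_phases

    def fit(self, sequences, labels):
        # sequences: list of (n_phases, n_joints, 2) skeleton snapshots;
        # frame k of every sequence trains classifier k (automatic time indexing).
        for k, clf in enumerate(self.clfs):
            X = [normalize_skeleton(seq[k]) for seq in sequences]
            clf.fit(X, labels)

    def predict(self, sequence):
        # Overall result: weighted combination of per-phase class probabilities.
        probs = sum(w * clf.predict_proba([normalize_skeleton(sequence[k])])[0]
                    for k, (clf, w) in enumerate(zip(self.clfs, self.weights)))
        return self.clfs[0].classes_[np.argmax(probs)]
```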