The paper reports results on neural architectures for learning numerical concepts from visual data. We use datasets of small images containing single-pixel dots (one to six per image) to learn the abstraction of small integers and other numerical concepts (e.g., even versus odd numbers). Both fully connected and convolutional architectures are investigated. The results indicate that, in the context of the discussed problems, numerical properties apparently fall into two categories. In the first category, the properties (e.g., the notion of small, medium and large numbers) can be learned without acquiring counting skills. In the second category, explicit counting is embedded into the architecture so that the concepts are learned from numbers rather than directly from visual data. In general, we find that properly crafted CNN architectures are more efficient in the discussed problems and, additionally, are more plausibly explainable.
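As an illustration of the kind of data described above, the following minimal sketch generates binary images with one to six single-pixel dots, together with count and parity labels. The image size (16 x 16), the uniform sampling of counts, and the function names `make_dot_image` and `make_dataset` are assumptions made for this example only; the abstract does not specify them.

```python
import numpy as np


def make_dot_image(n_dots, size=16, rng=None):
    """Return a size x size binary image with exactly n_dots single-pixel dots."""
    rng = np.random.default_rng() if rng is None else rng
    img = np.zeros((size, size), dtype=np.uint8)
    # Sample distinct pixel positions so that dots never coincide.
    flat_idx = rng.choice(size * size, size=n_dots, replace=False)
    img.flat[flat_idx] = 1
    return img


def make_dataset(n_samples, size=16, seed=0):
    """Return (images, counts, parity) for n_samples random dot images.

    counts are drawn uniformly from 1..6 (an assumption of this sketch);
    parity is 1 for odd counts and 0 for even counts.
    """
    rng = np.random.default_rng(seed)
    counts = rng.integers(1, 7, size=n_samples)  # upper bound is exclusive
    images = np.stack([make_dot_image(int(c), size, rng) for c in counts])
    parity = counts % 2
    return images, counts, parity


images, counts, parity = make_dataset(200)
```

The `counts` labels correspond to the small-integer task and `parity` to the even-versus-odd task; other labels (for instance small, medium, large) could be derived from `counts` in the same way.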
The paper discusses a non-deterministic model for data association tasks in visual surveillance of crowds. Using detection and tracking of crowd components (i.e., individuals and groups) as baseline tools, we propose a simple algebraic framework for maintaining data association (continuity of labels assigned to crowd components) between subsequent video frames in spite of possible disruptions and inaccuracies in tracking/detection algorithms. Formally, two alternative schemes (which, in practice, can be used jointly) are introduced, depending on whether individuals or groups are expected to be tracked better in the current scenario. In the first scheme, only individuals are tracked, and the continuity of group labels is inferred without explicitly tracking the groups. In the second scheme, only group tracking is performed, and associations between individuals are inferred from group tracking. The associations are built upon non-deterministic estimates of memberships (individuals in groups) and estimates obtained directly from the baseline detection and tracking algorithms. The framework can incorporate any detectors and trackers (either classical or DL-based) as long as they can provide some geometric outlines (e.g., bounding boxes) of the crowd components. The formal analysis is supported by experiments in sample scenarios, where the framework provides meaningful performance improvements in various crowd analysis tasks.
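The first scheme (individuals tracked, group labels inferred) can be sketched roughly as follows. This is not the paper's algebraic framework: the membership estimate used here (fraction of an individual's bounding box covered by a group's bounding box), the scoring rule, and the names `membership` and `inherit_group_labels` are all assumptions chosen only to make the idea concrete.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def overlap_area(a, b):
    """Area of the intersection of two boxes."""
    return box_area((max(a[0], b[0]), max(a[1], b[1]),
                     min(a[2], b[2]), min(a[3], b[3])))


def membership(individual, group):
    """Assumed soft membership: share of the individual's box inside the group's box."""
    area = box_area(individual)
    return overlap_area(individual, group) / area if area > 0 else 0.0


def inherit_group_labels(prev_groups, curr_groups, prev_people, curr_people):
    """Assign previous group labels to current group boxes via tracked individuals.

    prev_groups: {label: box} from the previous frame.
    curr_groups: list of group boxes detected in the current frame (unlabelled).
    prev_people, curr_people: {person_id: box}; ids come from the individual tracker.
    Returns one label (or None) per current group box.
    """
    shared = prev_people.keys() & curr_people.keys()
    labels = []
    for g in curr_groups:
        best_label, best_score = None, 0.0
        for label, pg in prev_groups.items():
            # Evidence that g continues pg: people who belonged to pg and now belong to g.
            score = sum(membership(prev_people[p], pg) * membership(curr_people[p], g)
                        for p in shared)
            if score > best_score:
                best_label, best_score = label, score
        labels.append(best_label)
    return labels


prev_groups = {"A": (0, 0, 10, 10), "B": (20, 0, 30, 10)}
prev_people = {1: (1, 1, 3, 3), 2: (4, 4, 6, 6), 3: (22, 2, 24, 4)}
curr_people = {1: (3, 1, 5, 3), 2: (6, 4, 8, 6), 3: (24, 2, 26, 4)}
curr_groups = [(22, 0, 32, 10), (2, 0, 12, 10)]
labels = inherit_group_labels(prev_groups, curr_groups, prev_people, curr_people)
```

The second scheme would run in the opposite direction, using tracked group boxes to constrain which individual detections can correspond to each other.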
Re-colorization of images or movies is a challenging problem because a monochrome object admits infinitely many RGB solutions. In general, the process is assisted by humans, who provide either colorization hints or relevant training data for ML/AI algorithms. Our intention is to develop a mechanism for fully unguided colorization of movies that uses no training data. In other words, we aim to create acceptable colored counterparts of movies in domains where only monochrome visualizations physically exist (e.g., IR, UV or MRI data). Following our past approach to image colorization, the method assumes arbitrary rgb2gray models and utilizes a few probabilistic heuristics. Additionally, we maintain the temporal stability of colorization by locally using structural similarity (SSIM) between adjacent frames. The paper explains the details of the method, presents exemplary results and compares them to state-of-the-art solutions.
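To make the local use of SSIM between adjacent frames more concrete, the sketch below computes one SSIM value per non-overlapping block of two grayscale frames, using the standard SSIM formula with constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2. The block size, the use of plain (non-Gaussian) block statistics, and the function name `block_ssim` are assumptions of this example; the abstract does not state how the method applies SSIM.

```python
import numpy as np


def block_ssim(prev, curr, block=8, data_range=1.0):
    """Return a map of SSIM values, one per non-overlapping block x block tile.

    prev, curr: 2-D grayscale frames of equal shape.
    Rows/columns that do not fill a whole tile are ignored.
    """
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    if prev.ndim != 2 or prev.shape != curr.shape:
        raise ValueError("frames must be 2-D arrays of equal shape")
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    rows, cols = prev.shape[0] // block, prev.shape[1] // block
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            sl = (slice(i * block, (i + 1) * block),
                  slice(j * block, (j + 1) * block))
            x, y = prev[sl], curr[sl]
            mx, my = x.mean(), y.mean()
            cov = ((x - mx) * (y - my)).mean()
            # Luminance term times contrast/structure term.
            out[i, j] = ((2 * mx * my + c1) * (2 * cov + c2)) / (
                (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))
    return out


rng = np.random.default_rng(0)
frame_a = rng.random((16, 16))
frame_b = frame_a.copy()
frame_b[:8, :8] = 1.0 - frame_b[:8, :8]  # alter only the top-left tile
ssim_map = block_ssim(frame_a, frame_b)
```

One plausible use, again an assumption, is to carry colors over from the previous frame only in tiles whose SSIM is close to 1 and to re-estimate them elsewhere.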