Alongside clustering and classification, the detection of atypical elements (outliers, rare elements) is one of the most fundamental problems in contemporary data analysis. Unlike clustering and classification, however, the atypical element detection task has no natural quality (performance) index. The research presented here aims to create one. Such an index enables not only evaluation of the results of an atypical element detection procedure, but also optimization of its parameters or other quantities. The investigated quality index works particularly well with frequency-type procedures, especially in the presence of substantial noise. Because the index is designed using a nonparametric approach, the proposed method is practically independent of the distribution of the dataset under examination. It may also be successfully applied to multimodal and multidimensional cases.
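As a rough illustration of what a frequency-type detection procedure built on a nonparametric estimate can look like, the sketch below flags the elements that lie in the sparsest regions of a one-dimensional kernel density estimate. The Gaussian kernel, the rule-of-thumb bandwidth, the fraction `r`, and all function names are assumptions made for this example; it is not the quality index studied in this work.

```python
import math


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kde_values(data, h):
    """Gaussian kernel density estimate evaluated at every data point."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / h) ** 2) for y in data)
        for x in data
    ]


def atypical_elements(data, r=0.1, h=None):
    """Return the elements whose estimated density is among the lowest
    fraction r of the sample (hypothetical helper for illustration)."""
    if h is None:
        h = silverman_bandwidth(data)
    dens = kde_values(data, h)
    k = max(1, int(round(r * len(data))))
    threshold = sorted(dens)[k - 1]  # density of the k-th sparsest element
    return [x for x, d in zip(data, dens) if d <= threshold]


sample = [0.0, 0.1, 0.2, 0.15, 0.05, 0.12, 0.18, 0.08, 0.11, 5.0]
print(atypical_elements(sample, r=0.1))
```

Because the criterion is density rather than distance from a centre, the same rule also flags elements lying between modes of a multimodal sample, which is the kind of behaviour a distribution-free approach is meant to allow.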
Extracting useful information from astronomical observations is one of the most challenging tasks in data exploration, largely because of the volume of data acquired with advanced observational tools. While other challenges typical of big data problems (such as data variety) are also present, the size of the datasets is the most significant obstacle to visualization and subsequent analysis. This paper studies an efficient data condensation algorithm that produces a compact representation of a dataset. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition, the possibility of using approximate identification of neighbors to further improve the algorithm's time performance is evaluated. The properties of the proposed approach, in terms of both performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of dataset size.
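The general idea of tree-based condensation can be sketched as follows, under stated assumptions. A k-d tree supplies the distance from each point to its k-th nearest neighbor; points are then visited from densest to sparsest, and each kept point absorbs everything within twice that distance. The specific selection rule, the factor of 2, the parameter `k`, and all function names are assumptions for illustration and are not taken from the paper.

```python
import heapq
import math


def build_kdtree(points, idx=None, depth=0):
    """Recursively build a k-d tree over indices into `points`."""
    if idx is None:
        idx = list(range(len(points)))
    if not idx:
        return None
    axis = depth % len(points[0])
    idx.sort(key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return {
        "i": idx[mid],
        "axis": axis,
        "left": build_kdtree(points, idx[:mid], depth + 1),
        "right": build_kdtree(points, idx[mid + 1:], depth + 1),
    }


def _knn(node, points, q, k, heap):
    """Collect the k smallest distances from q in a max-heap (negated values)."""
    if node is None:
        return
    p = points[node["i"]]
    d = math.dist(p, q)
    if len(heap) < k:
        heapq.heappush(heap, -d)
    elif d < -heap[0]:
        heapq.heapreplace(heap, -d)
    diff = q[node["axis"]] - p[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    _knn(near, points, q, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best distance (or the heap is not full yet).
    if len(heap) < k or abs(diff) < -heap[0]:
        _knn(far, points, q, k, heap)


def kth_neighbor_distance(tree, points, q, k):
    """Distance from q (a member of `points`) to its k-th nearest neighbor."""
    heap = []
    _knn(tree, points, q, k + 1, heap)  # k + 1 because q matches itself
    return -heap[0]


def condense(points, k=2):
    """Keep a subset of points, densest regions first (assumed rule)."""
    tree = build_kdtree(points)
    # Each query is independent, so this loop is the natural place to parallelize.
    radii = [kth_neighbor_distance(tree, points, q, k) for q in points]
    removed = [False] * len(points)
    kept = []
    for i in sorted(range(len(points)), key=lambda j: radii[j]):
        if removed[i]:
            continue
        kept.append(points[i])
        # Brute-force absorption for brevity; a tree ball query would be faster.
        for j, p in enumerate(points):
            if math.dist(points[i], p) <= 2 * radii[i]:
                removed[j] = True
    return kept


cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (10.0, 10.0)]
print(condense(cloud, k=1))
```

An approximate variant would relax the pruning test in `_knn` so that fewer branches are visited, trading some accuracy of the neighbor distances for speed.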