If a correlated pair (R(k), R0(k)) is a regular correlated pair, then the coefficient measuring the quality of such a pair satisfies the inequality: r2(k) ≥ F(k)...
A multi-disciplinary approach is indispensable for adequately managing acid rock drainage (ARD), mineral leaching impacts, and groundwater. Groundwater is a valuable resource, and it is critical both to protect it and to mitigate the effects of pollution such as ARD in the mining environment. Mine waste storage facilities (waste rocks and tailings) are potential ARD sources capable of degrading groundwater reserves. This research reports a case study that applies multivariate statistical analysis and spatial variability analysis to selected parameters associated with ARD in groundwater around the waste rock dump (WRD) and tailings storage facility (TSF) at mine sites. Water quality data from 70 water samples taken from 10 boreholes located at the WRD and TSF were used in this study. A correlation matrix and principal component analysis were applied to the data set to determine the variability in groundwater associated with ARD. Geostatistical analysis, using ordinary kriging with the best-fit models, was used to produce contour maps of the ARD principal components for the study site. Applying multivariate statistical and geospatial analysis to groundwater quality assessment, coupled with soil and groundwater modelling of flow and transport at waste rock dump and tailings storage sites, provides an essential tool for exploratory data analysis and for determining the spatial extent of the relationships between the data sets significant to acid rock drainage.
Non-small cell lung cancer (NSCLC) is the most common type of lung cancer and one of the leading causes of death in the world. Surgery combined with chemotherapy is the recommended treatment for NSCLC. Since chemotherapy is a costly treatment, both for medical staff and for patients suffering from pain, this study attempts to construct an intelligent predictive model of whether adjuvant chemotherapy (ACT) will be effective or futile for a given patient, so that patients for whom it would be futile can be spared unnecessary treatment. The method has two steps: preprocessing and prediction. In the first step, a purpose-built preprocessing technique combining the chi-square test, SVM-RFE and a correlation matrix was applied to an NSCLC gene expression dataset as a novel multi-layered feature selection method, in order to overcome the curse of dimensionality and detect the chemotherapy target genes among tens of thousands of features. In the second step, the patients are classified into two groups on the basis of the selected genes with an NB classifier. 10-fold cross-validation gave an accuracy of 68.93% for 2 genes, TGFA (205015_s_at) and SEMA6C (208100_x_at), which compares favourably with earlier studies, even though those employ more than 2 input features for the prediction. According to the results of this study, one can conclude that the multi-layered feature selection approach increased the classification accuracy in identifying suitable patients for ACT while reducing the number of features, and that it has significant potential for use in medical datasets with few training samples and a large number of features.
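The correlation-matrix layer of such a feature selection scheme can be illustrated with a minimal sketch. The abstract does not say how the correlation matrix is used, so the code below assumes one common variant: a feature is dropped when its absolute Pearson correlation with an already kept feature reaches a threshold. The function names, the 0.9 threshold and the toy data are all hypothetical, and the chi-square and SVM-RFE layers are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def drop_correlated(features, threshold=0.9):
    """Keep features in insertion order, skipping any whose |r| with a kept one >= threshold.

    `features` maps a feature name to its list of values across samples.
    The threshold is an assumed value, not one taken from the study.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression values for three hypothetical features over four samples.
toy = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0],  # exact multiple of feature_a, so redundant
    "feature_c": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_correlated(toy)
```

Because the filter is greedy, the result depends on feature order; in a layered scheme the features would typically arrive already ranked by the preceding steps.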
The paper presents the most important methods of assessing the information loss caused by statistical disclosure control (SDC). The aim of SDC is to protect individuals against identification and against any unauthorised person obtaining sensitive information relating to them. Applying methods based either on the concealment of specific data or on their perturbation results in information loss, which affects the quality of the output data, including the distributions of variables, the form of the relationships between them, and any estimates. The aim of this paper is to perform a critical analysis of the strengths and weaknesses of the particular types of methods of assessing information loss resulting from SDC. Moreover, some novel ideas on how to obtain effective and easily interpretable measures are proposed, including an innovative way of using a cyclometric function (the arctangent) to determine the deviation of values from the original ones as a result of SDC. Additionally, the inverse correlation matrix was applied in order to assess the influence of SDC on the strength of the relationships between variables. The first of these methods yields effective and easily interpretable measures, while the second makes it possible to fully use the potential of the mutual relationships between variables (including those difficult to detect by means of classical statistical methods) for a better analysis of the consequences of SDC. Among other findings, the empirical verification of the utility of the suggested methods confirmed the superiority of the cyclometric function in measuring distance in a way that highlights deviations from the original data, and also highlighted the need for a skilful correction of its flattening when the arguments take large values.
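The flattening of the arctangent at large arguments can be seen in a minimal sketch. The abstract does not give the paper's actual formula, so the measure below is an assumed illustrative form: the mean of arctan(|original - perturbed|), scaled by 2/π so that it lies in [0, 1). The function name `arctan_loss` is hypothetical.

```python
import math


def arctan_loss(original, perturbed):
    """Mean arctangent-scaled absolute deviation, normalised to the interval [0, 1).

    Assumed illustrative form, not the measure defined in the paper.
    """
    if len(original) != len(perturbed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.atan(abs(o - p)) for o, p in zip(original, perturbed))
    return (2.0 / math.pi) * total / len(original)


no_change = arctan_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # identical data, no loss
unit_shift = arctan_loss([0.0], [1.0])                      # arctan(1) = pi/4
large_shift = arctan_loss([0.0], [1000.0])
huge_shift = arctan_loss([0.0], [10000.0])
```

A tenfold larger deviation (1000 versus 10000) changes the value by less than 0.001, because the arctangent saturates towards π/2. This is the flattening that, according to the abstract, calls for correction when large arguments occur.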
PL
The paper discusses the most important methods for assessing the information loss caused by statistical disclosure control (SDC). The purpose of this control is to protect individuals against identification and against unauthorised persons gaining access to sensitive information concerning them. Applying methods based either on concealing specific data or on distorting them causes information loss, which affects the quality of the output data, including the distributions of variables, the shape of the relationships between them, and estimates. The aim of the article is a critical analysis of the strengths and weaknesses of the methods of assessing information loss resulting from SDC. Novel proposals leading to effective and easily interpretable measures are also presented, including a new way of using a cyclometric function (the arctangent) to determine the deviation of values from the original ones after SDC has been carried out. In addition, the inverse correlation matrix was used to assess the influence of SDC on the strength of the relationships between variables. The first of the presented methods yields effective and easily interpretable measures, while the second makes maximum use of the mutual relationships between variables (including those that are hard to capture with classical statistical methods) for a better analysis of the consequences of the control in this respect. Empirical verification of the usefulness of the suggested methods confirmed, among other things, the superiority of the cyclometric function in measuring distance in a way that highlights deviations from the original data, as well as the need for a skilful correction of its flattening when the arguments take large values.