Wykorzystanie języka R do statystycznej analizy oraz analizy skupień dla danych geochemicznych

Janiga, Marek

doi:10.18668/NG.2023.09.02

Title variants

Use of R programming language for statistical analysis and cluster analysis of geochemical data

Abstract

In petroleum geology, statistical methods are widely used in petrography, petrophysics, geochemistry, geomechanics, well-log analysis and seismics, and cluster analysis is important for rock classification, that is, for delineating zones with particular properties such as source or reservoir intervals. This paper presents the use of the R language for statistical analysis, including cluster analysis, of large sets of diverse geochemical data. The input was literature data on the chemical and isotopic composition of natural gases. Cluster analysis was performed with unsupervised machine learning algorithms, using two methods: k-means and hierarchical clustering. The results of k-means clustering can be illustrated on a two-dimensional plot (the R function fviz_cluster), whose dimensions come from principal component analysis (PCA) and are linear combinations of the features (the columns of the data table). The result of hierarchical clustering is a plot called a dendrogram. The paper also presents box plots, histograms and a correlation matrix of Pearson correlation coefficients. All work was done in the R programming language. Used together with the RStudio software, R is a very convenient and fast tool for statistical data analysis, and obtaining the plots, tables and data listed above is quick and relatively easy. The gas composition results appear to show little variation. Nevertheless, the k-means and hierarchical algorithms made it possible to divide the geochemical data into clearly separable groups. Both the isotopic and the chemical composition allow groups to be delineated that would not otherwise be noticeable.
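The workflow described in the abstract can be sketched in a few lines of R. This is a minimal illustration, not the author's script. The data are synthetic, and the column names (CH4, C2H6, d13C_CH4), the choice of two clusters, the standardisation step and the Ward linkage are all assumptions made for the example. The abstract names fviz_cluster for the k-means plot; the sketch uses base R functions (prcomp and plot) instead so that it runs without extra packages.

```r
# Synthetic stand-in data: the values and column names are hypothetical
# and are NOT taken from the paper.
set.seed(42)
n <- 30
gas <- data.frame(
  CH4      = c(rnorm(n, 98, 0.3),   rnorm(n, 92, 0.3)),  # vol. %
  C2H6     = c(rnorm(n, 0.3, 0.05), rnorm(n, 3.0, 0.2)), # vol. %
  d13C_CH4 = c(rnorm(n, -68, 1),    rnorm(n, -45, 1))    # per mil
)

# When run non-interactively, send plots to a null device (no file written).
if (!interactive()) pdf(NULL)

# Exploratory plots: box plots of the standardised features, one histogram.
gas_scaled <- scale(gas)  # assumption: give every feature equal weight
boxplot(gas_scaled, main = "Standardised features")
hist(gas$d13C_CH4, main = "d13C of methane (synthetic)", xlab = "per mil")

# Correlation matrix of Pearson coefficients.
cor_mat <- cor(gas, method = "pearson")

# k-means clustering; k = 2 is an assumption that suits the synthetic data.
km <- kmeans(gas_scaled, centers = 2, nstart = 25)

# PCA: the first two principal components are linear combinations of the
# features and serve as the axes of the two-dimensional cluster plot.
pca <- prcomp(gas_scaled)
scores <- pca$x[, 1:2]
plot(scores, col = km$cluster, pch = 19, xlab = "PC1", ylab = "PC2")

# Hierarchical clustering (Ward linkage is an assumption) and its dendrogram.
hc <- hclust(dist(gas_scaled), method = "ward.D2")
plot(hc, labels = FALSE, main = "Dendrogram")
hc_groups <- cutree(hc, k = 2)

# Cross-tabulate the two groupings to see how far they agree.
print(table(kmeans = km$cluster, hierarchical = hc_groups))
```

With real measurements, the number of clusters would have to be chosen from the data rather than fixed in advance, and the cross-tabulation at the end is one simple way to check whether the two methods delineate the same groups.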

Keywords

cluster analysis; k-means method; hierarchical method; natural gas composition

Publisher

Instytut Nafty i Gazu - Państwowy Instytut Badawczy

Journal

Nafta-Gaz

Year

2023

Volume

Vol. 79, no. 9

Pages

576–583

Physical description

Bibliography: 17 items, figures

Contributors

author

Janiga Marek

marek.janiga@inig.pl

Instytut Nafty i Gazu – Państwowy Instytut Badawczy

Notes

Record prepared with funding from MNiSW, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-363ee88f-6a02-48b5-be38-585e91a2dd00