Application of resampling methods to estimate the number of clusters in gene expression data

Stąpor, K.

Article - details

Article title

Application of resampling methods to estimate the number of clusters in gene expression data

Authors

Stąpor K.

Selected full texts from this journal

http://journals.pan.pl/dlibra/journal/104397

Identifiers

Title variants

Zastosowanie metod próbkujących do oszacowanie liczby grup w danych stanowiących poziomy ekspresji genów

Publication languages

Abstracts

In this paper we evaluate two recently proposed resampling-based methods for estimating the number of clusters (if any) in a dataset. The first method is based on the concept of clustering stability, while the second uses ideas from discriminant analysis. The methods are compared on simulated data and on gene expression data from cancer microarray studies.

The paper presents two new methods for validating data clustering, specifically for estimating the number of clusters. Both methods rely on suitable resampling of the input dataset, and their underlying idea is that a stable structure is one that is robust to perturbations of the data. The first method, taken from [1], is based on the notion of clustering stability, which in turn is defined in terms of a suitably constructed disagreement matrix. The second method uses concepts from discriminant analysis to assess the stability of the structure obtained by clustering. Both methods were compared using specially generated test datasets. They were then applied to estimate the number of clusters in gene expression data, obtained from DNA microarrays, of healthy patients and patients with various types of leukemia.
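The stability idea summarized in the abstract can be illustrated with a minimal sketch. It is not the procedure from the paper or from [1]: it assumes a plain k-means with farthest-first initialization as the clustering algorithm, scores agreement between two subsample clusterings with the pair-counting index of Fowlkes and Mallows [6], and uses hypothetical helper names (`make_blobs`, `kmeans`, `stability`) and parameter values chosen only for illustration.

```python
import random
from itertools import combinations
from math import sqrt

def dist2(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k, rng):
    """Pick k initial centers: one at random, then repeatedly the farthest point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, rng, iters=100):
    """Lloyd's k-means (an assumed stand-in clusterer); returns one label per point."""
    centers = farthest_first(points, k, rng)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting agreement between two labelings of the same points."""
    n11 = n_a = n_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        n11 += same_a and same_b
        n_a += same_a
        n_b += same_b
    return n11 / sqrt(n_a * n_b) if n_a and n_b else 0.0

def stability(points, k, n_pairs=20, fraction=0.8, seed=0):
    """Mean agreement of clusterings of two random subsamples on their shared points."""
    rng = random.Random(seed)
    n, m = len(points), int(fraction * len(points))
    scores = []
    for _ in range(n_pairs):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        shared = sorted(set(idx_a) & set(idx_b))
        scores.append(fowlkes_mallows([lab_a[i] for i in shared],
                                      [lab_b[i] for i in shared]))
    return sum(scores) / len(scores)

def make_blobs(seed=1, per_blob=20, spread=0.3):
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))
            for cx, cy in centers for _ in range(per_blob)]

if __name__ == "__main__":
    data = make_blobs()
    for k in range(2, 7):
        print(k, round(stability(data, k), 3))
```

Under these assumptions, a candidate number of clusters is taken to be plausible when the score stays close to 1 and drops for larger k. Note that k = 1 trivially scores 1, so the comparison is only informative for k of 2 or more.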

Keywords

clustering, cluster validation, gene expression data

Publisher

Instytut Informatyki Teoretycznej i Stosowanej Polskiej Akademii Nauk

Journal

Theoretical and Applied Informatics

Year

2006

Volume

Vol. 18, no. 2

Pages

109-122

Physical description

Bibliography: 6 items, figures.

Contributors

author

Stąpor K.

Silesian Technical University, Institute of Informatics, ul. Akademicka 16, 44-100 Gliwice, Poland

Bibliography

[1] Ben-Hur A., Guyon I.: Detecting stable clusters using principal component analysis. In Methods in Molecular Biology, M.J. Brownstein and A. Khodursky (eds.), Humana Press, 159-182, 2003.
[2] Fridlyand J., Dudoit S.: Application of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. Technical report, no. 600, University of California, Berkeley.
[3] Jain A., Dubes R.: Algorithms for clustering data. Prentice Hall, Englewood Cliffs, NJ, 1988.
[4] Alizadeh A.A. et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503-511, 2000.
[5] Golub T.R. et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531-537, 1999.
[6] Fowlkes E., Mallows C.: A method for comparing two hierarchical clusterings. Journal of the American Statistical Association, 78, 553-584, 1983.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUJ5-0005-0031
