Mining the Largest Dense Vertexlet in a Weighted Scale-free Graph

Bandyopadhyay, S.; Bhattacharyya, M.

Artykuł - szczegóły

Tytuł artykułu

Mining the Largest Dense Vertexlet in a Weighted Scale-free Graph

Autorzy

Bandyopadhyay S. , Bhattacharyya M.

Wybrane pełne teksty z tego czasopisma

https://fi.episciences.org/

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

An important problem of knowledge discovery that has recently evolved in various reallife networks is identifying the largest set of vertices that are functionally associated. The topology of many real-life networks shows scale-freeness, where the vertices of the underlying graph follow a power-law degree distribution. Moreover, the graphs corresponding to most of the real-life networks are weighted in nature. In this article, the problem of finding the largest group or association of vertices that are dense (denoted as dense vertexlet) in a weighted scale-free graph is addressed. Density quantifies the degree of similarity within a group of vertices in a graph. The density of a vertexlet is defined in a novel way that ensures significant participation of all the vertices within the vertexlet. It is established that the problem is NP-complete in nature. An upper bound on the order of the largest dense vertexlet of a weighted graph, with respect to certain density threshold value, is also derived. Finally, an O(n2 log n) (n denotes the number of vertices in the graph) heuristic graph mining algorithm that produces an approximate solution for the problem is presented.

Słowa kluczowe

graphs and networks scale-free mining methods and algorithms knowledge discovery bioinformatics

Wydawca

IOS Press

Czasopismo

Fundamenta Informaticae

Rocznik

2009

Tom

Vol. 96, nr 1/2

Strony

1--25

Opis fizyczny

Bibliogr. 29 poz., tab., wykr.

Twórcy

autor

Bandyopadhyay S.

autor

Bhattacharyya M.

Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata - 700108, India, sanghami@isical.ac.in

Bibliografia

[1] Cliques, Coloring, and Satisfiability, in: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science (D. S. Johnson, M. A. Trick, Eds.), vol. 26, American Mathematical Society, Providence, RI, 1996.
[2] Al-Shahrour, F., Daz-Uriarte, R., Dopazo, J.: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes, Bioinformatics, 20, 2004, 578-580.
[3] Barabási, A. L., Albert, R.: Emergence of Scaling in Random Networks, Science, 286, 1999, 509-512.
[4] Barabási, A. L., Albert, R., Jeong, H.: Mean-field theory for scale-free random networks, Physica A, 272, 1999, 173-187.
[5] Bomze, I. M., Budinich, M., Pardalos, P. M., Pelillo, M.: The Maximum Clique Problem, in: Handbook of Combinatorial Optimization: Supplementary Volume A (D. Z. Du, P. Pardalos, Eds.), Kluwer Academic, Dordrecht, 1999, 1-74.
[6] Brock, G. N., Shaffer, J. R., Blakesley, R. E., Lotz, M. J., Tseng, G. C.: Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes, BMC Bioinformatics, 9, 2008.
[7] Östegard, P. R. J.: A fast algorithm for the maximum clique problem, Discrete Applied Mathematics, 120, 2002, 197-207.
[8] Du, D. Z., Ko, K. I.: Theory of Computational Complexity, John Wiley & Sons, Inc., New York, 2000.
[9] Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D.: Cluster analysis and display of genome-wide expression patterns, Proceedings of National Academy of Sciences, 95, 1998, 14863-14868.
[10] Fahle, T.: Simple and fast: Improving a branch-and-bound algorithm for maximum clique, Proceedings of the 10th Annual European Symposium on Algorithms, Kinsale, Ireland, 2002.
[11] Garey, M. R., Johnson, D. S.: Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman & Co., New York, 1979.
[12] Geng, X., Xu, J., Xiao, J., Pan, L.: A simple simulated annealing algorithm for the maximum clique problem, Information Sciences, 177, 2007, 5064-5071.
[13] Han, J., Kamber, M.: Data Mining: Concepts and Techniques, Academic Press, San Francisco, 2001.
[14] Håstad, J.: Clique is hard to approximate within n1-e, Acta Mathematica, 182, 1999, 105-142.
[15] Hu, H., Yan, X., Huang, Y., Han, J., Zhou, X. J.: Mining coherent dense subgraphs across massive biological networks for functional discovery, Bioinformatics, 21, 2005, i213-i221.
[16] Iyer, V. R., Eisen, M. B., Ross, D. T., Schuler, G., Moore, T., Lee, J. C. F., Trent, J. M., Staudt, L. M., Jr., J. H., Boguski, M. S., Lashkari, D., Shalon, D., Botstein, D., Brown, P. O.: The Transcriptional Program in the Response of Human Fibroblasts to Serum, Science, 283, 1999, 83-87.
[17] Katayama, K., Hamamoto, A., Narihisa, H.: An effective local search for the maximum clique problem, Information Processing Letters, 95, 2005, 503-511.
[18] Kaufman, L., Rousseeuw, P. J.: Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley, New York, 1990.
[19] Lee, I., Date, S. V., Adai, A. T.,Marcotte, E.M.: A Probabilistic FunctionalNetwork of Yeast Genes, Science, 306, 2004, 1555-1558.
[20] Marchiori, E.: Genetic, iterated and multistart local search for the maximumclique problem, in: Applications of Evolutionary Computing, vol. LNCS 2279, Springer, Berlin, 2002, 112-121.
[21] Mendes, P., Sha,W., Ye, K.: Artificial gene networks for objective comparison of analysis algorithms, Bioinformatics, 19, 2003, ii122-ii129.
[22] Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian missing value estimation method for gene expression profile data, Bioinformatics, 19, 2003, 2088-2096.
[23] Qin, Z. S.: Clustering microarray gene expression data using weighted Chinese restaurant process, Bioinformatics, 22, 2006, 1988-1997.
[24] Régin, J. C.: Solving the Maximum Clique Problem with Constraint Programming, Proceedings of the CPAIOR'03, Kinsale, Ireland, 2003.
[25] Wang, J., Tang, Z., Wang, R.: Maximum neural network with nonlinear self-feedback for maximum clique problem, Neurocomputing, 57, 2004, 485-492.
[26] Wood, D.: An algorithm for finding maximum clique in a graph, Operations Research Letters,, 21, 1997, 211-217.
[27] Xu, R.: Survey of Clustering Algorithms, IEEE Transactions on Neural Networks, 16, 2005, 645-678.
[28] Xu, X., Ma, J., Lei, J.: An Improved Ant Colony Optimization for the Maximum Clique Problem, Proceedings of the Third International Conference on Natural Computation, 4, Hainan, China, 2007.
[29] Yan, X., Mehan, M. R., Huang, Y., Waterman, M. S., Yu, P. S., Zhou, X. J.: A graph-based approach to systematically reconstruct human transcriptional regulatory modules, Bioinformatics, 23, 2007, i577-i586.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-BUS8-0008-0038