Journal
2004 | Vol. 59, no. 2-3 | pp. 119-134
Abstract
Large collections of genomic information have been accumulated in recent years, and latent within them is potentially significant knowledge that could be exploited in medicine and in the pharmaceutical industry. The approach taken here to distilling such knowledge is to detect strings that appear frequently in DNA sequences, either within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients sharing a particular medical diagnosis). Motifs are strings that occur very frequently. We present basic theory and algorithms for finding very frequent and common strings. Strings that are maximally frequent are of particular interest and, having discovered such motifs, we show briefly how to mine association rules using an existing rough-set-based technique. Further work and applications are in progress.
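The notions of frequent, common, and maximally frequent strings named in the abstract can be illustrated with a brute-force sketch. This is not the authors' algorithm, which this record does not reproduce. The function names, the thresholds `min_count` and `min_support`, the length cap `max_len`, and the choice to count overlapping occurrences are all assumptions made for illustration.

```python
from collections import Counter


def substring_counts(seq, max_len):
    """Count overlapping occurrences of every substring of length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def frequent_strings(seq, min_count, max_len):
    """Strings occurring at least min_count times within one sequence (assumed definition)."""
    counts = substring_counts(seq, max_len)
    return {s: c for s, c in counts.items() if c >= min_count}


def maximal_frequent(frequent):
    """Frequent strings that are not contained in any longer frequent string.

    Note: with a length cap, strings of length max_len are maximal only
    relative to that cap.
    """
    return {s for s in frequent if not any(s != t and s in t for t in frequent)}


def common_strings(seqs, min_support, max_len):
    """Strings that appear in at least min_support of the given sequences."""
    support = Counter()
    for seq in seqs:
        # Each sequence contributes at most 1 to a string's support.
        support.update(set(substring_counts(seq, max_len)))
    return {s: c for s, c in support.items() if c >= min_support}


freq = frequent_strings("ACGTACGTAC", min_count=2, max_len=4)
motifs = maximal_frequent(freq)
common = common_strings(["ACGT", "CGTA", "TTTT"], min_support=2, max_len=2)
```

The enumeration is quadratic in sequence length for a fixed `max_len`, so it is suitable only for small examples; a suffix-based index would be the usual choice for real genomic data.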
Physical description
Bibliography: 5 items.
Authors
author
- College of Computer Science and Technology, Jilin University, 130012, Changchun, P.R. China, j.guan@qub.ac.uk
author
- College of Computer Science and Technology, Jilin University, 130012, Changchun, P.R. China
author
- School of Computer Science, The Queen’s University of Belfast, Belfast, BT7 1NN, Northern Ireland, U.K., da.bell@qub.ac.uk
References
- [1] Bell, D. A., Guan, J. W.: Computational methods for rough classification and discovery. Journal of the American Society for Information Science, Special Topic Issue on Data Mining 49/5 (1998) 403-414
- [2] Feldman, R., Aumann, Y., Amir, A., Zilberstein, A., Kloesgen, W., Ben-Yehuda, Y.: Maximal association rules: a new tool for mining for keyword co-occurrences in document collections. In Proceedings of the 3rd International Conference on Knowledge Discovery (KDD 1997) 167-170
- [3] Guan, J. W., Bell, D. A.: Rough computational methods for information systems. Artificial Intelligence - An International Journal 105 (1998) 77-104
- [4] Kiem, H., Phuc, D.: Discovering motif based association rules in a set of DNA sequences. In Ziarko, W., Yao, Y. Y. (eds.): Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing (RSCTC’2000), Banff, Canada, October 16-19, 2000, 348-352
- [5] Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In Proceedings of the Fifth International Conference on Extending Database Technology (EDBT), Avignon, France, March 1996; IBM Research Report RJ 9994, December 1995 (expanded version)
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-article-BUS2-0005-0006