Automatic Building of a Semantic Model of Disease Symptoms Based on Text Corpus

Szostek, G.; Jaszuk, M.; Walczak, A.

Title variants

Automatyczna budowa semantycznego modelu objawów chorobowych na bazie korpusu słownego (Polish)

Abstract

The research described in this article concerns data from the medical domain. Diagnostic test results are recorded in different ways: they may take the form of tables, graphs or images. Regardless of the original data format, they can be described verbally, with the description focusing on the observed symptoms. Such descriptions make up text corpora for individual diagnostic technologies. Knowledge about disease entities is stored in a similar manner, as text corpora containing descriptions of the symptoms specific to individual diseases. Using natural language processing tools, semantic models describing particular diagnostic technologies and diseases can be automatically extracted from these texts. One obstacle is that medical knowledge can be written in natural language in many ways; applying a semantic format eliminates these ambiguities of notation. The result is a unified model of medical knowledge, covering both the results of diagnostic technologies that describe the patient's state and the knowledge of disease entities. This makes it possible to merge data from different sources (heterogeneous data) into a homogeneous form. The article presents a method for generating a semantic model of medical knowledge using lexical analysis of text corpora.
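The abstract does not detail the extraction pipeline. As a rough illustration only, and not the authors' method, the Python sketch below assumes a small hand-written synonym lexicon (the hypothetical `LEXICON`) and plain phrase matching in place of full lexical analysis. It shows the idea the abstract relies on: different wordings of a symptom map to one canonical concept, so descriptions of diseases and of patient test results become (subject, relation, object) triples over a shared vocabulary and can be merged into one homogeneous set. The relation name `has_symptom` and all function names are likewise hypothetical.

```python
import re

# Hypothetical synonym lexicon (illustrative, not from the article): several
# natural-language phrasings of a symptom map to one canonical concept.
LEXICON = {
    "high temperature": "fever",
    "elevated temperature": "fever",
    "fever": "fever",
    "coughing": "cough",
    "cough": "cough",
    "shortness of breath": "dyspnea",
    "breathlessness": "dyspnea",
}


def extract_symptoms(text):
    """Return the canonical symptom concepts whose phrases occur in text."""
    lowered = text.lower()
    found = set()
    for phrase, concept in LEXICON.items():
        # Whole-word match, so "cough" does not fire inside "coughing".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(concept)
    return found


def build_network(corpus):
    """Turn {subject: description} into a set of (subject, relation, object) triples.

    "has_symptom" is a hypothetical relation name chosen for this sketch.
    """
    triples = set()
    for subject, description in corpus.items():
        for symptom in extract_symptoms(description):
            triples.add((subject, "has_symptom", symptom))
    return triples


def symptoms_of(network, subject):
    """Collect the symptom concepts linked to one subject in the network."""
    return {o for s, r, o in network if s == subject and r == "has_symptom"}


# Disease knowledge and a patient's test description, worded differently.
diseases = build_network({
    "influenza": "Sudden high temperature, coughing and muscle pain.",
    "asthma": "Episodes of breathlessness and cough, often at night.",
})
patients = build_network({
    "patient_1": "Elevated temperature and a dry cough.",
})

# Both sources now use the same concepts, so merging is a set union.
merged = diseases | patients
shared = symptoms_of(merged, "patient_1") & symptoms_of(merged, "influenza")
print(sorted(shared))  # ['cough', 'fever']
```

Although the patient record says "elevated temperature" and the disease description says "high temperature", both resolve to the concept `fever`, which is what allows the two sources to be compared. A realistic pipeline would also have to handle negation ("no fever") and inflected word forms; this sketch attempts neither.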

Keywords

semantic network, ontology, natural language processing

Publisher

Institute of Computer and Information Systems, Faculty of Cybernetics, Military University of Technology

Journal

Biuletyn Instytutu Systemów Informatycznych

Year

2014

Volume

No. 14

Pages

25-34

Physical description

Bibliography: 10 items; charts

Contributors

Author

Szostek G.

grazyna.szostek@gmail.com

Faculty of Cybernetics, Military University of Technology, Warsaw; University of Information Technology and Management in Rzeszów, Poland

Author

Jaszuk M.

Faculty of Cybernetics, Military University of Technology, Warsaw; University of Information Technology and Management in Rzeszów, Poland

Author

Walczak A.

Faculty of Cybernetics, Military University of Technology, Warsaw; University of Information Technology and Management in Rzeszów, Poland

References

[1] Burgess C., “Representing and resolving semantic ambiguity: A contribution from high-dimensional memory modeling”, in Gorfein D.S. (Ed.), On the Consequences of Meaning Selection: Perspectives on Resolving Lexical Ambiguity, APA Press, (2001).
[2] Chen H., Lynch K.J., “Automatic construction of networks of concepts characterizing document databases”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, No. 5, 885-902, (1992).
[3] Harris Z.S., “Mathematical Structures of Language”, Interscience Publishers, John Wiley & Sons Inc., New York, (1968).
[4] Hearst M.A., “Automatic Acquisition of Hyponyms from Large Text Corpora”, Proceedings of the Fourteenth International Conference on Computational Linguistics, Nantes, (1992).
[5] Lund K., Burgess C., “Producing high-dimensional semantic spaces from lexical co-occurrence”, Behavior Research Methods, Instruments, & Computers, 28, 203-208, (1996).
[6] Piasecki M., Derwojedowa M., Koczan P., Przepiórkowski A., Szpakowicz S., Zawisławska M., “Półautomatyczna konstrukcja Słowosieci”, URL www.plwordnet.pl/main, the web page of the project, (2007) (in Polish).
[7] Polański K. (Ed.), “Słownik syntaktyczno-generatywny czasowników polskich”, vols. 1-7, Kraków, (1980-1993) (in Polish).
[8] Rohmer J., “The Case for Using Semantic Nets as a Convergence Format for Symbolic Information Fusion in NATO”, RTO-MP-IST-040 Information Systems Technology Panel (IST) symposium on “Military Data and Information Fusion”, Prague, Czech Republic, (2003).
[9] plWordNet, the web page of the project, URL: http://www.plwordnet.pwr.wroc.pl/main, (2007).
[10] Velardi P., Fabriani P., Missikoff M., “Using text processing techniques to automatically enrich a domain ontology”, in: Proceedings of the International Conference on Formal Ontology in Information Systems - Volume 2001, FOIS’01, ACM, New York, NY, 270-284, (2001).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-7fffc045-63ac-4a6d-959e-480489d965a4