Construction of a medical corpus based on information extraction results

Authors Marciniak, M.; Mykowiecka, A.
Publication language EN
EN The paper presents a method for automatically constructing a semantically annotated corpus from the results of a rule-based information extraction (IE) application. The corpus is built by running existing programs for text tokenization and morphological analysis and combining their results with domain-related correction rules. We reuse a specialized IE system to obtain annotations on the semantic level. The corpus consists of Polish free-text clinical data. We present the documents (discharge records of diabetic patients), the structure of the corpus annotation, and the methods for obtaining the annotations. Initial evaluations, based on manual verification of a selected subset of the data, are also presented. The corpus, once manually corrected, is intended for developing supervised machine learning models for IE applications.
EN corpus; semantic annotation; clinical data; information extraction
Publisher Systems Research Institute, Polish Academy of Sciences
Journal Control and Cybernetics
Year 2011
Volume Vol. 40, no. 2
Pages 337--360
