This paper investigates multilevel correction of handwriting OCR results for medical texts. Correction is performed at different levels of linguistic knowledge. Three types of models are presented and their results compared: n-gram Language Models of word-form and base-form sequences, a morpho-syntactic model based on a tagger, and a model of correction by parsing. The parsing model combines a deterministic Czech parser adapted for Polish with the Structured Language Model, which is based on lexicalised, binary parse trees produced in a left-to-right manner. Contrary to initial expectations, the best result, an improvement from the 82% accuracy of the word-level classifier to 92.98% overall accuracy, was achieved with the n-gram Language Models. The richer the description of language expressions in a model, the worse the results obtained. This outcome is to a large extent caused by the specific characteristics of the processed medical documents.
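As a rough illustration of n-gram based correction, the sketch below rescores per-word OCR candidates with a bigram language model and picks the best sequence by Viterbi search. It is a minimal example under stated assumptions, not the method of the paper: the add-one smoothing, the `lm_weight` parameter, the function names and the toy English data are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenised sentences, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_logprob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed log P(word | prev); the smoothing choice is an assumption."""
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def correct(candidates, unigrams, bigrams, lm_weight=1.0):
    """Viterbi search over candidate lists.

    candidates: one list per word position, each holding
    (word, classifier_probability) pairs from the word-level classifier.
    """
    vocab_size = len(unigrams)
    # best[word] = (score, path) for the best path ending in that word
    best = {"<s>": (0.0, [])}
    for options in candidates:
        new_best = {}
        for word, prob in options:
            score, path = max(
                (
                    prev_score
                    + lm_weight
                    * bigram_logprob(prev, word, unigrams, bigrams, vocab_size),
                    prev_path,
                )
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[word] = (score + math.log(prob), path + [word])
        best = new_best
    return max(best.values())[1]


# Toy data (hypothetical): the classifier prefers "hat", the model prefers "has".
training = [["patient", "has", "fever"], ["patient", "has", "cough"]]
unigrams, bigrams = train_bigram(training)
ocr_candidates = [
    [("patient", 0.9)],
    [("hat", 0.6), ("has", 0.4)],
    [("fever", 0.8), ("fewer", 0.2)],
]
corrected = correct(ocr_candidates, unigrams, bigrams)
```

Here the language model score can override a locally more probable classifier output when the surrounding words make another candidate more plausible.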
2
Short text classification is an important task widely used in many applications. However, few works have investigated applying Spiking Neural Networks (SNNs) to text classification. To the best of our knowledge, there have been no attempts to apply SNNs as classifiers of short texts. In this paper, we present a comparative study of short text classification using SNNs. To this end, we selected and evaluated three popular implementations of SNNs: evolving Spiking Neural Networks (eSNN), the NeuCube implementation of SNNs, and the SNNTorch implementation, which is available as a Python package. To test the selected classifiers, we selected and preprocessed three publicly available datasets: the 20-newsgroup dataset as well as imbalanced and balanced PubMed datasets of medical publications. The preprocessed 20-newsgroup dataset consists of the first 100 words of each text, while for the classification of the PubMed datasets we use only the title of each publication. As the text representation of documents, we applied the TF-IDF encoding. In this work, we also propose a new encoding method for eSNN networks that can effectively encode values of input features having non-uniform distributions. The designed method works especially effectively with the TF-IDF encoding. The results of our study suggest that SNNs may provide classification quality that in some cases matches or outperforms other types of classifiers.
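The preprocessing described above (truncating each text to its first 100 words and encoding it with TF-IDF) can be sketched as follows. This is a minimal illustration using a plain textbook TF-IDF variant; the exact weighting, tokenisation and the helper names `truncate` and `tfidf` are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def truncate(text, max_words=100):
    """Lowercase, split on whitespace and keep only the first max_words tokens."""
    return text.lower().split()[:max_words]


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses tf = count / document length and idf = log(N / df);
    other TF-IDF variants exist and may give different values.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Toy titles (hypothetical), standing in for publication titles.
titles = [
    "Spiking networks classify text",
    "Spiking networks encode features",
]
vectors = tfidf([truncate(t) for t in titles])
```

Terms shared by every document receive weight zero under this variant, while terms specific to a single document receive positive weight. The resulting values are far from uniformly distributed, which is the kind of input the abstract's encoding method is meant to handle.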