Writing well-structured scientific documents, such as articles and theses, is vital for conveying a document's argumentation and messages, and it affects the efficiency and time required to study the document. Proper segmentation also yields better results from automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and abrupt transitions between sentences and paragraphs. This research addresses mal-segmentation in scientific writing by introducing an automated detection method that uses Sentence Bidirectional Encoder Representations from Transformers (sBERT) as its encoding mechanism. The experimental results show promising performance for mal-segmentation detection using the sBERT technique.
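The abstract does not give implementation details, but one plausible way such a detector could use sentence embeddings is to compare adjacent sentences and flag abrupt drops in similarity. The sketch below is only an illustration of that idea: the 3-dimensional vectors, the `flag_weak_transitions` helper, and the threshold value are all hypothetical stand-ins (real sBERT embeddings, e.g. from the sentence-transformers library, have hundreds of dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def flag_weak_transitions(embeddings, threshold=0.3):
    """Return indices i where the transition from sentence i to i+1
    falls below the similarity threshold -- a candidate mal-segmentation.
    (Illustrative heuristic only; not the paper's actual method.)"""
    return [i for i in range(len(embeddings) - 1)
            if cosine(embeddings[i], embeddings[i + 1]) < threshold]

# Toy 3-dimensional "embeddings"; in practice these would come from an
# sBERT model rather than being written by hand.
emb = [[1.0, 0.1, 0.0],   # sentence 0
       [0.9, 0.2, 0.1],   # sentence 1 - similar topic to sentence 0
       [0.0, 0.1, 1.0]]   # sentence 2 - abrupt topic shift
print(flag_weak_transitions(emb))  # [1]
```

Here the shift between sentences 1 and 2 is flagged, while the smooth transition between sentences 0 and 1 passes.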
Drug Named Entity Recognition (DNER) has become indispensable for various medical relation extraction systems. Existing deep learning systems rely on benchmark data for both training and testing the model; however, it is also very important to test on real-time data. In this research, we propose a hybrid DNER framework that incorporates text summarization of real-time data to create the test dataset. We experimented with various text summarization techniques and found that the SciBERT model gives better results than the other techniques.
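The abstract compares several summarization techniques without describing them. As a minimal, hypothetical baseline for the extractive end of that spectrum, the sketch below scores sentences by normalized word frequency and keeps the top-k in original order; the `extractive_summary` function and the sample sentences are illustrative inventions, not the paper's SciBERT-based approach.

```python
from collections import Counter

def extractive_summary(sentences, k=1):
    """Score each sentence by the summed corpus frequency of its words,
    normalized by sentence length, and keep the top-k sentences in their
    original order. A frequency baseline, not a neural summarizer."""
    words = [s.lower().split() for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) / len(ws) for ws in words]
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

docs = ["Aspirin inhibits platelet aggregation",
        "Aspirin reduces fever",
        "The weather was pleasant"]
print(extractive_summary(docs, k=2))  # keeps the two drug-related sentences
```

A transformer-based summarizer such as SciBERT would replace the frequency scores with learned sentence representations, but the surrounding pipeline (summarize, then feed the summary to the DNER model as test data) stays the same.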
Covid-19 has spread across the world, and many different vaccines have been developed to counter its surge. To identify the sentiments associated with the vaccines in social media posts, we fine-tune various state-of-the-art pre-trained transformer models on tweets associated with Covid-19 vaccines. Specifically, we use the recently introduced state-of-the-art RoBERTa, XLNet, and BERT pre-trained transformer models, as well as the domain-specific CT-BERT and BERTweet transformer models that were pre-trained on Covid-19 tweets. We further explore text augmentation by oversampling using the language-model-based oversampling technique (LMOTE) to improve the accuracies of these models, specifically for small-sample datasets with an imbalanced class distribution among the positive, negative, and neutral sentiment classes. Our results summarize our findings on the suitability of text oversampling for imbalanced, small-sample datasets used to fine-tune state-of-the-art pre-trained transformer models, as well as the utility of domain-specific transformer models for the classification task.
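To make the class-balancing step concrete, the sketch below shows a naive duplication baseline, assuming a three-class sentiment setup like the one described: minority classes are resampled with replacement until every class matches the majority count. Note that LMOTE itself *generates* new synthetic texts with a language model rather than duplicating existing ones; the example texts, labels, and the `oversample` helper here are purely illustrative.

```python
import random
from collections import Counter

def oversample(texts, labels, seed=0):
    """Balance classes by resampling minority-class examples with
    replacement until each class matches the majority-class count.
    (A duplication baseline; LMOTE would synthesize new texts instead.)"""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for cls, n in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == cls]
        for _ in range(target - n):
            out_texts.append(rng.choice(pool))
            out_labels.append(cls)
    return out_texts, out_labels

# Hypothetical imbalanced tweet set: 1 positive, 2 negative, 1 neutral.
texts = ["great shot", "mild side effects", "got my dose", "hesitant still"]
labels = ["pos", "neg", "neu", "neg"]
bal_texts, bal_labels = oversample(texts, labels)
print(Counter(bal_labels))  # every class now has 2 examples
```

The balanced set would then be fed to the fine-tuning loop of whichever transformer model is being evaluated.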