Wyniki wyszukiwania - BazTech

1

Early warning systems against bankruptcy risk and NLP: can ChatGPT predict corporate distress?

Siciński Jędrzej

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska

|

2023

|

z. 170

439--455

EN

Purpose: The main purpose of this paper is to evaluate the effectiveness and usability of one of the more groundbreaking and more widely commented NLP-technique-employing inventions, i.e., the ChatGPT application acting as a digital advisor in the field of counterparty financial standing and bankruptcy risk assessment. Design/methodology/approach: The algorithmic potential presented by the ChatGPT tool can be a valuable solution in supporting the manager's work. In this study, the current potential of this solution in supporting financial analysis, and in particular, bankruptcy risk assessment, was checked. The study was carried out using the following methods: analysis and synthesis (1.), critical analysis of the literature (2.), and an experiment involving the use of a natural language processing application (3.). Findings: In the course of the research, it was found that the ChatGPT tool, according to the current state of knowledge, has extensive usability and is able to conduct interactions that in many cases are similar to communication with a human being. The tested language model shows a much higher level of training on general data than in solving narrow problems in specific fields. Nevertheless, its development potential should be assessed highly and probably its adaptation to solve highly specialized tasks in management will not be a long-term process, which makes it a candidate for the role of a digital managerial advisor in the future. Research limitations/implications: The first stage of the research covered only solving problems with the use of the simplest algorithms, such as discriminant analysis (MDA) and the study of entities whose financial statements are widely available on the web, which was a relatively low level of complexity for the language model. Practical implications: The research results are a signal that digitization and the digital revolution are not just theoretical slogans, but real functioning technologies that can change the nature of the manager's work (and the entire management system) in the near future. The development potential of NLP technology in management, which was confirmed in this work, suggests that an appropriate strategy for implementing these technologies is needed today. Originality/value: In this study, one of the first attempts was made to assess the potential and adaptability of natural language processing systems to support a manager in assessing the financial condition and risk of bankruptcy of entities.

2

Towards mass customisation: automatic processing of orders for residential ship’s containers - A case study example

Dudek Adam, Patalas-Maliszewska Justyna, Frączak Jacek

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2023

|

Vol. 71, nr 3

art. no. e145562

EN

Along with changes in customer expectations, the process of ordering a house, especially one built with the most modern technology from prefabricated HQ 40-foot shipping containers, should take place in an atmosphere of free-flowing, customer-friendly conversation. Therefore, it is important that the company producing such a solution has a tool supporting such offers and orders when producing personalized solutions. This article provides an original approach to the automatic processing of orders based on an example of orders for residential shipping containers, natural language processing and so-called premises developed. Our solution overcomes the usage of records of the conversations between the customer and the retailer, in order to precisely predict the variant required for the house ordered, also when providing optimal house recommendations and when supporting manufacturers throughout product design and production. The newly proposed approach examines such recorded conversations in the sale of residential shipping containers and the rationale developed, and then offers the automatic placement of an order. Moreover, the practical significance of the solution, thus proposed, was emphasized thanks to verification by a real residential ship container manufacturing company in Poland.

3

When to Trust AI: Advances and Challenges for Certification of Neural Networks

Kwiatkowska Marta, Zhang Xiyue

Annals of Computer Science and Information Systems

|

2023

|

Vol. 35

25--37

EN

Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI technology for real-world applications has not been without problems, particularly for neural networks, which may be unstable and susceptible to adversarial examples. In the longer term, appropriate safety assurance techniques need to be developed to reduce potential harm due to avoidable system failures and ensure trustworthiness. Focusing on certification and explainability, this paper provides an overview of techniques that have been developed to ensure safety of AI decisions and discusses future challenges.

4

Siamese neural networks on the trail of similarity in bugs in 5G mobile network base stations

Zarębski Sebastian, Kuzmich Aliaksandr, Sitko Sebastian, Rusek Krzysztof, Chołda Piotr

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2022

|

nr 4

198--201

EN

To improve the R&D process, by reducing duplicated bug tickets, we used an idea of composing BERT encoder as Siamese network to create a system for finding similar existing tickets. We proposed several different methods of generating artificial ticket pairs, to augment the training set. Two phases of training were conducted. The first showed that only and approximate 9% pairs were correctly identified as certainly similar. Only 48% of the test samples are found to be pairs of similar tickets. With the fine-tuning we improved that result up to 81%, proving the concept to be viable for further improvements.

5

Examination of text's lexis using a Polish dictionary

Voitovych Roman, Łukasik Edyta

Journal of Computer Sciences Institute

|

2021

|

Vol. 21

316--323

EN

This paper presents an approach to compare and classify books written in the Polish language by comparing their lexis fields. Books can be classified by their features, such as literature type, literary genre, style, author, etc. Using a preas-sembled dictionary and Jaccard index, the authors managed to prove a lexical likeness for books. Further analysis with the PAM clustering algorithm presented a lexical connection between books of the same type or author. Analysis of values of similarities of any particular field on one side and some anomalous tendencies in other cases suggest that recognition of other features is possible. The method presented in this article allows to draw conclusions about the con-nection between any arbitrary books based solely on their vocabulary.

PL

Artykuł prezentuje metodę porównania i klasyfikacji książek napisanych w języku polskim na podstawie ich leksyki. Książki można dzielić, korzystając z ich cech, np. rodzaju literatury, gatunku literackiego, stylu, autora itp. Korzystając ze skompilowanego słownika i indeksu Jaccarda, udowodniona została hipoteza dotycząca podobieństwa książek rozpatrywanego pod kątem ich leksyki. Kolejna analiza za pomocą algorytmu klastrowego PAM wskazuje na związek leksykalny pomiędzy książkami jednego rodzaju literatury lub autora. Analiza wartości współczynników poszczególnych obszarów z jednej strony i anomalia w zachowaniu w niektórych przypadkach sugeruje, że wyodrębnienie kolejnych cech jest możliwe. Metoda przedstawiona w tym artykule pozwala wyciągać wnioski o relacjach między książkami, korzystając wyłącznie z ich słownictwa.

6

Oprogramowanie klasy machine-aided human translation jako narzędzie informatyczne umożliwiające gromadzenie i przechowywanie wiedzy o zagrożonych wymarciem językach

Handzel Zbigniew, Gajer Mirosław

Elektronika : konstrukcje, technologie, zastosowania

|

2021

|

Vol. 62, nr 2

33--35

PL

Tematyka artykułu dotyczy zagadnień związanych z opracowywaniem narzędzi informatycznych przeznaczonych dla lingwistów zainteresowanych badaniem języków zagrożonych wymarciem. Autorzy przedstawili propozycję budowy narzędzi stanowiących generatory struktur syntaktycznych oraz systemów wspomagających przekład, które mogą posłużyć jako środek do usprawnienia rekonstrukcji i rewitalizacji zagrożonych wymarciem języków.

EN

The scope of the article concerns issues related to the development of computer tools intended for linguists interested in the study of languages threatened with extinction. The authors present a proposal for the construction of tools that are generators of syntactic structures and machine-aided human translation systems, which can serve as a means of improving the reconstruction and further revitalization of endangered languages.

7

Deep learning based Tamil Parts of Speech (POS) tagger

Anbukkarasi S., Varadhaganapathy S.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2021

|

Vol. 69, nr 6

art. no. e138820

EN

This paper addresses the problem of part of speech (POS) tagging for the Tamil language, which is low resourced and agglutinative. POS tagging is the process of assigning syntactic categories for the words in a sentence. This is the preliminary step for many of the Natural Language Processing (NLP) tasks. For this work, various sequential deep learning models such as recurrent neural network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and Bi-directional Long Short-Term Memory (Bi-LSTM) were used at the word level. For evaluating the model, the performance metrics such as precision, recall, F1-score and accuracy were used. Further, a tag set of 32 tags and 225 000 tagged Tamil words was utilized for training. To find the appropriate hidden state, the hidden states were varied as 4, 16, 32 and 64, and the models were trained. The experiments indicated that the increase in hidden state improves the performance of the model. Among all the combinations, Bi-LSTM with 64 hidden states displayed the best accuracy (94%). For Tamil POS tagging, this is the initial attempt to be carried out using a deep learning model.

8

Using Word Embeddings for Italian Crime News Categorization

Bonisoli Giovanni, Rollo Federica, Po Laura

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

461--470

EN

Several studies have shown that the use of embeddings improves outcomes in many NLP activities, including text categorization. In this paper, we focus on how word embeddings can be used on newspaper articles about crimes to categorize them according to the type of crime they report. Our approach was tested on an Italian dataset of 15,361 crime news articles combining different Word2Vec models and exploiting supervised and unsupervised Machine Learning categorization algorithms. The tests show very promising results.

9

Evaluation of Neural Network Transformer Models for Named-Entity Recognition on Low-Resourced Languages

Hanslo Ridewaan

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

115--119

EN

In this paper, transformer models are used to evaluate ten low-resourced South African languages for NER. Further, these transformer models are compared to bi-LSTM-aux and CRF models. The transformer models have the highest F-score of 84%. This result is significant within the context of the study, as previous research could not achieve F-scores of 80%. However, the CRF and bi-LSTM-aux models remain top performers in sequence tagging. Transformer models are viable for low-resourced languages. Future research could improve upon these findings by implementing a linear-complexity recurrent transformer variant.

10

Voice authentication based on the Russian-language dataset, MFCC method and the anomaly detection algorithm

Sidorova Anna, Kogos Konstantin

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

537--540

EN

Almost all people's data is stored on their personal devices. For this reason, there is a need to protect information from unauthorized access by means of user authentication. PIN codes, passwords, tokens can be forgotten, lost, transferred, brute-force attacked. For this reason, biometric authentication is gaining in popularity. Biometric data are unchanged for a long time, different for users, and can be measured. This paper explores voice authentication due to the ease of use of this technology, since obtaining voice characteristics of users doesn't require an equipment in addition to the microphone, which is built into almost all devices. The method of voice authentication based on an anomaly detection algorithm has been proposed. The software module for text-independent authentication has been developed on the Python language. It's based on a new Mozilla's open source voice dataset "Common voice". Experimental results confirmed the high accuracy of authentication by the proposed method.

11

Overview of the Transformer-based Models for NLP Tasks

Gillioz Anthony, Casas Jacky, Mugellini Elena, Abou Khaled Omar

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

179--183

EN

In 2017, Vaswani et al. proposed a new neural network architecture named Transformer. That modern architecture quickly revolutionized the natural language processing world. Models like GPT and BERT relying on this Transformer architecture have fully outperformed the previous state-of-the-art networks. It surpassed the earlier approaches by such a wide margin that all the recent cutting edge models seem to rely on these Transformer-based architectures. In this paper, we provide an overview and explanations of the latest models. We cover the auto-regressive models such as GPT, GPT-2 and XLNET, as well as the auto-encoder architecture such as BERT and a lot of post-BERT models like RoBERTa, ALBERT, ERNIE 1.0/2.0.

12

Open IE-Triples Inference - Corpora Development and DNN Architectures

Víta Martin, Škoda Petr

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

199--204

EN

Natural language inference (NLI) is a well established part of natural language understanding (NLU). This task is usually stated as a 3-way classification of sentence pairs with respect to entailment relation (entailment, neutral, contradiction). In this work, we focus on a derived task of relation inference: we propose a method of transforming a general NLI corpus to an annotated corpus for relation inference that utilizes existing NLI annotations. We subsequently introduce a novel relation inference corpus obtained from a well known SNLI corpus and provide its brief characterization. We investigate several DNN siamese architectures for this task and this particular corresponding corpus. We set several baselines including hypothesis only baseline. Our best architecture achieved 96.92% accuracy.

13

Named Entity Recognition and Named Entity Linking on Esports Contents

Liu Ziyu, Leng Yifan, Wang Meiqi, Lin Congzhu

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

189--192

EN

We built a named entity recognition/linking system on Esports News. We established an ontology for Esports-related entities, collected and annotated corpus from 80 articles on 4 different Esports titles, trained CRF and BERT-based entity recognizer, built a basic DOTA2 knowledge base and a Entity linker that links mentions to articles in Liquipedia, and an end-to-end web app which serves as a demo of this entire proof-of-conecpt system. Our system achieved an over 61% overall entity-level F1-score on the test set for the NER task.

14

Explorations into Deep Learning Text Architectures for Dense Image Captioning

Toshevska Martina, Stojanovska Frosina, Zdravevski Eftim, Lameski Petre, Gievska Sonja

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

129--136

EN

Image captioning is the process of generating a textual description that best fits the image scene. It is one of the most important tasks in computer vision and natural language processing and has the potential to improve many applications in robotics, assistive technologies, storytelling, medical imaging and more. This paper aims to analyse different encoder-decoder architectures for dense image caption generation while focusing on the text generation component. Already trained models for image feature generation are utilized with transfer learning. These features are used for describing the regions using three different models for text generation. We propose three deep learning architectures for generating one-sentence captions of Regions of Interest (RoIs). The proposed architectures reflect several ways of integrating features from images and text. The proposed models were evaluated and compared with several metrics for natural language generation.

15

Czech parliament meeting recordings as ASR training data

Krůza Jan Oldřich

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

185--188

EN

I present a way to leverage the stenographed recordings of the Czech parliament meetings for purposes of training a speech-to-text system. The article presents a method for scraping the data, acquiring word-level alignment and selecting reliable parts of the imprecise transcript. Finally, I present an ASR system trained on these and other data.

16

Automatic Generation of Annotated Corpora of Diagnoses with ICD-10 codes based on Open Data and Linked Open Data

Boytcheva Svetla, Velichkov Boris, Velchev Gerasim, Koychev Ivan

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

163--167

EN

We propose methods for automatic generation of corpora that contains descriptions of diagnoses in Bulgarian and their associated codes in ICD-10-CM (International Classification of Diseases, 10th revision, Clinical Modification). The proposed approach is based on the available open data and Linked Open Data and can be easily adapted for other languages. The resulted corpora generated for the Bulgarian clinical texts consists of about 370,000 pairs of diagnoses and corresponding ICD-10 codes and is beyond the usual size that can be generated manually, moreover it was created from scratch and for a relatively short time. Further updates of the corpora are also possible whenever new open resources are available or the current ones are updated.

17

The method of automatic assignment ICD codes based on semantic information

Romaldowski Marcin

Computer Science and Mathematical Modelling

|

2019

|

No. 10

17--23

EN

The paper presents the method of automatic assignment of ICD codes based on semantic information contained in clinical reports of the MIMIC-III database. It is showing the possibility of using multi-criteria optimization methods for simple classifiers fusion in a more precise classifiers complex. ICD code assignment is important in the modern hospital, more accurate automation of assigning codes will make the clinical process more efficient and can help clinicians carry out better diagnostics and effectively improve medical care systems.

PL

W artykule przedstawiono metodę automatycznego przypisywania kodów ICD-9 na podstawie informacji semantycznych zawartych w raportach klinicznych pacjentów bazy MIMIC-III. Została pokazana możliwość wykorzystania metod optymalizacji wielokryterialnej do budowy fuzji klasyfikatorów w celu utworzenia bardziej precyzyjnych klasyfikatorów. Przypisanie kodu ICD jest ważne na wielu poziomach w nowoczesnym szpitalu, dokładniejsza automatyzacja przypisywania kodów sprawi, że proces kliniczny stanie się bardziej wydajny i może pomóc klinicystom w przeprowadzeniu lepszej diagnostyki i skutecznej poprawie systemów opieki medycznej.

18

Towards semantic-rich word embeddings

Beringer Grzegorz, Jabłoński Mateusz, Januszewski Piotr, Sobecki Andrzej, Szymański Julian

Annals of Computer Science and Information Systems

|

2019

|

Vol. 18

273-–276

EN

In recent years, word embeddings have been shown to improve the performance in NLP tasks such as syntactic parsing or sentiment analysis. While useful, they are problematic in representing ambiguous words with multiple meanings, since they keep a single representation for each word in the vocabulary. Constructing separate embeddings for meanings of ambiguous words could be useful for solving the Word Sense Disambiguation (WSD) task. In this work, we present how a word embeddings average- based method can be used to produce semantic-rich meaning embeddings, and how they can be improved with distance optimization techniques. We also open-source a WSD dataset that was created for the purpose of evaluating methods presented in this research.

19

Multilingual knowledge base completion by cross-lingual semantic relation inference

Bebeshina-Clairet Nadia, Lafourcade Mathieu

Annals of Computer Science and Information Systems

|

2019

|

Vol. 18

249--253

EN

Highly structured knowledge bases such as lexical semantic networks contain various connectivity patterns that can be learned as node features using dedicated frameworks. However, semantic relations are often unequally distributed over such knowledge resources. Some of the language partitions may benefit from integrating structured resources which are more easily available for resource-rich languages. In the present paper, we propose a simple endogenous method for enhancing a multilingual knowledge base through the cross-lingual semantic relation inference. It can be run on multilingual resources prior to semantic representation learning. Multilingual knowledge bases may integrate preexisting structured resources available for resource-rich languages. We aim at performing cross-lingual inference on them to improve the low resource language by creating semantic relationships.

20

Medical prescription classification: a NLP-based approach

Carchiolo Vincenza, Longheu Alessandro, Reitano Giuseppa, Zagarella Luca

Annals of Computer Science and Information Systems

|

2019

|

Vol. 18

605--609

EN

The digitization of healthcare data has been consolidated in the last decade as a must to manage the vast amount of data generated by healthcare organizations. Carrying out this process effectively represents an enabling resource that will improve healthcare services provision, as well as on-the-edge related applications, ranging from clinical text mining to predictive modelling, survival analysis, patient similarity, genetic data analysis and many others. The application presented in this work concerns the digitization of medical prescriptions, both to provide authorization for healthcare services or to grant reimbursement for medical expenses. The proposed system first extract text from scanned medical prescription, then Natural Language Processing and machine learning techniques provide effective classification exploiting embedded terms and categories about patient/- doctor personal data, symptoms, pathology, diagnosis and suggested treatments. A REST ful Web Service is introduced, together with results of prescription classification over a set of 800K+ of diagnostic statements.