Wyniki wyszukiwania - BazTech

1

Early warning systems against bankruptcy risk and NLP: can ChatGPT predict corporate distress?

Siciński Jędrzej

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska

|

2023

|

z. 170

439--455

EN

Purpose: The main purpose of this paper is to evaluate the effectiveness and usability of one of the more groundbreaking and more widely commented NLP-technique-employing inventions, i.e., the ChatGPT application acting as a digital advisor in the field of counterparty financial standing and bankruptcy risk assessment. Design/methodology/approach: The algorithmic potential presented by the ChatGPT tool can be a valuable solution in supporting the manager's work. In this study, the current potential of this solution in supporting financial analysis, and in particular, bankruptcy risk assessment, was checked. The study was carried out using the following methods: analysis and synthesis (1.), critical analysis of the literature (2.), and an experiment involving the use of a natural language processing application (3.). Findings: In the course of the research, it was found that the ChatGPT tool, according to the current state of knowledge, has extensive usability and is able to conduct interactions that in many cases are similar to communication with a human being. The tested language model shows a much higher level of training on general data than in solving narrow problems in specific fields. Nevertheless, its development potential should be assessed highly and probably its adaptation to solve highly specialized tasks in management will not be a long-term process, which makes it a candidate for the role of a digital managerial advisor in the future. Research limitations/implications: The first stage of the research covered only solving problems with the use of the simplest algorithms, such as discriminant analysis (MDA) and the study of entities whose financial statements are widely available on the web, which was a relatively low level of complexity for the language model. Practical implications: The research results are a signal that digitization and the digital revolution are not just theoretical slogans, but real functioning technologies that can change the nature of the manager's work (and the entire management system) in the near future. The development potential of NLP technology in management, which was confirmed in this work, suggests that an appropriate strategy for implementing these technologies is needed today. Originality/value: In this study, one of the first attempts was made to assess the potential and adaptability of natural language processing systems to support a manager in assessing the financial condition and risk of bankruptcy of entities.

2

Adaptive Rider Feedback Artificial Tree Optimization-Based Deep Neuro-Fuzzy Network for Classification of Sentiment Grade

Jasti Sireesha, Kumar G.V.S. Raj

Journal of Telecommunications and Information Technology

|

2023

|

nr 1

37--50

EN

Sentiment analysis is an efficient technique for expressing users’ opinions (neutral, negative or positive) regarding specific services or products. One of the important benefits of analyzing sentiment is in appraising the comments that users provide or service providers or services. In this work, a solution known as adaptive rider feedback artificial tree optimization-based deep neuro-fuzzy network (RFATO-based DNFN) is implemented for efficient sentiment grade classification. Here, the input is pre-processed by employing the process of stemming and stop word removal. Then, important factors, e.g. SentiWordNet-based features, such as the mean value, variance, as well as kurtosis, spam word-based features, term frequency-inverse document frequency (TF-IDF) features and emoticon-based features, are extracted. In addition, angular similarity and the decision tree model are employed for grouping the reviewed data into specific sets. Next, the deep neuro-fuzzy network (DNFN) classifier is used to classify the sentiment grade. The proposed adaptive rider feedback artificial tree optimization (A-RFATO) approach is utilized for the training of DNFN. The A-RFATO technique is a combination of the feedback artificial tree (FAT) approach and the rider optimization algorithm (ROA) with an adaptive concept. The effectiveness of the proposed A-RFATO-based DNFN model is evaluated based on such metrics as sensitivity, accuracy, specificity, and precision. The sentiment grade classification method developed achieves better sensitivity, accuracy, specificity, and precision rates when compared with existing approaches based on Large Movie Review Dataset, Datafiniti Product Database, and Amazon reviews.

3

Towards mass customisation: automatic processing of orders for residential ship’s containers - A case study example

Dudek Adam, Patalas-Maliszewska Justyna, Frączak Jacek

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2023

|

Vol. 71, nr 3

art. no. e145562

EN

Along with changes in customer expectations, the process of ordering a house, especially one built with the most modern technology from prefabricated HQ 40-foot shipping containers, should take place in an atmosphere of free-flowing, customer-friendly conversation. Therefore, it is important that the company producing such a solution has a tool supporting such offers and orders when producing personalized solutions. This article provides an original approach to the automatic processing of orders based on an example of orders for residential shipping containers, natural language processing and so-called premises developed. Our solution overcomes the usage of records of the conversations between the customer and the retailer, in order to precisely predict the variant required for the house ordered, also when providing optimal house recommendations and when supporting manufacturers throughout product design and production. The newly proposed approach examines such recorded conversations in the sale of residential shipping containers and the rationale developed, and then offers the automatic placement of an order. Moreover, the practical significance of the solution, thus proposed, was emphasized thanks to verification by a real residential ship container manufacturing company in Poland.

4

When to Trust AI: Advances and Challenges for Certification of Neural Networks

Kwiatkowska Marta, Zhang Xiyue

Annals of Computer Science and Information Systems

|

2023

|

Vol. 35

25--37

EN

Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI technology for real-world applications has not been without problems, particularly for neural networks, which may be unstable and susceptible to adversarial examples. In the longer term, appropriate safety assurance techniques need to be developed to reduce potential harm due to avoidable system failures and ensure trustworthiness. Focusing on certification and explainability, this paper provides an overview of techniques that have been developed to ensure safety of AI decisions and discusses future challenges.

5

Siamese neural networks on the trail of similarity in bugs in 5G mobile network base stations

Zarębski Sebastian, Kuzmich Aliaksandr, Sitko Sebastian, Rusek Krzysztof, Chołda Piotr

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2022

|

nr 4

198--201

EN

To improve the R&D process, by reducing duplicated bug tickets, we used an idea of composing BERT encoder as Siamese network to create a system for finding similar existing tickets. We proposed several different methods of generating artificial ticket pairs, to augment the training set. Two phases of training were conducted. The first showed that only and approximate 9% pairs were correctly identified as certainly similar. Only 48% of the test samples are found to be pairs of similar tickets. With the fine-tuning we improved that result up to 81%, proving the concept to be viable for further improvements.

6

islEHR, a model for electronic health records interoperability

Najjar Arwa, Amro Belal, Macedo Mário

Bio-Algorithms and Med-Systems

|

2022

|

Vol. 18, no. 1

39--54

EN

Objectives: Due to the diversity, volume, and distribution of ingested data, the majority of current healthcare entities operate independently, increasing the problem of data processing and interchange. The goal of this research is to design, implement, and evaluate an electronic health record (EHR) interoperability solution - prototype - among healthcare organizations, whether these organizations do not have systems that are prepared for data sharing, or organizations that have such systems. Methods: We established an EHR interoperability prototype model named interoperability smart lane for electronic health record (islEHR), which comprises of three modules: 1) a data fetching APIs for external sharing of patients’ information from participant hospitals; 2) a data integration service, which is the heart of the islEHR that is responsible for extracting, standardizing, and normalizing EHRs data leveraging the fast healthcare interoperability resources (FHIR) and artificial intelligence techniques; 3) a RESTful API that represents the gateway sits between clients and the data integration services. Results: The prototype of the islEHR was evaluated on a set of unstructured discharge reports. The performance achieved a total time of execution ranging from 0.04 to 84.49 s. While the accuracy reached an F-Score ranging from 1.0 to 0.89. Conclusions: According to the results achieved, the islEHR prototype can be implemented among different heterogeneous systems regardless of their ability to share data. The prototype was built based on international standards and machine learning techniques that are adopted worldwide. Performance and correctness results showed that islEHR outperforms existing models in its diversity as well as correctness and performance.

7

Ontology Extraction from Software Requirements Using Named-Entity Recognition

Kocerka Jerzy, Krześlak Michał, Gałuszka Adam

Advances in Science and Technology. Research Journal

|

2022

|

Vol. 16, no 3

207--212

EN

With the software playing a key role in most of the modern, complex systems it is extremely important to create and keep the software requirements precise and non-ambiguous. One of the key elements to achieve such a goal is to define the terms used in a requirement in a precise way. The aim of this study is to verify if the commercially available tools for natural language processing (NLP) can be used to create an automated process to identify whether the term used in a requirement is linked with a proper definition. We found out, that with a relatively small effort it is possible to create a model that detects the domain specific terms in the software requirements with a precision of 87 %. Using such model it is possible to determine if the term is followed by a link to a definition.

8

Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish

Tulu Cagatay Neftali

Advances in Science and Technology. Research Journal

|

2022

|

Vol. 16, no 4

147--156

EN

This study aims to evaluate experimentally the word vectors produced by three widely used embedding methods for the word-level semantic text similarity in Turkish. Three benchmark datasets SimTurk, AnlamVer, and RG65_Turkce are used in this study to evaluate the word embedding vectors produced by three different methods namely Word2Vec, Glove, and FastText. As a result of the comparative analysis, Turkish word vectors produced with Glove and FastText gained better correlation in the word level semantic similarity. It is also found that The Turkish word coverage of FastText is ahead of the other two methods because the limited number of Out of Vocabulary (OOV) words have been observed in the experiments conducted for FastText. Another observation is that FastText and Glove vectors showed great success in terms of Spearman correlation value in the SimTurk and AnlamVer datasets both of which are purely prepared and evaluated by local Turkish individuals. This is another indicator showing that these aforementioned datasets are better representing the Turkish language in terms of morphology and inflections.

9

Examination of text's lexis using a Polish dictionary

Voitovych Roman, Łukasik Edyta

Journal of Computer Sciences Institute

|

2021

|

Vol. 21

316--323

EN

This paper presents an approach to compare and classify books written in the Polish language by comparing their lexis fields. Books can be classified by their features, such as literature type, literary genre, style, author, etc. Using a preas-sembled dictionary and Jaccard index, the authors managed to prove a lexical likeness for books. Further analysis with the PAM clustering algorithm presented a lexical connection between books of the same type or author. Analysis of values of similarities of any particular field on one side and some anomalous tendencies in other cases suggest that recognition of other features is possible. The method presented in this article allows to draw conclusions about the con-nection between any arbitrary books based solely on their vocabulary.

PL

Artykuł prezentuje metodę porównania i klasyfikacji książek napisanych w języku polskim na podstawie ich leksyki. Książki można dzielić, korzystając z ich cech, np. rodzaju literatury, gatunku literackiego, stylu, autora itp. Korzystając ze skompilowanego słownika i indeksu Jaccarda, udowodniona została hipoteza dotycząca podobieństwa książek rozpatrywanego pod kątem ich leksyki. Kolejna analiza za pomocą algorytmu klastrowego PAM wskazuje na związek leksykalny pomiędzy książkami jednego rodzaju literatury lub autora. Analiza wartości współczynników poszczególnych obszarów z jednej strony i anomalia w zachowaniu w niektórych przypadkach sugeruje, że wyodrębnienie kolejnych cech jest możliwe. Metoda przedstawiona w tym artykule pozwala wyciągać wnioski o relacjach między książkami, korzystając wyłącznie z ich słownictwa.

10

Oprogramowanie klasy machine-aided human translation jako narzędzie informatyczne umożliwiające gromadzenie i przechowywanie wiedzy o zagrożonych wymarciem językach

Handzel Zbigniew, Gajer Mirosław

Elektronika : konstrukcje, technologie, zastosowania

|

2021

|

Vol. 62, nr 2

33--35

PL

Tematyka artykułu dotyczy zagadnień związanych z opracowywaniem narzędzi informatycznych przeznaczonych dla lingwistów zainteresowanych badaniem języków zagrożonych wymarciem. Autorzy przedstawili propozycję budowy narzędzi stanowiących generatory struktur syntaktycznych oraz systemów wspomagających przekład, które mogą posłużyć jako środek do usprawnienia rekonstrukcji i rewitalizacji zagrożonych wymarciem języków.

EN

The scope of the article concerns issues related to the development of computer tools intended for linguists interested in the study of languages threatened with extinction. The authors present a proposal for the construction of tools that are generators of syntactic structures and machine-aided human translation systems, which can serve as a means of improving the reconstruction and further revitalization of endangered languages.

11

Deep learning based Tamil Parts of Speech (POS) tagger

Anbukkarasi S., Varadhaganapathy S.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2021

|

Vol. 69, nr 6

art. no. e138820

EN

This paper addresses the problem of part of speech (POS) tagging for the Tamil language, which is low resourced and agglutinative. POS tagging is the process of assigning syntactic categories for the words in a sentence. This is the preliminary step for many of the Natural Language Processing (NLP) tasks. For this work, various sequential deep learning models such as recurrent neural network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and Bi-directional Long Short-Term Memory (Bi-LSTM) were used at the word level. For evaluating the model, the performance metrics such as precision, recall, F1-score and accuracy were used. Further, a tag set of 32 tags and 225 000 tagged Tamil words was utilized for training. To find the appropriate hidden state, the hidden states were varied as 4, 16, 32 and 64, and the models were trained. The experiments indicated that the increase in hidden state improves the performance of the model. Among all the combinations, Bi-LSTM with 64 hidden states displayed the best accuracy (94%). For Tamil POS tagging, this is the initial attempt to be carried out using a deep learning model.

12

Using Word Embeddings for Italian Crime News Categorization

Bonisoli Giovanni, Rollo Federica, Po Laura

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

461--470

EN

Several studies have shown that the use of embeddings improves outcomes in many NLP activities, including text categorization. In this paper, we focus on how word embeddings can be used on newspaper articles about crimes to categorize them according to the type of crime they report. Our approach was tested on an Italian dataset of 15,361 crime news articles combining different Word2Vec models and exploiting supervised and unsupervised Machine Learning categorization algorithms. The tests show very promising results.

13

Evaluation of Neural Network Transformer Models for Named-Entity Recognition on Low-Resourced Languages

Hanslo Ridewaan

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

115--119

EN

In this paper, transformer models are used to evaluate ten low-resourced South African languages for NER. Further, these transformer models are compared to bi-LSTM-aux and CRF models. The transformer models have the highest F-score of 84%. This result is significant within the context of the study, as previous research could not achieve F-scores of 80%. However, the CRF and bi-LSTM-aux models remain top performers in sequence tagging. Transformer models are viable for low-resourced languages. Future research could improve upon these findings by implementing a linear-complexity recurrent transformer variant.

14

Knowledge graphs effectiveness in Neural Machine Translation improvement

Ahmadnia Benyamin, Dorr Bonnie J., Kordjamshidi Parisa

Computer Science

|

2020

|

T. 21 (3)

299–318

EN

Maintaining semantic relations between words during the translation process yields more accurate target-language output from Neural Machine Translation (NMT). Although difficult to achieve from training data alone, it is possible to leverage Knowledge Graphs (KGs) to retain source-language semantic relations in the corresponding target-language translation. The core idea is to use KG entity relations as embedding constraints to improve the mapping from source to target. This paper describes two embedding constraints, both of which employ Entity Linking (EL)—assigning a unique identity to entities—to associate words in training sentences with those in the KG: (1) a monolingual embedding constraint that supports an enhanced semantic representation of the source words through access to relations between entities in a KG; and (2) a bilingual embedding constraint that forces entity relations in the source-language to be carried over to the corresponding entities in the target-language translation. The method is evaluated for English-Spanish translation exploiting Freebase as a source of knowledge. Our experimental results demonstrate that exploiting KG information not only decreases the number of unknown words in the translation but also improves translation quality

15

Compressing sentiment analysis CNN models for efficient hardware processing

Wróbel Krzysztof, Karwatowski Michał, Wielgosz Maciej, Pietroń Marcin, Wiatr Kazimierz

Computer Science

|

2020

|

T. 21 (1)

25--41

EN

Convolutional neural networks (CNNs) were created for image classification tasks. Shortly after their creation, they were applied to other domains, including natural language processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and embedded systems, which places constraints on memory and power consumption, among others. Due to CNN memory and computing requirements, it is necessary to compress them in order to be mapped to the hardware. This paper presents the results of the compression of efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that the 5-bit width is enough to ensure no drop in accuracy when compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (between 85 and 93% as compared to the original model).

16

Voice authentication based on the Russian-language dataset, MFCC method and the anomaly detection algorithm

Sidorova Anna, Kogos Konstantin

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

537--540

EN

Almost all people's data is stored on their personal devices. For this reason, there is a need to protect information from unauthorized access by means of user authentication. PIN codes, passwords, tokens can be forgotten, lost, transferred, brute-force attacked. For this reason, biometric authentication is gaining in popularity. Biometric data are unchanged for a long time, different for users, and can be measured. This paper explores voice authentication due to the ease of use of this technology, since obtaining voice characteristics of users doesn't require an equipment in addition to the microphone, which is built into almost all devices. The method of voice authentication based on an anomaly detection algorithm has been proposed. The software module for text-independent authentication has been developed on the Python language. It's based on a new Mozilla's open source voice dataset "Common voice". Experimental results confirmed the high accuracy of authentication by the proposed method.

17

Overview of the Transformer-based Models for NLP Tasks

Gillioz Anthony, Casas Jacky, Mugellini Elena, Abou Khaled Omar

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

179--183

EN

In 2017, Vaswani et al. proposed a new neural network architecture named Transformer. That modern architecture quickly revolutionized the natural language processing world. Models like GPT and BERT relying on this Transformer architecture have fully outperformed the previous state-of-the-art networks. It surpassed the earlier approaches by such a wide margin that all the recent cutting edge models seem to rely on these Transformer-based architectures. In this paper, we provide an overview and explanations of the latest models. We cover the auto-regressive models such as GPT, GPT-2 and XLNET, as well as the auto-encoder architecture such as BERT and a lot of post-BERT models like RoBERTa, ALBERT, ERNIE 1.0/2.0.

18

Open IE-Triples Inference - Corpora Development and DNN Architectures

Víta Martin, Škoda Petr

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

199--204

EN

Natural language inference (NLI) is a well established part of natural language understanding (NLU). This task is usually stated as a 3-way classification of sentence pairs with respect to entailment relation (entailment, neutral, contradiction). In this work, we focus on a derived task of relation inference: we propose a method of transforming a general NLI corpus to an annotated corpus for relation inference that utilizes existing NLI annotations. We subsequently introduce a novel relation inference corpus obtained from a well known SNLI corpus and provide its brief characterization. We investigate several DNN siamese architectures for this task and this particular corresponding corpus. We set several baselines including hypothesis only baseline. Our best architecture achieved 96.92% accuracy.

19

Named Entity Recognition and Named Entity Linking on Esports Contents

Liu Ziyu, Leng Yifan, Wang Meiqi, Lin Congzhu

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

189--192

EN

We built a named entity recognition/linking system on Esports News. We established an ontology for Esports-related entities, collected and annotated corpus from 80 articles on 4 different Esports titles, trained CRF and BERT-based entity recognizer, built a basic DOTA2 knowledge base and a Entity linker that links mentions to articles in Liquipedia, and an end-to-end web app which serves as a demo of this entire proof-of-conecpt system. Our system achieved an over 61% overall entity-level F1-score on the test set for the NER task.

20

Explorations into Deep Learning Text Architectures for Dense Image Captioning

Toshevska Martina, Stojanovska Frosina, Zdravevski Eftim, Lameski Petre, Gievska Sonja

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

129--136

EN

Image captioning is the process of generating a textual description that best fits the image scene. It is one of the most important tasks in computer vision and natural language processing and has the potential to improve many applications in robotics, assistive technologies, storytelling, medical imaging and more. This paper aims to analyse different encoder-decoder architectures for dense image caption generation while focusing on the text generation component. Already trained models for image feature generation are utilized with transfer learning. These features are used for describing the regions using three different models for text generation. We propose three deep learning architectures for generating one-sentence captions of Regions of Interest (RoIs). The proposed architectures reflect several ways of integrating features from images and text. The proposed models were evaluated and compared with several metrics for natural language generation.