This paper focuses on the implementation of a goal-oriented chatbot that prepares virtual resumes of candidates for job positions. In particular, the study tested the feasibility of using Deep Q-Networks (DQN) to build an effective conversation flow between the chatbot and the end user. The results confirmed that using the DQN model to train the conversational system increased the success rate, measured as the acceptance of the resume by the recruiter and the finalization of the conversation with the bot, from 10% to 64% in the experimental environment and from 15% to 45% in the production environment. Moreover, the DQN model shortened the conversation by an average of 4 questions, from 11 to 7.
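A hedged sketch of the kind of training update a DQN-based dialogue policy uses, in PyTorch. The abstract does not specify the dialogue state/action encoding, reward, or network shape, so STATE_DIM, N_ACTIONS, GAMMA, and the architecture below are illustrative assumptions, not the paper's implementation.

```python
# Minimal DQN update step (illustrative; not the paper's actual system).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 32, 16, 0.95  # assumed sizes, not from the paper

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
# Replay buffer of (state, action, reward, next_state, done); states stored as float tensors.
replay = deque(maxlen=10_000)

def train_step(batch_size: int = 64) -> None:
    """One DQN gradient step on a replay-buffer minibatch."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a) of the taken actions
    with torch.no_grad():  # bootstrap target from the frozen target network
        target = r + GAMMA * (1.0 - done) * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```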
The paper concerns a particular kind of message encryption in which the ciphertext is hidden in the form of text. The result is a ciphertext in the form of text that is stylistically and semantically correct, and thus close to natural text. In the course of the research, we analyze the s-Tech encrypting-and-hiding method, and in particular its φ indicator, which serves to assess the difficulty of generating the ciphertext and to estimate the quality of the resulting text, i.e. the degree of naturalness of the produced ciphertext. The aim of the research is to verify the usefulness of this measure as a universal indicator of the complexity of the encryption process and of text quality. The φ indicator is examined by manipulating two system parameters: the length of the n-grams in the n-gram base (in the range from n=1 to n=6, also denoted LBS) and enabling (or disabling) preprocessing. We evaluate their combined influence not only on the course and difficulty of encryption, but also on the quality of the ciphertext. The analysis compares the results for three preprocessing variants: hybrid encryption combined with LZW compression, SMAZ compression, and a reference situation in which the ASCII-encoded plaintext is encrypted without preprocessing.
Sentiment analysis is an efficient technique for identifying users' opinions (neutral, negative, or positive) regarding specific services or products. One of its important benefits lies in appraising the comments that users provide on services or service providers. In this work, a solution known as the adaptive rider feedback artificial tree optimization-based deep neuro-fuzzy network (A-RFATO-based DNFN) is implemented for efficient sentiment grade classification. Here, the input is pre-processed by stemming and stop-word removal. Then, important features are extracted, e.g. SentiWordNet-based features, such as the mean value, variance, and kurtosis, spam-word-based features, term frequency-inverse document frequency (TF-IDF) features, and emoticon-based features. In addition, angular similarity and the decision tree model are employed for grouping the reviewed data into specific sets. Next, the deep neuro-fuzzy network (DNFN) classifier is used to classify the sentiment grade. The proposed adaptive rider feedback artificial tree optimization (A-RFATO) approach is utilized for training the DNFN. The A-RFATO technique combines the feedback artificial tree (FAT) approach and the rider optimization algorithm (ROA) with an adaptive concept. The effectiveness of the proposed A-RFATO-based DNFN model is evaluated using metrics such as sensitivity, accuracy, specificity, and precision. The developed sentiment grade classification method achieves better sensitivity, accuracy, specificity, and precision rates than existing approaches on the Large Movie Review Dataset, the Datafiniti Product Database, and Amazon reviews.
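As a rough illustration of the first two stages named above (stemming with stop-word removal, then TF-IDF feature extraction), here is a small Python sketch using NLTK and scikit-learn; the stop list and reviews are toy stand-ins, and the SentiWordNet, spam-word, and emoticon features are omitted.

```python
# Sketch of the pre-processing and TF-IDF feature-extraction stages only.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

STOP_WORDS = {"the", "a", "an", "is", "and", "or", "of"}  # toy stop list
stemmer = PorterStemmer()

def preprocess(review: str) -> str:
    """Lower-case, drop stop words, stem the remaining tokens."""
    tokens = [t for t in review.lower().split() if t not in STOP_WORDS]
    return " ".join(stemmer.stem(t) for t in tokens)

reviews = ["The product is great and works well", "An awful, broken product"]
tfidf = TfidfVectorizer()
features = tfidf.fit_transform(preprocess(r) for r in reviews)  # TF-IDF matrix
print(features.shape)
```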
With ever-increasing demand, social media platforms are rapidly developing to enable users to express and share their opinions on a variety of topics. Twitter is one such social media site. The platform offers a comprehensive view of the social media target setting, which may include products, social events, political scenarios, and administrative resolutions. The accessible tweets expressing the target audience's perspective are frequently affected by ambiguity caused by natural language processing (NLP) limitations. By classifying tweets according to their sentiment polarity, we can determine whether they express a positive or negative point of view, a neutral opinion, or are irrelevant to the sentiment polarity context. Categorizing tweets by sentiment can assist future activities within the target domain in constructively evaluating the sentiment polarity and enabling improved decision-making based on the observed polarity. In this study, tweets previously categorized with one of the sentiment polarities were used to perform predictive analytics on new tweets to determine their sentiment polarity. The ambiguity of the tweet corpus used in the training phase is a critical limitation of the sentiment categorization procedure. While several recent models have proposed sentiment classification algorithms, they confined themselves to two labels, positive and negative opinion, oblivious to the plague of ambiguity in the training corpus. In this regard, a novel multi-label classification of sentiment polarity called the handling dimensionality of ambiguity using ensemble classification (HAD-EC) method, which diffuses ambiguity and thus minimizes false alerts, is proposed. The experimental assessment validates the HAD-EC approach by comparing the suggested model's performance to two other existing models.
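The HAD-EC internals are not given in the abstract; the sketch below only illustrates the general idea it builds on, ensemble classification over more than two polarity labels (positive, negative, neutral, irrelevant), using scikit-learn and made-up training tweets.

```python
# Generic multi-class sentiment ensemble (illustrative only; not HAD-EC itself).
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["love this phone", "worst service ever", "it is a phone", "buy cheap meds"]
labels = ["positive", "negative", "neutral", "irrelevant"]

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        [("lr", LogisticRegression(max_iter=1000)), ("nb", MultinomialNB())],
        voting="soft",  # average class probabilities across base classifiers
    ),
)
ensemble.fit(tweets, labels)
print(ensemble.predict(["great phone"]))
```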
Writing well-structured scientific documents, such as articles and theses, is vital for comprehending a document's argumentation and understanding its message. It also affects the efficiency and time required for studying the document. Proper document segmentation also yields better results when employing automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and unsmooth transitions between sentences and paragraphs. This research addresses the issue of mal-segmentation in scientific writing by introducing an automated method for detecting mal-segmentation, using Sentence Bidirectional Encoder Representations from Transformers (sBERT) as the encoding mechanism. The experimental results show promising results for the detection of mal-segmentation using the sBERT technique.
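A plausible minimal version of the detection idea, assuming sBERT here means the sentence-transformers library: embed adjacent sentences and flag pairs whose cosine similarity falls below a threshold as candidate unsmooth transitions. The model name and threshold are assumptions, not the paper's configuration.

```python
# Flag possible mal-segmentation boundaries via adjacent-sentence similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
sentences = [
    "We evaluate the encoder on held-out documents.",
    "Accuracy is reported per section boundary.",
    "Bananas are rich in potassium.",  # abrupt topic shift
]
emb = model.encode(sentences, convert_to_tensor=True)
for i in range(len(sentences) - 1):
    sim = util.cos_sim(emb[i], emb[i + 1]).item()
    if sim < 0.3:  # assumed threshold
        print(f"possible mal-segmentation between sentences {i} and {i + 1} (sim={sim:.2f})")
```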
Rising awareness of sustainable development challenges, along with the quest to optimize the everyday functioning of the city, motivates many urban authorities to search for promising concepts and solutions. One of these is the smart city concept, which has been attracting the attention of city governors for a little more than ten years. An object of research and development, it is still a distinctive feature of the cities that adopt it, and city marketers use this distinction towards a large palette of the city's beneficiaries. At the same time, the concept displays traits suggesting synergies between the implementation of smart city solutions and sustainable development goals. The main objective of our work was to verify whether a relationship between these aspects (the smartness and sustainability of transportation) exists in smart city rankings and, if so, what impact it has on the marketing communication of the cities included in such rankings. To fulfill this goal, we answered the following research questions: what place do sustainability criteria occupy in smart city rankings, how is transport represented in these criteria, what use do graded cities make of their presence in such competitions, and which perspective dominates (if any) in the daily marketing communication activities of the city. To provide such an analysis, we examined the criteria used to rank the cities in order to identify those relating to sustainability. We examined the marketing use of the results of such rankings, referring to the official websites and social media of selected cities (a random selection from the total population of 174 ranked cities). The sources provided the data in natural language, and their analysis proceeded with natural language processing (NLP) methods and tools accessible through the CLARIN.EU infrastructure. The results show that cities can be classed into different groups according to their sustainability/smartness standing and their ability to use the awarded ranks in a marketing context.
In this paper, we experimentally demonstrate two pulse regimes, dissipative soliton resonance (DSR) and noise-like pulses (NLP), in a mode-locked fiber laser using a nonlinear optical loop mirror (NOLM). By appropriately adjusting the polarization states, switchable generation of DSR and NLP can be achieved in a single mode-locked fiber laser. As the pump power is adjusted, the pulse width of the DSR increases gradually from 2.45 to 13.35 ns with a constant peak intensity, while the NLP shows only a slight increase, even splitting into two narrower pulses at higher pump power. Both DSR and NLP have the same pulse period of 1.29 μs, corresponding to the cavity length of the fiber laser. The obtained results display the evolution of DSR pulses and NLP in a mode-locked fiber laser and have potential applications in optical sensing, spectral reflectometry, micromachining, and other related domains.
This paper describes an image caption generation system using deep neural networks. The model is trained to maximize the probability of the generated sentence given the image. The model utilizes transfer learning in the form of pretrained convolutional neural networks to preprocess the image data. The datasets are composed of still photographs, each associated with five captions in English. The constructed model is compared to other similarly constructed models using the BLEU score, and ways to further improve its performance are proposed.
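For reference, this is how a BLEU score for a generated caption can be computed against reference captions with NLTK; the captions below are invented examples, not data from the paper.

```python
# BLEU score of one generated caption against its reference captions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a dog runs across the grass".split(),
    "a brown dog is running on a lawn".split(),
]
candidate = "a dog is running on the grass".split()
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```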
With software playing a key role in most modern, complex systems, it is extremely important to create and keep software requirements precise and unambiguous. One of the key elements in achieving this goal is to define the terms used in a requirement in a precise way. The aim of this study is to verify whether commercially available tools for natural language processing (NLP) can be used to create an automated process for identifying whether a term used in a requirement is linked to a proper definition. We found that, with relatively small effort, it is possible to create a model that detects domain-specific terms in software requirements with a precision of 87%. Using such a model, it is possible to determine whether a term is followed by a link to a definition.
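The commercial NLP tooling and the 87%-precision model cannot be reproduced from the abstract, but the final check it describes (is a detected term followed by a link to a definition?) can be sketched as follows; the glossary, the requirement sentence, and the [DEF-n] link convention are all hypothetical.

```python
# Check that each known domain term in a requirement is followed by a definition link.
import re

GLOSSARY = ["flight controller", "telemetry frame"]  # assumed domain terms
requirement = "The flight controller [DEF-12] shall send one telemetry frame per second."

for term in GLOSSARY:
    for m in re.finditer(re.escape(term), requirement, flags=re.IGNORECASE):
        rest = requirement[m.end():].lstrip()
        linked = rest.startswith("[DEF-")  # hypothetical link convention
        print(f"{term!r}: definition link {'found' if linked else 'MISSING'}")
```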
This study experimentally evaluates the word vectors produced by three widely used embedding methods for word-level semantic text similarity in Turkish. Three benchmark datasets, SimTurk, AnlamVer, and RG65_Turkce, are used to evaluate the word embedding vectors produced by three different methods, namely Word2Vec, GloVe, and FastText. As a result of the comparative analysis, the Turkish word vectors produced with GloVe and FastText achieved better correlation in word-level semantic similarity. It was also found that the Turkish word coverage of FastText is ahead of the other two methods, as only a limited number of out-of-vocabulary (OOV) words was observed in the experiments conducted with FastText. Another observation is that FastText and GloVe vectors showed great success in terms of the Spearman correlation value on the SimTurk and AnlamVer datasets, both of which were prepared and evaluated entirely by native Turkish speakers. This is another indicator that these datasets better represent the Turkish language in terms of morphology and inflections.
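The standard evaluation protocol implied here is: compute the cosine similarity between the embedding vectors of each word pair and correlate it (Spearman) with the human-annotated gold scores. A self-contained sketch, with made-up vectors and scores standing in for SimTurk-style data:

```python
# Word-level similarity evaluation: cosine similarities vs. gold scores.
import numpy as np
from scipy.stats import spearmanr

word_vectors = {  # toy 3-dimensional stand-ins for trained embeddings
    "kedi": np.array([0.8, 0.1, 0.3]),
    "köpek": np.array([0.7, 0.2, 0.4]),
    "araba": np.array([0.1, 0.9, 0.2]),
    "otomobil": np.array([0.2, 0.8, 0.1]),
}
pairs = [("kedi", "köpek", 0.85), ("araba", "otomobil", 0.92), ("kedi", "araba", 0.10)]

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

model_scores = [cos(word_vectors[a], word_vectors[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]
rho, _ = spearmanr(model_scores, gold_scores)
print(f"Spearman rho: {rho:.2f}")
```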
The article provides a review of the currently most popular text processing techniques, sketches their evolution, and compares sequence and dependency models in detecting semantic relationships between words.
Effective communication is one of the basic pillars of activities in which the cooperation of people is inevitable. At the same time, it is a significant issue discussed in the context of managerial work, entrepreneurship, business, services, and many other areas of the economic environment. Neuro-Linguistic Programming (NLP) is a significant attribute of improving the quality and effectiveness of communication. The NLP concept explains seemingly autonomous behaviour as constructed behaviour created by a series of consecutive stages that we often perceive as one action. By accepting that what we experience comes from programmed sequences of thinking and behaviour, NLP provides us with the knowledge and tools for discovering the structure of these programmes. Based on the identification of the attributes of the neuro-linguistic programming structure, with a focus on communication and the techniques of its implementation in managerial work, the aim of the paper is to verify methods that make it possible to specify the factors for assessing NLP in managerial work. The research was carried out on a sample of 124 managers, of whom 58 were women and 66 men. Based on the research results, two original NLP methods were verified: NLPC (Neuro-Linguistic Programming Communication) and NLPT (Neuro-Linguistic Programming Techniques). By means of factor analysis, the factors of Representational Systems and Rapport were extracted within the first methodology, and the factors of Leading and Pacing within the second. The paper presents the basic psychometric parameters of both methodologies: eigenvalues, the percentage of explained variance, Cronbach's alphas, and the intercorrelations of factors. Both methodologies contribute to the operationalization of the NLP issue and can be used in managerial practice, mainly in the education and training of managers.
Automatic text categorization presents many difficulties. Modern algorithms are getting better at extracting meaningful information from human language, but they often significantly increase the complexity of the computations. This increased demand for computational capabilities can be met by hardware accelerators such as general-purpose graphics cards. In this paper we present a full processing flow for a document categorization system. Accelerating the Gram-Schmidt signature calculation yields up to a 12-fold decrease in the computing time of system components.
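The signature step named above rests on Gram-Schmidt orthogonalization; a plain NumPy sketch of the underlying math (the paper's GPU-accelerated implementation is not reproduced here):

```python
# Classical Gram-Schmidt over document feature vectors.
import numpy as np

def gram_schmidt(vectors: np.ndarray) -> np.ndarray:
    """Rows of `vectors` -> orthonormal rows spanning the same subspace."""
    basis = []
    for v in vectors.astype(float):
        w = v - sum((v @ b) * b for b in basis)  # remove projections onto the basis
        norm = np.linalg.norm(w)
        if norm > 1e-10:  # skip (nearly) linearly dependent rows
            basis.append(w / norm)
    return np.array(basis)

docs = np.array([[3.0, 1.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 5.0]])
Q = gram_schmidt(docs)
print(np.round(Q @ Q.T, 6))  # ~identity matrix: rows are orthonormal
```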
Many tools have been developed to introduce the basics of programming to newcomers. Existing flowchart-based programming environments depend on the interaction of the user with the system: the environment relies on the intermediate code generated, and human intervention is needed every time. The development of environments for teaching aids is another area where flowcharts can be used. The main aim of the research presented here is to develop a framework that not only automatically converts process text but can also be deployed as software for creating training materials, that is, to automate the flowchart drawing activity based on the text input given by end users. This research therefore proposes a strategy for drawing flowcharts without human intervention. It can also be used to present basic programming problems to new users. The system not only automatically converts text into a flowchart but also builds up the critical thinking abilities of new software engineers and improves their solution design skills. More generally, the system is useful for representing any given process text in graphical form using standard flowcharting symbols.
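A minimal sketch of the core idea, turning numbered process text into a flowchart without manual drawing, here by emitting Graphviz DOT; the paper's parsing of free-form process text is necessarily richer than this line-splitting stand-in.

```python
# Convert numbered process steps into a linear Graphviz DOT flowchart.
process_text = """1. Read the two numbers
2. Add the numbers
3. Display the result"""

steps = [line.split(". ", 1)[1] for line in process_text.splitlines()]
nodes = ['  start [shape=ellipse, label="Start"];']
nodes += [f'  s{i} [shape=box, label="{s}"];' for i, s in enumerate(steps)]
nodes.append('  end [shape=ellipse, label="End"];')
chain = " -> ".join(["start"] + [f"s{i}" for i in range(len(steps))] + ["end"])
print("digraph flow {\n" + "\n".join(nodes) + f"\n  {chain};\n}}")
```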
Continuous vector representations, as distributed representations of words, have gained a lot of attention in the Natural Language Processing (NLP) field. Although they are considered valuable methods for modeling both semantic and syntactic features, they can still be improved. For instance, an open issue is how to develop different strategies for introducing knowledge about the morphology of words. This is a core point in the case of dense languages, where many rare words appear, and of texts containing numerous metaphors or similes. In this paper, we extend a recent approach to representing word information. The underlying idea of our technique is to present a word as a bag of syllable and letter n-grams. More specifically, we provide a vector representation for each extracted syllable-based and letter-based n-gram and perform concatenation. Moreover, in contrast to the previous method, we accept n-grams of varied length n. Various experiments, on tasks such as word similarity ranking and sentiment analysis, show that our method is competitive with other state-of-the-art techniques and takes a step toward the construction of more informative word representations.
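A small sketch of the bag-of-n-grams representation described above, restricted to letter n-grams of varied length n; the n-gram vectors are random stand-ins for trained embeddings, syllable n-grams are omitted, and mean pooling replaces the concatenation used in the paper.

```python
# Build a word vector from letter n-grams of varied length n.
import numpy as np

DIM, rng = 8, np.random.default_rng(0)
ngram_vectors = {}  # n-gram -> vector; random stand-ins for trained embeddings

def letter_ngrams(word, n_values=(2, 3, 4)):
    w = f"<{word}>"  # boundary markers, as in FastText-style models
    return [w[i:i + n] for n in n_values for i in range(len(w) - n + 1)]

def word_vector(word):
    grams = letter_ngrams(word)
    for g in grams:
        ngram_vectors.setdefault(g, rng.normal(size=DIM))
    return np.mean([ngram_vectors[g] for g in grams], axis=0)  # mean pooling

print(letter_ngrams("kot"))      # ['<k', 'ko', 'ot', 't>', '<ko', 'kot', 'ot>', ...]
print(word_vector("kot").shape)  # (8,)
```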
The article describes the elements of a news annotation information system intended to help assess the validity of news and the reliability of its sources. It considers the process of system operation and the algorithms that can be used in the implementation of such a system.
Text alignment and text quality are critical to the accuracy of machine translation (MT) systems, some NLP tools, and any other text processing tasks requiring bilingual data. This research proposes a language-independent bisentence filtering approach, based on Polish (not a position-sensitive language) to English experiments. The cleaning approach was developed on the TED Talks corpus and initially tested on the Wikipedia comparable corpus, but it can be used for any text domain or language pair. The proposed approach implements various heuristics for sentence comparison, some of which leverage synonyms as well as semantic and structural analysis of text as additional information. Minimization of data loss has been ensured. An improvement in MT system scores with text processed using this tool is discussed.
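One of the simplest heuristics such a bisentence filter can include is a sentence-length-ratio test; the thresholds below are illustrative, and the paper combines this kind of check with synonym-based, semantic, and structural comparisons.

```python
# Reject sentence pairs with an implausible source/target length ratio.
def plausible_pair(src: str, tgt: str, low: float = 0.5, high: float = 2.0) -> bool:
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if min(src_len, tgt_len) == 0:
        return False
    return low <= tgt_len / src_len <= high

pairs = [
    ("Dziękuję bardzo za uwagę.", "Thank you very much for your attention."),
    ("Dziękuję.", "This sentence is clearly not a translation of the source."),
]
for src, tgt in pairs:
    print(plausible_pair(src, tgt), "|", src, "||", tgt)
```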
This study explores the possibility of improving human-robot interaction (HRI) by exploiting natural language resources and using natural language processing (NLP) methods. The theoretical basis of the study rests on the claim that effective and efficient human-robot interaction requires linguistic and ontological agreement. A further claim is that the required ontology is implicitly present in the lexical and grammatical structure of natural language. The paper offers NLP techniques to uncover (fragments of) the ontology hidden in natural language and to generate semantic representations of natural language sentences using that ontology. The paper also presents the implementation details of an NLP module capable of parsing English and Turkish, along with an overview of the architecture of a robotic interface that uses this module to express the spatial motions of objects observed by a robot.
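As a rough illustration of extracting a spatial-motion relation from an English sentence (a stand-in for the paper's own English/Turkish parser), here is a spaCy sketch; it assumes the en_core_web_sm model has been downloaded, and the relation format is an invented example.

```python
# Extract (subject, verb, preposition, object) spatial relations via dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm
doc = nlp("The red ball rolled under the table.")
for tok in doc:
    if tok.pos_ == "VERB":
        subj = next((c.text for c in tok.children if c.dep_ == "nsubj"), None)
        for prep in (c for c in tok.children if c.dep_ == "prep"):
            obj = next((c.text for c in prep.children if c.dep_ == "pobj"), None)
            print((subj, tok.lemma_, prep.text, obj))  # ('ball', 'roll', 'under', 'table')
```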
Widespread access to the Internet and the enormous number of texts available in electronic form make it necessary to develop the discipline known as language engineering, which deals with the broadly understood processing of linguistic data. One aspect of processing such data is the generation of texts in natural language. Since the vast majority of newly created texts are available in electronic form, there is very high demand for programs that process them. The main goal of this article is to present the concept of a relational database that forms the basis of an experimental program for automatically generating descriptive grades in early school education.
Phonetic statistics were collected from several Polish corpora. The paper summarizes the data, which are phoneme n-grams, and discusses some phenomena observed in the statistics. Triphone statistics concern context-dependent speech units, which play an important role in speech recognition systems and had never before been calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and the methods of producing phonetic transcriptions are described.
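Triphone statistics of the kind described reduce to counting phoneme trigrams over transcribed text; a minimal sketch, with invented SAMPA-style transcriptions standing in for the corpus data:

```python
# Count phoneme trigrams (triphones) across word transcriptions.
from collections import Counter

# word -> phoneme sequence (illustrative transcriptions, not corpus data)
transcriptions = [
    ["k", "o", "t", "e", "k"],   # "kotek"
    ["k", "o", "t", "a"],        # "kota"
]
triphones = Counter(
    tuple(ph[i:i + 3]) for ph in transcriptions for i in range(len(ph) - 2)
)
for tri, count in triphones.most_common(3):
    print("-".join(tri), count)
```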