Search results - BazTech

1

Formation of highly specialized chatbots for advanced search

Yarovyi Andrii, Kudriavtsev Dmytro

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2024 | Vol. 14, no. 1 | 67--70

EN

This research presents the formation of highly specialized chatbots. It notes the influence of multi-threaded search across subject areas and defines how related subject areas are used in chatbot text analysis. The advantages of using multiple related subject areas are illustrated with the example of an intelligent chatbot.

PL

W tym badaniu przedstawiono tworzenie wysoce wyspecjalizowanych chatbotów. Zwrócono uwagę na wpływ wielowątkowego wyszukiwania obszarów tematycznych. Zdefiniowano wykorzystanie powiązanych obszarów tematycznych w analizie tekstu chatbota. Na przykładzie inteligentnego chatbota odnotowano zalety korzystania z wielu powiązanych obszarów tematycznych.

2

Rozpoznawanie emocji w tekstach polskojęzycznych z wykorzystaniem metody słów kluczowych [Emotion recognition in Polish-language texts using the keyword method]

Nowaczyk A., Jackowska-Strumiłło L.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2017 | Vol. 7, no. 2 | 102--105

PL

Dynamiczny rozwój sieci społecznościowych sprawił, że Internet stał się najpopularniejszym medium komunikacyjnym. Zdecydowana większość komunikatów wymieniana jest w postaci wiadomości tekstowych, które niejednokrotnie odzwierciedlają stan emocjonalny autora. Identyfikacja emocji w tekstach znajduje szerokie zastosowanie w handlu elektronicznym czy telemedycynie, stając się jednocześnie ważnym elementem w komunikacji człowiek-komputer. W niniejszym artykule zaprezentowano metodę rozpoznawania emocji w tekstach polskojęzycznych opartą o algorytm detekcji słów kluczowych i lematyzację. Uzyskano dokładność rzędu 60%. Opracowano również pierwszą polskojęzyczną bazę słów kluczowych wyrażających emocje.

EN

The dynamic development of social networks has made the Internet the most popular communication medium. The vast majority of messages are exchanged as text and very often reflect their authors' emotional states. Emotion detection in text is widely used in e-commerce and telemedicine, and is becoming an important element of human-computer interaction. The paper presents a method of emotion recognition in Polish-language texts based on a keyword detection algorithm with lemmatization. The accuracy obtained is about 60%. The first Polish-language database of keywords expressing emotions has also been developed.
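
The keyword-plus-lemmatization idea in the abstract above can be illustrated with a minimal sketch. The lemma table, the keyword table, and the function name `detect_emotion` are hypothetical toy stand-ins, not the database or algorithm from the paper; a real system would use a proper Polish lemmatizer and a much larger keyword base.

```python
import re
from collections import Counter

# Hypothetical toy lemma table (inflected form -> base form); an assumption
# standing in for a real Polish lemmatizer.
LEMMAS = {
    "cieszę": "cieszyć",
    "boję": "bać",
    "smutna": "smutny",
    "zła": "zły",
}

# Hypothetical toy keyword base (lemma -> emotion label).
EMOTION_KEYWORDS = {
    "cieszyć": "joy",
    "radość": "joy",
    "bać": "fear",
    "strach": "fear",
    "smutny": "sadness",
    "zły": "anger",
}


def detect_emotion(text):
    """Return the emotion whose keywords occur most often, or 'neutral'."""
    # Tokenize on word characters; \w is Unicode-aware for str patterns.
    tokens = re.findall(r"\w+", text.lower())
    # Lemmatize by table lookup, leaving unknown tokens unchanged.
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    # Count emotion labels for every lemma found in the keyword base.
    counts = Counter(
        EMOTION_KEYWORDS[lemma] for lemma in lemmas if lemma in EMOTION_KEYWORDS
    )
    if not counts:
        return "neutral"
    return counts.most_common(1)[0][0]
```

Calling `detect_emotion("Boję się burzy")` looks up the lemma "bać" for "boję" and matches it against the fear keywords; text with no keyword hits falls back to "neutral".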

3

Applying a q-Gram based multiple string matching algorithm for approximate matching

Susik R.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2017 | Vol. 7, no. 3 | 47--50

EN

We consider the application of a multiple pattern matching algorithm (Multi AOSO on q-Grams) to approximate pattern matching. We propose an on-line approach that translates the approximate pattern matching problem into a multiple pattern matching one (known as partitioning into exact search). The presented solution allows relatively fast searching for multiple patterns in text with up to k differences (or mismatches). This paper compares a solution based on the MAG algorithm with [4]. Experiments on DNA, English, Proteins and XML texts with up to k errors show that the newly proposed algorithm achieves relatively good results in practical use.

PL

Rozważamy zastosowanie algorytmu wyszukiwania wielu wzorców (Multi AOSO on q-Grams) do wyszukiwania przybliżonego. Proponujemy rozwiązanie on-line, upraszczające problem wyszukiwania przybliżonego do wyszukiwania wielu wzorców. Zaprezentowane rozwiązanie umożliwia relatywnie szybkie wyszukiwanie wielu wzorców dla odległości Levenshteina (lub Hamminga) z ograniczeniem do k. W artykule porównane jest rozwiązanie oparte na algorytmie MAG oraz [4]. Badania eksperymentalne przeprowadzone na zbiorach DNA, English, Proteins i XML z różnymi wartościami k wykazały, że zaproponowany algorytm osiąga relatywnie dobre wyniki w praktycznym zastosowaniu.
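
The "partitioning into exact search" idea mentioned in the abstract can be sketched for the k-mismatches (Hamming) case: if a pattern is cut into k + 1 pieces, any occurrence with at most k mismatches must contain at least one piece unchanged, so exact hits of the pieces give candidate positions that are then verified. This is a minimal illustration under that assumption; the function names are hypothetical, and plain `str.find` stands in for the multi-pattern MAG algorithm used in the paper.

```python
def count_mismatches(text, pattern, pos):
    """Hamming distance between pattern and the text window starting at pos."""
    window = text[pos:pos + len(pattern)]
    return sum(1 for a, b in zip(window, pattern) if a != b)


def search_k_mismatches(text, pattern, k):
    """Return sorted start positions where pattern matches with <= k mismatches."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n or k < 0:
        return []
    if k >= m:
        # Every window trivially matches when k covers the whole pattern.
        return list(range(n - m + 1))

    # Cut the pattern into k + 1 non-empty pieces (possible because k + 1 <= m).
    parts = k + 1
    bounds = [i * m // parts for i in range(parts + 1)]

    candidates = set()
    for i in range(parts):
        start, end = bounds[i], bounds[i + 1]
        piece = pattern[start:end]
        # Exact search for this piece (stand-in for a multi-pattern matcher).
        hit = text.find(piece)
        while hit != -1:
            pos = hit - start  # where the whole pattern would begin
            if 0 <= pos <= n - m:
                candidates.add(pos)
            hit = text.find(piece, hit + 1)

    # Verification step: keep only candidates within the mismatch budget.
    return sorted(p for p in candidates if count_mismatches(text, pattern, p) <= k)
```

With `text = "ACGTACGTTCGA"`, `pattern = "ACGA"` and `k = 1`, the pieces are "AC" and "GA"; their exact hits yield candidate positions 0, 4 and 8, each of which passes verification with one mismatch.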

4

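Metoda projektowania bazy wiedzy oraz reguł segmentatora regułowego oparta o formalną analizę pojęć [A method for designing the knowledge base and rules of a rule-based segmenter based on formal concept analysis]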

Mirończuk M.

Bezpieczeństwo i Technika Pożarnicza | 2014 | No. 2 | 93--103

PL

Cel: Zaprezentowanie rozwiązania problemu segmentacji tekstu dziedzinowego. Badany tekst pochodził z raportów (formularza „Informacji ze zdarzenia”, pola „Dane opisowe do informacji ze zdarzenia”) sporządzanych po akcjach ratowniczo-gaśniczych przez jednostki Państwowej Straży Pożarnej. Metody: W celu realizacji zadania autor zaproponował metodę projektowania bazy wiedzy oraz reguł segmentatora regułowego. Zaproponowana w artykule metoda opiera się na formalnej analizie pojęć. Zaprojektowana według proponowanej metody baza wiedzy oraz reguł umożliwiła przeprowadzenie procesu segmentacji dostępnej dokumentacji. Poprawność i skuteczność proponowanej metody zweryfikowano poprzez porównanie jej wyników z dwoma innymi rozwiązaniami wykorzystywanymi do segmentacji tekstu. Wyniki: W ramach badań i analiz opisano oraz pogrupowano reguły i skróty występujące w badanych raportach. Dzięki zastosowaniu formalnej analizy pojęć utworzono hierarchię wykrytych reguł oraz skrótów. Wydobyta hierarchia stanowiła zarazem bazę wiedzy oraz reguł segmentatora regułowego. Przeprowadzone eksperymenty numeryczne i porównawcze autorskiego rozwiązania z dwoma innymi rozwiązaniami wykazały znacznie lepsze działanie tego pierwszego. Przykładowo wyniki F-miary otrzymane w wyniku zastosowania proponowanej metody wynoszą 95,5% i są lepsze o 7-8% od pozostałych dwóch rozwiązań. Wnioski: Zaproponowana metoda projektowania bazy wiedzy oraz reguł segmentatora regułowego umożliwia projektowanie i implementację oprogramowania do segmentacji tekstu z małym błędem podziału tekstu na segmenty. Podstawowa reguła dotycząca wykrywania końca zdania poprzez interpretację kropki i dodatkowych znaków jako końca segmentu w rzeczywistości, zwłaszcza dla tekstów specjalistycznych, musi być opakowana dodatkowymi regułami. Działania te znacznie podnoszą jakość segmentacji i zmniejszają jej błąd. Do budowy i reprezentacji takich reguł nadaje się przedstawiona w artykule formalna analiza pojęć. Wiedza inżyniera oraz dodatkowe eksperymenty mogą wzbogacać utworzoną sieć o nowe reguły. Nowo wprowadzana wiedza może zostać w łatwy sposób naniesiona na aktualnie utworzoną sieć semantyczną, tym samym przyczyniając się do polepszenia segmentacji tekstu. Ponadto w ramach eksperymentu numerycznego wytworzono unikalny zbiór reguł oraz skrótów stosowanych w raportach, jak również zbiór prawidłowo wydzielonych i oznakowanych segmentów.

EN

Objective: To present a solution to the problem of segmenting domain-specific text. The text came from reports (the “Information about the event” form, field “Information about the event - descriptive data”) prepared by units of the State Fire Service after firefighting and rescue operations. Methods: To carry out the task, the author proposed a method for designing the knowledge base and rules of a rule-based text segmentation tool. The proposed method is based on formal concept analysis (FCA). The knowledge base and rules designed with this method made it possible to segment the available documentation. The correctness and effectiveness of the method were verified by comparing its results with two other solutions used for text segmentation. Results: In the course of the research and analysis, the rules and abbreviations found in the studied reports were described and grouped. Formal concept analysis was used to build a hierarchy of the detected rules and abbreviations. The extracted hierarchy served as both the knowledge base and the rule base of the segmentation tool. Numerical and comparative experiments showed that the author's solution performs significantly better than the two other solutions. For example, the F-measure obtained with the proposed method is 95.5%, which is 7-8% better than the other two solutions. Conclusions: The proposed method makes it possible to design and implement text segmentation software with a small segmentation error. The basic rule of detecting the end of a sentence by interpreting a period and additional characters as the end of a segment must in practice, especially for specialist texts, be wrapped in additional rules. Doing so significantly improves the quality of segmentation and reduces its error. Formal concept analysis, as presented in the article, is suitable for building and representing such rules. The engineer's knowledge and additional experiments can enrich the created hierarchy with new rules. Newly introduced knowledge can easily be applied to the existing hierarchy, thereby improving text segmentation. In addition, the numerical experiment produced a unique set of rules and abbreviations used in the reports, as well as a set of correctly separated and labeled segments.
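
The conclusion that the basic "period ends a segment" rule must be wrapped in additional rules can be illustrated with a small sketch. It shows only one such extra rule, an abbreviation list; the list contents and the function name `segment` are hypothetical, and the formal concept analysis used in the paper to derive and organize the rules is not reproduced here.

```python
# Hypothetical abbreviation list; in the paper such knowledge comes from the
# rule hierarchy built with formal concept analysis, which is not modeled here.
ABBREVIATIONS = {"ul.", "godz.", "np.", "tel.", "ok."}

SEGMENT_END_CHARS = ".!?"


def segment(text):
    """Split text into segments, ignoring periods that end known abbreviations."""
    segments = []
    current = []
    for token in text.split():
        current.append(token)
        ends_with_mark = token[-1] in SEGMENT_END_CHARS
        is_abbreviation = token.lower() in ABBREVIATIONS
        # Basic rule (end mark closes the segment) wrapped by the extra rule
        # (unless the token is a known abbreviation).
        if ends_with_mark and not is_abbreviation:
            segments.append(" ".join(current))
            current = []
    if current:
        # Trailing text without a closing mark still forms a segment.
        segments.append(" ".join(current))
    return segments
```

For the input "Pożar przy ul. Długiej 5. Zastęp przybył o godz. 12:30." the periods after "ul." and "godz." do not close a segment, so the text is split into two segments rather than four.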

5

Web pages content analysis using browser-based volunteer computing

Turek W., Nawarecki E., Dobrowolski G., Krupa T., Majewski P.

Computer Science | 2013 | Vol. 14 (2) | 215--230

EN

Existing solutions to the problem of finding valuable information on the Web suffer from several limitations, such as simplified query languages, out-of-date information or arbitrary sorting of results. This paper describes a different approach to the problem, based on the idea of distributed processing of Web page content. To provide sufficient performance, browser-based volunteer computing is used, which requires implementing the text processing algorithms in JavaScript. The paper presents the architecture of a Web page content analysis system, describes details of the system implementation and the text processing algorithms, and provides test results.

6

Proces i metody eksploracji danych tekstowych do przetwarzania raportów z akcji ratowniczo-gaśniczych [Text mining process and methods for processing reports from rescue and firefighting operations]

Mirończuk M., Maciak T.

Metody Informatyki Stosowanej | 2011 | No. 4 | 147-174

EN

This paper describes a process for processing reports from rescue and firefighting operations. The reports are processed with methods and techniques from the field of textual data mining (text mining). The paper also presents methods for the classification and analysis of text sections that are considered for potential use in the proposed process.

7

Recognizing non-translatable symbols in a multi-lingual computer-assisted translation system for DTP documents

Grabowski S., Draus C., Bieniecki W.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2010 | Vol. 14, issue 3/1 | 555-561

EN

The paper is devoted to the problem of computer-assisted translation of catalogues and advertising brochures (DTP documents), where the text to translate consists of many short, separate snippets. One feature that may facilitate the translation process is recognizing the phrases that should be copied verbatim, no matter what the target language is. These include technical data with units of measurement, abbreviations, numbers etc., but also trademark symbols. For the first problem, the presented algorithm performs a statistical analysis of the characters within each segment of the considered phrase, where segment boundaries are marked by special characters such as hyphens or slashes. If at least one of the segments is labeled "non-symbol", the whole phrase should be handled by the human translator; otherwise it is considered non-translatable and copied verbatim, saving the translator's work. For the trademark start boundary recognition problem, we propose a simple but seemingly robust solution based on the similarity of the suffixes of words preceding ® and similar characters in a given phrase, together with heuristic rules based on the character case of those words.

PL

Artykuł dotyczy automatycznego tłumaczenia katalogów i broszur reklamowych przy użyciu systemu klasy CAT. Jedną z funkcjonalności wspomagających proces tłumaczenia jest rozpoznawanie fraz, które nie powinny być tłumaczone, takich jak dane techniczne, symbole, skróty, liczby (problem pierwszy), a także znaki handlowe, symbole praw autorskich i symbole zastrzeżone (problem drugi). Zaproponowany algorytm dla pierwszego problemu przeprowadza analizę statystyczną znaków w badanym ciągu, rozdzielając uprzednio słowa na takich znakach, jak łącznik czy ukośnik. Jeśli choć jeden z segmentów jest uznany za "nie-symbol", to cała fraza powinna podlegać tłumaczeniu; w przeciwnym razie jest ona kopiowana bez zmian. Algorytm rozwiązujący problem drugi wykrywa początki fraz zastrzeżonych, opierając się na podobieństwie sufiksów wyrazowych poprzedzających w danej frazie symbol ® (lub inny tego typu). Dodatkowym kryterium heurystycznym jest uwzględnienie wielkości liter w badanych sufiksach.
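
The decision rule for the first problem (split the phrase on hyphens or slashes, label each segment, and copy the phrase verbatim only if no segment is a "non-symbol") can be sketched as follows. The segment classifier here is a hypothetical stand-in: the abstract does not give the actual character statistics, so the sketch assumes a segment is symbol-like if it contains a digit or is a short string without lowercase letters.

```python
import re

# Assumed threshold for "short all-caps abbreviation"; not taken from the paper.
MAX_ABBREVIATION_LENGTH = 4


def is_symbol_segment(segment):
    """Hypothetical stand-in for the paper's statistical character analysis."""
    if not segment:
        # Empty piece between two delimiters carries nothing to translate.
        return True
    if any(ch.isdigit() for ch in segment):
        # Technical data such as "230V" or "M8".
        return True
    has_lowercase = any(ch.islower() for ch in segment)
    return len(segment) <= MAX_ABBREVIATION_LENGTH and not has_lowercase


def is_non_translatable(phrase):
    """True if every hyphen- or slash-separated segment looks like a symbol."""
    segments = re.split(r"[-/]", phrase.strip())
    # One "non-symbol" segment is enough to send the phrase to the translator.
    return all(is_symbol_segment(segment) for segment in segments)
```

Under these assumptions, "230V/50Hz" and "M8-DIN" would be copied verbatim, while "USB-cable" would go to the human translator because the segment "cable" is labeled "non-symbol".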