Search results
Keyword: information extraction
Results found: 19
EN
In this paper, we discuss a software architecture developed for the needs of the System for Intelligent Maritime Monitoring (SIMMO). The system is based on state-of-the-art information fusion and intelligence analysis techniques, which generate an enhanced Recognized Maritime Picture and thus support situation analysis and decision-making. The SIMMO system aims to automatically fuse up-to-date maritime data from the Automatic Identification System (AIS) and open Internet sources. Based on the collected data, data analysis is performed to detect suspicious vessels. The functionality of the system is realized in a number of different modules (web crawlers, data fusion, anomaly detection and visualization modules) that share the AIS and external data stored in the system's database. The aim of this article is to demonstrate how external information can be leveraged in a maritime awareness system and what software solutions are necessary. A working system is presented as a proof of concept.
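A minimal sketch of this modular layout, with hypothetical module names and a plain dictionary standing in for the system's shared database:

    # Sketch of the SIMMO-style modular layout (hypothetical names);
    # a dict stands in for the shared AIS/external-data database.
    shared_db = {"ais": [], "external": [], "anomalies": []}

    class WebCrawler:
        def run(self, db):
            # In the real system this module would pull vessel data from
            # open Internet sources.
            db["external"].append({"mmsi": 123456789, "flag": "PL"})

    class DataFusion:
        def run(self, db):
            # Merge AIS tracks with crawled records, keyed by vessel id.
            db["fused"] = {r["mmsi"]: r for r in db["external"]}

    class AnomalyDetection:
        def run(self, db):
            # Flag vessels matching a simple, purely illustrative rule.
            db["anomalies"] = [m for m in db.get("fused", {}) if m == 123456789]

    for module in (WebCrawler(), DataFusion(), AnomalyDetection()):
        module.run(shared_db)
    print(shared_db["anomalies"])   # -> [123456789]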
EN
The process of Information Extraction (IE) allows us to retrieve different types of information from natural language text by processing it automatically. Ontology-based information extraction (OBIE) is a subfield of information extraction. The growing number of existing OBIE systems can make it difficult to select the most suitable solution. The general aim of this paper is to provide an approach for OBIE system selection and evaluation. It should ensure knowledge systematization and help users find a solution that meets their needs.
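One plausible way to operationalize such a selection approach is weighted-criteria scoring; the criteria, weights and scores below are invented for illustration and are not the paper's actual evaluation model:

    # Illustrative weighted-criteria scoring for OBIE system selection;
    # criteria, weights and scores are assumptions, not the paper's model.
    criteria_weights = {"ontology_support": 0.4, "accuracy": 0.4, "ease_of_use": 0.2}

    systems = {
        "SystemA": {"ontology_support": 5, "accuracy": 3, "ease_of_use": 4},
        "SystemB": {"ontology_support": 4, "accuracy": 5, "ease_of_use": 3},
    }

    def score(system_scores):
        return sum(criteria_weights[c] * s for c, s in system_scores.items())

    best = max(systems, key=lambda name: score(systems[name]))
    print(best, {n: round(score(s), 2) for n, s in systems.items()})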
EN
Effective analysis of structured documents may determine the performance of management information systems. In this paper, an adaptive method of information extraction from structured text documents is considered. We assume that documents belong to thematic groups and that the required set of information may be determined a priori. Knowledge of the document structure allows us to indicate blocks where certain information is more likely to appear. As a result, structured data that can be further analysed are obtained. The proposed solution uses dictionaries and flexion analysis, and may be applied to Polish texts. The presented approach can be used for information extraction from official letters, information sheets and product specifications.
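A minimal sketch of the block-plus-dictionary idea, assuming a toy two-entry lemma dictionary in place of real flexion analysis and an invented invoice-style document:

    # Sketch of dictionary-based extraction from a known document block;
    # the tiny lemma dictionary stands in for real flexion analysis.
    import re

    # Map inflected Polish forms to a lemma (real flexion analysis would
    # be far richer than this two-entry dictionary).
    lemma_dict = {"faktury": "faktura", "faktura": "faktura"}

    # Blocks where a given field is likely to appear, known a priori.
    blocks = {"header": "Faktura nr 17/2024", "body": "Termin platnosci: 14 dni"}

    def extract(block_text, field_pattern):
        m = re.search(field_pattern, block_text)
        return m.group(1) if m else None

    invoice_no = extract(blocks["header"], r"[Ff]aktur\w*\s+nr\s+(\S+)")
    print(invoice_no)  # -> 17/2024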
EN
This article focuses on improving the quality of text document classification. The common techniques for analysing text documents used in classification are presented, and the weaknesses of these methods are stressed. The integration of quantitative and qualitative methods, which increases the quality of classification, is discussed. In the proposed approach, expanded terms obtained using information patterns are used in Latent Semantic Analysis. Finally, empirical research is presented and, based on quality measures of text document classification, the effectiveness of the proposed approach is demonstrated.
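A brief sketch of the proposed combination, assuming that an information pattern has already matched a phrase; the expanded term is appended as an extra token before Latent Semantic Analysis (here via scikit-learn's TruncatedSVD):

    # Sketch: expanded terms (joined with "_") are appended to documents
    # before LSA; the pattern and the data are illustrative.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "information extraction from text documents",
        "classification of text documents",
        "weather was sunny today",
    ]
    # Pretend an information pattern matched the phrase "text documents".
    expanded = [d + (" text_documents" if "text documents" in d else "") for d in docs]

    tfidf = TfidfVectorizer().fit_transform(expanded)
    lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
    print(lsa.shape)  # (3, 2): document vectors in the latent space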
EN
This article presents a method for the rough estimation of the geographical coordinates of villages and cities described in the 19th-century geographical encyclopedia entitled The Geographical Dictionary of the Polish Kingdom and Other Slavic Countries [18]. Described are the algorithm for estimating location, the tools used to acquire and process the necessary information, and the context of this research.
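A hedged illustration of one way such a rough estimate could be computed, from a reference town plus a stated distance and bearing; the gazetteer and the offset arithmetic are assumptions, not the paper's actual algorithm:

    # Illustrative rough-estimation sketch for an entry of the form
    # "X km in some direction from a known town".
    import math

    known = {"Warszawa": (52.23, 21.01)}   # toy gazetteer, assumed

    def estimate(ref_town, km, bearing_deg):
        lat, lon = known[ref_town]
        # One degree of latitude is roughly 111 km.
        dlat = (km / 111.0) * math.cos(math.radians(bearing_deg))
        dlon = (km / (111.0 * math.cos(math.radians(lat)))) * math.sin(
            math.radians(bearing_deg))
        return lat + dlat, lon + dlon

    # "15 km north-east of Warszawa" -> bearing 45 degrees
    print(estimate("Warszawa", 15, 45))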
EN
To improve measurement precision and solve the problem of extracting projectile information when a projectile passes through the detection screen of a photoelectric detection target, the wavelet analysis method was applied to process the signal and find the moment the projectile enters the screen. The detection principle of the photoelectric detection target was analyzed, and the characteristics of the wavelet analysis method and the LMS adaptive filtering algorithm were used to study the target's output signal. Based on the characteristics of this output signal, wavelet transform modulus maxima theory and singularity detection were applied to locate the starting moment at which the projectile flies through the detection screen and to calculate the time of flight between detection screens. Based on the velocity measurement principle and experiments, the wavelet analysis method was compared with the traditional nose-trigger extraction method; the velocity measurement error is less than 0.2%, which verifies that using wavelet analysis to extract the detection information of a photoelectric detection target is feasible and correct.
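A simplified sketch of the signal-processing idea: locate the screen-crossing moments as maxima of a wavelet response and derive velocity from the time of flight. The synthetic signal, thresholds and the 2.0 m screen spacing are assumptions:

    import numpy as np

    fs = 100_000.0                       # sampling rate [Hz], assumed
    t = np.arange(0, 0.01, 1 / fs)       # 10 ms of signal
    sig = np.zeros_like(t)
    sig[200] = 1.0                       # projectile crosses screen 1
    sig[700] = 1.0                       # projectile crosses screen 2
    sig += 0.01 * np.random.default_rng(0).standard_normal(t.size)

    def ricker(points, a):
        # Ricker ("Mexican hat") wavelet sampled at `points` positions.
        x = np.arange(points) - (points - 1) / 2.0
        return (1 - (x / a) ** 2) * np.exp(-0.5 * (x / a) ** 2)

    resp = np.abs(np.convolve(sig, ricker(61, 4.0), mode="same"))

    # Rising edges of the thresholded wavelet response approximate the
    # modulus-maxima positions (the screen-crossing moments).
    above = resp > 0.6 * resp.max()
    starts = np.where(above[1:] & ~above[:-1])[0] + 1
    t1, t2 = t[starts[0]], t[starts[1]]

    screen_spacing = 2.0                 # metres between screens, assumed
    print(screen_spacing / (t2 - t1))    # estimated velocity [m/s] (~400)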
EN
The data contained within user-generated content websites prove valuable in many applications, for example in social media monitoring or in the acquisition of training sets for machine learning algorithms. Mining such data is especially difficult in the case of web forums because of the hundreds of different forum engines in use. We propose an algorithm capable of unsupervised extraction of posts from social websites, without the need to analyse more than one page in advance. Our method localizes potential data regions by repetition analysis within the document structure and by filtering potential results. Subsequently, the fields of data records are found using key characteristics and series-wide dependencies. We managed to achieve 85% precision and 79% recall in experiments on single pages taken from 258 websites. Our solution is characterized by high computational efficiency, enabling a wide range of applications.
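A minimal sketch of repetition analysis over document structure, where sibling elements sharing a structural signature are treated as candidate post records; the HTML snippet is invented for illustration:

    # Repetition analysis sketch using BeautifulSoup: sibling elements
    # that share a structural signature become candidate post records.
    from bs4 import BeautifulSoup
    from collections import Counter

    html = """
    <div id="thread">
      <div class="post"><span>alice</span><p>first message</p></div>
      <div class="post"><span>bob</span><p>a reply</p></div>
      <div class="post"><span>carol</span><p>another reply</p></div>
      <div class="ad"><img/></div>
    </div>
    """
    soup = BeautifulSoup(html, "html.parser")

    def signature(el):
        # Structural signature: own tag plus the tag sequence of children.
        return (el.name, tuple(c.name for c in el.find_all(recursive=False)))

    siblings = soup.find(id="thread").find_all(recursive=False)
    counts = Counter(signature(el) for el in siblings)
    records = [el for el in siblings if counts[signature(el)] >= 2]
    print([el.find("span").text for el in records])  # ['alice', 'bob', 'carol']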
Information extraction from chemical patents
EN
The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in the field. This information retrieval is a costly and time-inefficient step when done by a human reader, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate, and hence speed up, the process of knowledge mining from patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for every given task to achieve the best CPU-time/real-time ratios in an HPC environment.
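A sketch of the parallel document-analysis idea only (not of UIMA or UNICORE themselves): a stand-in annotator is run over patent texts concurrently; the texts and the formula-shaped regex are illustrative:

    # Parallel analysis sketch: a stand-in annotator over patent texts.
    from concurrent.futures import ProcessPoolExecutor
    import re

    def annotate(text):
        # Stand-in analysis engine: find strings shaped like simple
        # chemical formulas (e.g. H2O2, NaCl, C6H6).
        return re.findall(r"\b[A-Z][a-z]?\d*(?:[A-Z][a-z]?\d*)+\b", text)

    patents = ["The synthesis of H2O2 and NaCl is described.",
               "C6H6 reacts under pressure."]

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            for hits in pool.map(annotate, patents):
                print(hits)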
EN
In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in them. We demonstrate the architecture on two use cases: search in job offers retrieved from the LinkedIn portal, and search in BBC news feeds. We discuss several problems we had to face during the implementation, as well as spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a great deal of spatial information to extract and process.
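A toy sketch of the map/reduce flow for extraction and geo-grouping; the documents and the two-entry gazetteer are assumptions, not the paper's data or code:

    # Map phase extracts place mentions; reduce phase builds a geo index.
    from functools import reduce

    gazetteer = {"London": (51.5, -0.1), "Praha": (50.1, 14.4)}
    docs = ["Job offer in London", "News from Praha", "Another London story"]

    def map_phase(doc):
        # Emit (place, 1) pairs for every gazetteer entry found in the text.
        return [(place, 1) for place in gazetteer if place in doc]

    def reduce_phase(acc, pair):
        place, n = pair
        acc[place] = acc.get(place, 0) + n
        return acc

    pairs = [p for doc in docs for p in map_phase(doc)]
    index = reduce(reduce_phase, pairs, {})
    print(index)  # -> {'London': 2, 'Praha': 1}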
EN
This article describes the design process of the information extraction system IES. The proposed design method is based on rules and uses formal concept analysis to arrange the rules appropriately in the system's knowledge base.
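A minimal illustration of the formal concept analysis machinery involved: a formal concept pairs a set of rules (extent) with the attributes they share (intent); the rules and attributes below are invented:

    # Formal-concept sketch: extent/intent derivation over a toy context.
    rules = {
        "r1": {"date", "person"},
        "r2": {"date", "place"},
        "r3": {"date", "person", "place"},
    }

    def intent(extent):
        # Attributes shared by all rules in the extent.
        return set.intersection(*(rules[r] for r in extent)) if extent else set()

    def extent(attrs):
        # Rules that carry all the given attributes.
        return {r for r, a in rules.items() if attrs <= a}

    # A formal concept: the extent of {date, person} and its closed intent.
    e = extent({"date", "person"})
    print(sorted(e), sorted(intent(e)))   # ['r1', 'r3'] ['date', 'person']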
EN
This paper describes a process for handling reports from rescue and firefighting operations. Report processing uses methods and techniques from the field of textual data mining (text mining). The paper also presents classification methods and methods for analysing sections of text that are considered for potential use in the proposed process.
EN
The paper presents the concept of an information extraction and search support system based on Polish Web resources. The system consists of web crawling, data indexing and dedicated search-grammar modules, which together improve the quality of the results and make it possible to find valuable information on the Web more effectively. As a usage example, a stamping industry use case is presented and compared with the results obtained from the Google search engine. Possible adaptations of the system to other branches of industry and the evolution of its basic version are also discussed.
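A hedged sketch of what a dedicated search grammar adds over plain keyword search: a domain pattern that also captures structure; the stamping-industry pattern and the page text are invented:

    # Toy "search grammar": a domain pattern with typed capture groups,
    # something plain keyword search cannot express.
    import re

    grammar = re.compile(r"prasa\s+(hydrauliczna|mimosrodowa)\s+(\d+)\s*t",
                         re.IGNORECASE)

    page = "Oferta: prasa hydrauliczna 100 t, stan dobry."
    m = grammar.search(page)
    if m:
        print({"type": m.group(1), "tonnage": int(m.group(2))})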
Similarity-based web clip matching
EN
The research areas of extraction and integration of web data aim at delivering tools and methods to extract pieces of information from third-party web sites and then to integrate them into profiled, domain-specific, custom web pages. Existing solutions rely on specialized APIs or XPath querying tools and are therefore not easily accessible to non-technical end users. In this paper we describe our new comprehensive, non-XPath integration platform which allows end users to extract web page fragments using a simple query-by-example approach and then to combine these fragments into custom, integrated web pages. We focus on our two novel similarity-based web clip matching algorithms: Attribute Weights Tree Matching and Edit Distance Tree Matching.
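A much-simplified sketch of edit-distance tree matching, where node cost combines a label mismatch with an edit distance over children sequences; this is a toy variant, not the paper's Edit Distance Tree Matching algorithm:

    # Toy tree edit distance; a node is (label, tuple_of_children).
    from functools import lru_cache

    a = ("div", (("span", ()), ("p", ())))
    b = ("div", (("span", ()), ("p", ()), ("img", ())))

    def size(t):
        return 1 + sum(size(c) for c in t[1])

    @lru_cache(maxsize=None)
    def tree_dist(x, y):
        cost = 0 if x[0] == y[0] else 1
        return cost + seq_dist(x[1], y[1])

    @lru_cache(maxsize=None)
    def seq_dist(xs, ys):
        if not xs:
            return sum(size(y) for y in ys)   # insert remaining subtrees
        if not ys:
            return sum(size(x) for x in xs)   # delete remaining subtrees
        return min(tree_dist(xs[0], ys[0]) + seq_dist(xs[1:], ys[1:]),
                   size(xs[0]) + seq_dist(xs[1:], ys),
                   size(ys[0]) + seq_dist(xs, ys[1:]))

    print(tree_dist(a, b))  # -> 1 (one extra <img/> leaf)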
EN
The paper presents a method for the automatic construction of a semantically annotated corpus using the results of a rule-based information extraction (IE) application. Construction of the corpus is based on using existing programs for text tokenization and morphological analysis and combining their results with domain-related correction rules. We reuse the specialized IE system to obtain a corpus annotated at the semantic level. The texts included in the corpus are Polish free-text clinical data. We present the documents (diabetic patients' discharge records), the structure of the corpus annotation and the methods for obtaining the annotations. Initial evaluations based on the results of manual verification of a selected data subset are also presented. The corpus, once manually corrected, is intended to be used for developing supervised machine learning models for IE applications.
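A sketch of the pipeline shape described above: tokenize, attach morphological tags from a stand-in analyser, then apply a domain correction rule; the tagset, lexicon and rule are illustrative, not the paper's resources:

    # Tokenize -> morphological tags -> domain correction rule.
    text = "Cukrzyca typu 2"

    def tokenize(s):
        return s.split()

    def morph(token):
        # Stand-in for a real morphological analyser.
        lexicon = {"Cukrzyca": "subst:sg:nom:f", "typu": "subst:sg:gen:m"}
        return lexicon.get(token, "unknown")

    annotations = [{"form": tok, "tag": morph(tok)} for tok in tokenize(text)]

    # Domain correction rule: a bare digit after "typu" is a disease
    # subtype, not an ordinary number.
    for i, ann in enumerate(annotations):
        if ann["form"].isdigit() and i > 0 and annotations[i - 1]["form"] == "typu":
            ann["sem"] = "disease_subtype"

    print(annotations)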
EN
This article reviews methods and tools for representing and processing information, which is currently one of the principal means of building and managing any organization. The smooth functioning of any institution depends on access to the knowledge stored within it, as well as on the ability to search it efficiently, systematize it and make new decisions based on it.
Cerberus: A New Information Retrieval Tool for Marine Metagenomics
EN
The number of papers published every year in scientific journals is growing tremendously, especially in the biological sciences. Keeping track of a given branch of science is therefore a difficult task. This was one of the reasons for developing the classification tool we call Cerberus. The classification categories may correspond to areas of research defined by the user. We have used the tool to classify papers as containing marine metagenomic, terrestrial metagenomic or non-metagenomic information. Cerberus is based on special filters using weighted domain vocabularies. Depending on the number of occurrences of the keywords from the vocabularies in a paper, the program assigns the paper to a predefined category. This classification can precede information extraction, since it reduces the number of papers to be analyzed. Classifying papers using the proposed method results in an accurate and precise set of articles that are relevant to the scientist, which can reduce the resources needed to find the data required in one's field of study.
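A minimal sketch of classification with weighted domain vocabularies; the words, weights and threshold are invented for illustration:

    # Weighted-vocabulary classification sketch (illustrative values).
    vocabularies = {
        "marine_metagenomics": {"seawater": 2.0, "metagenome": 1.5, "plankton": 1.0},
        "terrestrial_metagenomics": {"soil": 2.0, "metagenome": 1.5, "rhizosphere": 1.0},
    }

    def classify(text, threshold=2.0):
        words = text.lower().split()
        scores = {cat: sum(w * words.count(kw) for kw, w in vocab.items())
                  for cat, vocab in vocabularies.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] >= threshold else "non_metagenomic"

    print(classify("metagenome sequencing of seawater plankton samples"))
    # -> marine_metagenomics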
EN
In this paper, we present the DANTE system, a tagger for temporal expressions in English documents. DANTE performs both recognition and normalisation of the expressions in accordance with the TIMEX2 annotation standard. The system is built on modular principles, with a clear separation between the recognition and normalisation components. The interface between these components is based on our novel approach to representing the local semantics of temporal expressions. DANTE has been developed in two phases: first on the basis of the TIMEX2 guidelines alone, and then on the ACE 2005 development data. The system has been evaluated on the ACE 2005 and ACE 2007 data. Although this is still work in progress, we already achieve highly satisfactory results, both for the recognition of temporal expressions and for their interpretation (normalisation).
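A much-reduced sketch of the recognise-then-normalise split for temporal expressions; the single pattern and the ISO-style value are only an illustration of what a TIMEX2-compliant tagger like DANTE does:

    # Recognition and normalisation as two separate components.
    import re

    MONTHS = {"january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
              "june": 6, "july": 7, "august": 8, "september": 9,
              "october": 10, "november": 11, "december": 12}

    def recognise(text):
        return re.finditer(r"\b(\d{1,2}) (%s) (\d{4})\b" % "|".join(MONTHS),
                           text, re.IGNORECASE)

    def normalise(match):
        day, month, year = match.group(1), match.group(2), match.group(3)
        return "%s-%02d-%02d" % (year, MONTHS[month.lower()], int(day))

    for m in recognise("The meeting on 7 May 2007 was rescheduled."):
        print(m.group(0), "->", normalise(m))   # 7 May 2007 -> 2007-05-07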
EN
The ability to recognize human activities from sensory information is essential for developing the next generation of smart devices. Many human activity recognition tasks are, from a machine learning perspective, quite similar to tagging tasks in natural language processing. Motivated by this similarity, we develop a relational transformation-based tagging system based on inductive logic programming principles, which is able to cope with expressive relational representations as well as a background theory. The approach is experimentally evaluated on two activity recognition tasks and an information extraction task, and compared to Hidden Markov Models, one of the most popular and successful approaches to tagging.
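A toy sketch of one transformation-based (Brill-style) learning step: start from a baseline tagging and pick the candidate rule that fixes the most errors; tokens, tags and the rule space are invented, and the relational/ILP aspect is not modelled here:

    # One step of transformation-based learning (Brill-style).
    tokens = ["walk", "to", "kitchen", "sit", "down"]
    gold = ["move", "move", "move", "rest", "rest"]
    tags = ["move"] * len(tokens)           # baseline: most frequent tag

    # Candidate rules: (trigger_word, from_tag, to_tag).
    rules = [("sit", "move", "rest"), ("down", "move", "rest")]

    def apply_rule(tags, rule):
        word, src, dst = rule
        return [dst if tokens[i] == word and t == src else t
                for i, t in enumerate(tags)]

    def errors(tags):
        return sum(t != g for t, g in zip(tags, gold))

    # Full TBL would repeat this selection until no rule reduces errors.
    best = min(rules, key=lambda r: errors(apply_rule(tags, r)))
    tags = apply_rule(tags, best)
    print(best, tags, errors(tags))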
EN
The paper focuses on resolving natural language issues which have been affecting the performance of our system for processing Polish medical data. In particular, we address phenomena such as ellipsis, anaphora, comparisons, coordination and negation occurring in mammogram reports. We propose practical data-driven solutions which allow us to improve the system's performance.
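As one example of the kind of practical device such a system needs, here is a NegEx-style negation window over findings (the paper addresses negation among other phenomena); the trigger list, finding list and scope heuristic are assumptions, not the paper's method:

    # NegEx-style negation scope sketch for report findings.
    NEGATION_TRIGGERS = ["nie stwierdza sie", "bez cech"]
    FINDINGS = ["zwapnienia", "guzki"]

    def negated_findings(report, window=5):
        text = report.lower()
        hits = []
        for trig in NEGATION_TRIGGERS:
            idx = text.find(trig)
            if idx == -1:
                continue
            # Scope: words after the trigger, up to the sentence boundary.
            scope = text[idx + len(trig):].split(".")[0].split()[:window]
            hits += [f for f in FINDINGS if f in scope]
        return hits

    print(negated_findings("Nie stwierdza sie zwapnienia. Guzki obecne."))
    # -> ['zwapnienia']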