
Results found: 16

Search results
Searched for:
in keywords: zbiór danych (data set)

1
100%
PL
As promised, I have prepared for „Utrzymanie Ruchu" the second part of the article Gospodarka zasobami danych w zarządzaniu eksploatacją i utrzymaniem ruchu (Managing data resources in operations and maintenance management), which appeared in the previous issue (1/2013) of UR. Here I present what are, in my view, the detailed issues that I could not discuss previously for lack of space.
2
Wstępne przetwarzanie danych
100%
PL
Data sets can become a valuable source of knowledge, but only if we approach their analysis in the right way. The data analysis process consists of several stages (described in issue 1/2020 of „Utrzymanie Ruchu" in the article Analiza dużych zbiorów danych). The key stage is data preprocessing, which directly precedes the mining stage and is often the most labour-intensive one.
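For illustration, a minimal preprocessing sketch in Python (pandas/scikit-learn) covering three steps such an article typically describes: missing-value imputation, outlier removal and scaling. The input file name and the assumption that all columns are numeric are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical input file; assume all columns are numeric sensor readings.
df = pd.read_csv("sensor_readings.csv")

# 1. Impute missing values with each column's median.
df = df.fillna(df.median())

# 2. Remove rows with outliers beyond 3 standard deviations in any column.
z = (df - df.mean()) / df.std()
df = df[(z.abs() < 3).all(axis=1)]

# 3. Standardize features to zero mean and unit variance before mining.
scaled = StandardScaler().fit_transform(df)
```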
4
86%
z. 4, pp. 103-112
EN
In the paper we discuss the performance of the classic bubble sort algorithm for large data sets. The research results described in this article help to evaluate computational methods used in NoSQL database systems for large amounts of input data. We therefore analyze one of the most common sorting algorithms and its properties for large data sets.
PL
The article presents an analysis of the performance of the bubble sort algorithm in its classic form for large data sets. The topic is important for the development of modern computer science, since computers must operate on ever larger amounts of data.
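For reference, a minimal Python sketch of the classic bubble sort analyzed in the paper; its O(n²) pairwise comparisons are what make it degrade badly on large data sets.

```python
def bubble_sort(items: list) -> list:
    """Classic bubble sort: repeatedly swap adjacent out-of-order pairs.

    Runs in O(n^2) time, which is why performance degrades sharply
    on large data sets, as the article analyzes.
    """
    data = list(items)                      # work on a copy
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):          # last i items already in place
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:                     # early exit: already sorted
            break
    return data
```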
EN
The current age, characterized by unstoppable progress and the rapid development of new technologies and methods such as the Internet of Things, machine learning and artificial intelligence, brings new requirements for enterprise information systems. Information systems ought to be a consistent set of elements that provide a basis for information that can be used in context to obtain knowledge. To generate valid knowledge, information must be based on objective and current data. Furthermore, due to Industry 4.0 trends such as digitalization and online process monitoring, the amount of data produced is constantly increasing; in this context the term Big Data is used. The aim of this article is to point out the role of Big Data within Industry 4.0; nevertheless, Big Data can be used in a much wider range of business areas, not just in industry. The term Big Data encompasses issues related to the exponentially growing volume of produced data, their variety, and the velocity of their origin. These characteristics of Big Data are also associated with possible processing problems. The article also focuses on the issue of ensuring and monitoring data quality. Reliable information cannot be inferred from poor-quality data, and the knowledge gained from such information is inaccurate; the expected results then fail to appear, and the ultimate consequence may be a loss of confidence in the information system used. On the contrary, it can be assumed that the acquisition, storage and use of Big Data will in the future become a key factor in maintaining competitiveness, business growth and further innovation. Thus, organizations that systematically use Big Data in their decision-making processes and planning strategies will have a competitive advantage.
6
Urban sound classification using long short-term memory neural network
72%
EN
Environmental sound classification has received increasing attention in recent years. Analysis of environmental sounds is difficult because of their unstructured nature. However, the presence of strong spectro-temporal patterns makes classification possible. Since LSTM neural networks are efficient at learning temporal dependencies, we propose and examine an LSTM model for urban sound classification. The model is trained on magnitude mel-spectrograms extracted from the audio of the UrbanSound8K dataset. The proposed network is evaluated using 5-fold cross-validation and compared with a baseline CNN. It is shown that the LSTM model outperforms a set of existing solutions and is more accurate and confident than the CNN.
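A minimal sketch, assuming PyTorch, of an LSTM classifier over mel-spectrogram frames of the kind the abstract describes. The layer sizes and frame counts are illustrative, not the authors' exact architecture; only the 10-class output matches UrbanSound8K.

```python
import torch
import torch.nn as nn

class UrbanSoundLSTM(nn.Module):
    """LSTM over mel-spectrogram time frames, as sketched in the abstract.

    Input shape: (batch, time_frames, n_mels). Hidden size and layer
    count are illustrative, not the authors' exact configuration.
    """
    def __init__(self, n_mels: int = 128, hidden: int = 128, n_classes: int = 10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_mels, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)           # (batch, time, hidden)
        return self.head(out[:, -1])    # classify from the last time step

model = UrbanSoundLSTM()
logits = model(torch.randn(4, 173, 128))   # 4 clips, 173 frames, 128 mel bands
```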
EN
In this paper we discuss the evaluation of neural networks for medical image classification and analysis. We also summarize the existing databases of images that could be used for training deep models, which can later be utilized in remote home-based health care systems. In particular, we propose methods for remote video-based estimation of patient vital signs and other health-related parameters. Additionally, potential challenges of using, storing and transferring sensitive patient data are discussed.
vol. 61, no. 1, pp. 131-145
EN
Missing traffic data is an important issue for road administrations. Although numerous imputation methods can be found in the international literature (among them the most effective one, Box-Jenkins models), in Poland only simple, well-established methods are still applied. The article presents analyses covering an assessment of the completeness of the existing traffic data, together with the work related to the construction of a SARIMA model. The study was conducted on the basis of hourly traffic volumes from the continuous traffic count stations located in the national road network in Poland (Golden River stations) from the years 2005-2010. As a result, a model of the form SARIMA(1,1,1)(0,1,1)168 was proposed and used to impute the missing data. The newly developed model can be used effectively to fill in the missing measurement days required for estimating AADT by the AASHTO method. In other cases, due to its accuracy and the laboriousness of the process, it is not recommended.
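A minimal sketch, assuming statsmodels, of fitting the SARIMA(1,1,1)(0,1,1)168 model named in the abstract to hourly volumes; the seasonal period 168 corresponds to the hours in a week. The input file is hypothetical.

```python
# Fit SARIMA(1,1,1)(0,1,1)_168 to hourly traffic volumes, then impute a
# missing day by forecasting. `hourly_volumes.csv` is a hypothetical file
# holding one hourly-indexed series.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

traffic = pd.read_csv("hourly_volumes.csv", index_col=0, parse_dates=True).squeeze()

model = SARIMAX(traffic, order=(1, 1, 1), seasonal_order=(0, 1, 1, 168))
fitted = model.fit(disp=False)

# Impute a missing day: forecast the next 24 hourly values.
imputed = fitted.forecast(steps=24)
```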
EN
Current advances in high-throughput and imaging technologies are paving the way for next-generation healthcare, tailored to the clinical and molecular characteristics of each patient. The Big Data obtained from these technologies are of little value to society unless they can be analyzed, interpreted, and applied in a relatively customized and inexpensive way. We propose a flexible decision support system called IntelliOmics for multi-omics data analysis, built from well-designed and maintained components with open licenses for both personal and commercial use. Our proposal aims to offer some insight into how to build your own local end-to-end service for personalized medicine: from raw data upload, through intelligent integration and exploration, to detailed analysis accompanying clinical medical reports. The high-throughput data is collected and processed effectively, in a parallel and distributed manner, using the Hadoop framework and user-defined scripts. Heterogeneous data transformation, performed mainly in Apache Hive, is then integrated into a so-called 'knowledge base'. On its basis, manual analysis in the form of hierarchical rules can be performed, as well as automatic data analysis with Apache Spark and its machine learning library MLlib. Finally, diagnostic and prognostic tools, charts, tables, statistical tests and print-ready clinical reports for an individual patient or a group of patients are provided. The experimental evaluation was performed as part of clinical decision support for targeted therapy in non-small cell lung cancer. The system successfully processed data of over a hundred multi-omic patient cases and offers various functionalities for different types of users: researchers, biostatisticians/bioinformaticians, clinicians and medical boards.
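A minimal sketch, assuming PySpark, of the Spark + MLlib analysis step the abstract mentions; the input path, column names and the random-forest choice are hypothetical illustrations, not IntelliOmics internals.

```python
# Read an integrated 'knowledge base' table and train an MLlib classifier.
# Path and columns are hypothetical; the label is assumed numeric (0/1).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("omics-demo").getOrCreate()
df = spark.read.parquet("hdfs:///knowledge_base/patients.parquet")

features = VectorAssembler(
    inputCols=["gene_expr_1", "gene_expr_2", "mutation_count"],  # hypothetical
    outputCol="features",
).transform(df)

model = RandomForestClassifier(labelCol="responds_to_therapy",
                               featuresCol="features").fit(features)
```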
EN
Computer-Aided Sperm Analysis (CASA) is a widely studied topic in the diagnosis and treatment of male reproductive health. Although CASA has been evolving, there is still a lack of publicly available large-scale image datasets for it. To fill this gap, we provide the Sperm Videos and Images Analysis (SVIA) dataset, comprising three subsets, subset-A, subset-B and subset-C, to test and evaluate different computer vision techniques in CASA. For subset-A, in order to test and evaluate the effectiveness of the SVIA dataset for object detection, we use five representative object detection models and four commonly used evaluation metrics. For subset-B, in order to test and evaluate its effectiveness for image segmentation, we use eight representative methods and three standard evaluation metrics. Moreover, to test and evaluate its effectiveness for object tracking, we employ traditional kNN with progressive sperm (PR) as an evaluation metric, as well as two deep learning models with three standard evaluation metrics. For subset-C, to prove the effectiveness of the SVIA dataset for image denoising, nine denoising filters are used to remove thirteen kinds of noise, and the mean structural similarity is calculated for evaluation. At the same time, to test and evaluate the effectiveness of the SVIA dataset for image classification, we evaluate the results of twelve convolutional neural network models and six visual transformer models using four commonly used evaluation metrics. The series of experimental analyses and comparisons in this paper shows that the proposed dataset can evaluate not only object detection, image segmentation, object tracking, image denoising and image classification, but also the robustness of object detection and image classification models. SVIA can therefore fill the gap caused by the lack of large-scale public datasets in CASA and promote its development. The dataset is available at https://github.com/Demozsj/Detection-Sperm.
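A minimal sketch, assuming scikit-image, of the denoising evaluation the abstract names: mean structural similarity (SSIM) between reference and denoised frames. The arrays are synthetic placeholders, and clipping stands in for a real denoising filter.

```python
# Mean SSIM between clean reference frames and denoised frames.
import numpy as np
from skimage.metrics import structural_similarity

clean = np.random.rand(10, 256, 256)                   # 10 reference frames (synthetic)
noisy = clean + 0.05 * np.random.randn(10, 256, 256)   # additive Gaussian noise
denoised = np.clip(noisy, 0, 1)                        # stand-in for a real filter

scores = [structural_similarity(c, d, data_range=1.0)
          for c, d in zip(clean, denoised)]
print("mean SSIM:", np.mean(scores))
```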
vol. 11, no. 13 (2), pp. 57-71
EN
The processing of personal data revealing religious beliefs or religious affiliation is prohibited. Processing such data does not constitute a breach of the Act on the Protection of Personal Data where it is necessary for carrying out the statutory objectives of the Catholic Church and relates solely to members of the Church.
PL
Personal data revealing religious beliefs or religious affiliation may not be processed. Processing data revealing religious beliefs or religious affiliation does not constitute a breach of the Act on the Protection of Personal Data if it is necessary for carrying out the statutory tasks of the Catholic Church and concerns only members of the Church.
12
Big Data i Data Mining w polskim budownictwie
58%
vol. R. 92, no. 7-8, pp. 150-153
PL
The article discusses the presence in the Polish construction sector of very large data resources, referred to as Big Data. In other sectors, e.g. finance or services, large databases are available and are used to improve the quality of services, to adapt better to customer requirements, and to improve competitiveness on the market. The construction sector, and above all its product, is specific compared with other branches of the economy. Does this specificity mean that Big Data resources do not exist there? The article points to the presence of Big Data resources in Polish construction and to the possibilities and ways of using them.
EN
The article discusses the occurrence of very large data resources, referred to as Big Data, in the Polish construction sector. In other sectors, e.g. finance or services, large databases are available and are used to improve service quality, to adapt to customer requirements, and to improve competitiveness on the market. The construction sector, and above all its product, is specific compared with other branches of the economy. Does this specificity result in a lack of Big Data resources? The article points to the occurrence of Big Data resources in Polish construction and to the possibilities and ways of using them.
EN
This work presents an original model for detecting machine tool anomalies and emergency states through operational data processing. The paper focuses on an elastic hierarchical system for effective data reduction and classification, which encompasses several modules. Firstly, principal component analysis (PCA) is used to reduce many input signals from big data tree topology structures into two signals representing all of them. Then a technique for segmenting operating machine data, based on dynamic time warping and hierarchical clustering, is used to compute signal fault characteristics with classifiers such as the maximum level change, the signal trend, the variance of residuals, and others. These data segmentation and analysis techniques enable effective and robust detection of machine tool anomalies and emergency states, thanks to near-real-time data collection from strategically placed sensors and results collected from previous production cycles. The emergency state detection model described in this paper could be beneficial for improving the production process, increasing production efficiency by detecting and minimizing machine tool error conditions, and improving product quality and overall equipment productivity. The proposed model was tested on H-630 and H-50 machine tools in the real production environment of the Tajmac-ZPS company.
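A minimal sketch, assuming scikit-learn and SciPy, of the two stages the abstract names: PCA reduction of many sensor signals to two components, then hierarchical clustering of the reduced data. The input matrix and the distance cut are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

signals = np.random.randn(1000, 40)       # 1000 samples from 40 sensors (synthetic)

reduced = PCA(n_components=2).fit_transform(signals)   # 40 signals -> 2 components

# Hierarchical clustering of the reduced samples; flat clusters via a distance cut.
tree = linkage(reduced, method="ward")
labels = fcluster(tree, t=5.0, criterion="distance")
```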
2016, vol. 64, no. 5, pp. 1731-1754
EN
Droughts are natural phenomena affecting the environment and human activities. There are various drought definitions and quantitative indices; among them is the Standardised Precipitation Index (SPI). In drought investigations, historical events are poorly characterised and little data is available. To decipher past drought occurrences in the southeastern Alps, with a focus on Slovenia, precipitation data from the HISTALP data repository were used to identify extreme drought events (SPI ≤ -2.00) from the second half of the 19th century to the present day. Several long-term extreme drought crises were identified in the region (between the years 1888 and 1896; after World War I; during and after World War II). After 1968, the drought patterns detected with SPI changed: shorter extreme droughts with different temporal patterns appeared. SPI indices over different time spans showed correlated structures in space and between each other, indicating structured relations.
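A minimal sketch, assuming SciPy, of how SPI is commonly computed: fit a gamma distribution to accumulated precipitation and map its cumulative probabilities onto the standard normal. The data are synthetic and zero-rainfall handling is omitted.

```python
# SPI sketch: gamma fit on precipitation totals, then normal-quantile transform.
import numpy as np
from scipy import stats

precip = np.random.gamma(shape=2.0, scale=30.0, size=600)  # monthly totals (synthetic)

a, loc, scale = stats.gamma.fit(precip, floc=0)   # fit gamma, location fixed at 0
cdf = stats.gamma.cdf(precip, a, loc=loc, scale=scale)
spi = stats.norm.ppf(cdf)                          # SPI = standard normal quantile

extreme_drought = spi <= -2.0                      # threshold used in the article
```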
EN
When running data-mining algorithms on big data platforms, a parallel, distributed framework such as MAPREDUCE may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in 43.75% of the experiments it is significantly lower; in one case the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
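A minimal sketch of the selection idea the abstract describes: score each locally induced model by its average similarity to the others and keep the most central one. Jaccard similarity over rule strings is a stand-in, not the authors' actual syntactic similarity measure.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two rule sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def most_representative(models: list[set]) -> int:
    """Return the index of the model most similar, on average, to the rest."""
    n = len(models)
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):
        s = jaccard(models[i], models[j])
        scores[i] += s
        scores[j] += s
    return max(range(n), key=lambda i: scores[i] / (n - 1))

# Each local model represented as a set of extracted decision rules (hypothetical).
trees = [{"x1<=5 -> A", "x2>3 -> B"}, {"x1<=5 -> A"}, {"x3>0 -> C"}]
print(most_representative(trees))
```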
16
Query Specific Focused Summarization of Biomedical Journal Articles
58%
vol. 25, pp. 91-100
EN
During COVID-19, a large repository of relevant literature, termed "CORD-19", was released by the Allen Institute for AI. Because the repository is very large and growing exponentially, users struggle to retrieve only the required information from its documents. In this paper, we present a framework for generating focused summaries of journal articles. The summary is generated using a novel optimization mechanism to ensure that it contains all essential scientific content. The parameters for summarization are drawn from the variables used for reporting scientific studies. We have evaluated our results on the CORD-19 dataset; the approach, however, is generic.
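A minimal sketch of query-focused extractive summarization: score sentences by query-term overlap and select greedily under a word budget. This stands in for, and is far simpler than, the authors' optimization mechanism.

```python
def summarize(sentences: list[str], query: str, budget: int = 50) -> list[str]:
    """Greedy query-focused extractive summary under a word budget."""
    q = set(query.lower().split())
    # Rank sentences by how many query terms they share.
    ranked = sorted(sentences,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    summary, used = [], 0
    for s in ranked:
        words = len(s.split())
        if used + words <= budget:   # keep adding while the budget allows
            summary.append(s)
            used += words
    return summary

docs = ["COVID-19 vaccine efficacy was measured over six months.",
        "The weather was pleasant throughout the study.",
        "Efficacy against severe COVID-19 remained high."]
print(summarize(docs, "COVID-19 vaccine efficacy"))
```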