Search results
Searched in keywords: zbiór danych (data set)
Results found: 47
PL
The article focuses on data analysis using rough set theory and various methods, such as a genetic algorithm, classification with a rule set, and cross-validation. The complete data analysis process carried out in the RSES program is also presented. The data set used and the results of the analysis are discussed in the context of rough set theory. The article ends with a summary and conclusions concerning the effectiveness of the above methods in analysing the data set and the efficiency of the program in carrying out such analyses.
EN
The article focuses on data analysis using rough set theory and various methods such as the genetic algorithm, rule set classification and the cross-validation method. The complete data analysis process using RSES is also presented. The data set used and the results of the analysis are discussed in the context of rough set theory. The article concludes with a summary and conclusions focusing on the effectiveness of the aforementioned methods in analysing the dataset and the efficiency of the program in terms of performing the analysis in it.
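As an illustration of the evaluation scheme named in this abstract, the sketch below runs 5-fold cross-validation over a rule-style classifier. A scikit-learn decision tree and synthetic data stand in for RSES and the original data set, so this is only a minimal analogue of the idea, not the authors' pipeline.

```python
# Minimal sketch of k-fold cross-validation for a rule-style classifier.
# A decision tree stands in for an RSES-style rule set; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0)

# 5-fold cross-validation: train on 4 folds, test on the held-out fold.
scores = cross_val_score(clf, X, y, cv=5)
print("fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```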
EN
Purpose: The main objective of this article is to identify areas for optimizing marketing communication via artificial intelligence solutions. Design/methodology/approach: In order to realise the assumptions made, an analysis and evaluation of exemplary implementations of AI systems in marketing communications was carried out. The case study method was chosen to achieve the research objective. As part of the discussion, the considerations on the use of AI in the world literature were analysed, together with three different practical projects. Findings: AI can contribute to the optimisation and personalisation of communication with the customer. Its application generates multifaceted benefits for both sides of the market exchange. Achieving them, however, requires a good understanding of this technology and the precise setting of objectives for its implementation. Research limitations/implications: The article contains a preliminary study. Additional quantitative and qualitative research is planned in the future. Practical implications: The conclusions of the study can serve to better understand the benefits of using artificial intelligence in communication with the consumer. The results of the research can be used in market practice and can also serve as an inspiration for further studies of this topic. Originality/value: The article reveals the specifics of artificial intelligence in relation to business activities and, in particular, communication with the buyer. The research used examples from business practice.
EN
Computer-Aided Sperm Analysis (CASA) is a widely studied topic in the diagnosis and treatment of male reproductive health. Although CASA has been evolving, there is still a lack of publicly available large-scale image datasets for CASA. To fill this gap, we provide the Sperm Videos and Images Analysis (SVIA) dataset, including three different subsets, subset-A, subset-B and subset-C, to test and evaluate different computer vision techniques in CASA. For subset-A, in order to test and evaluate the effectiveness of the SVIA dataset for object detection, we use five representative object detection models and four commonly used evaluation metrics. For subset-B, in order to test and evaluate the effectiveness of the SVIA dataset for image segmentation, we use eight representative methods and three standard evaluation metrics. Moreover, to test and evaluate the effectiveness of the SVIA dataset for object tracking, we employ the traditional kNN with progressive sperm (PR) as an evaluation metric and two deep learning models with three standard evaluation metrics. For subset-C, to prove the effectiveness of the SVIA dataset for image denoising, nine denoising filters are used to denoise thirteen kinds of noise, and the mean structural similarity is calculated for evaluation. At the same time, to test and evaluate the effectiveness of the SVIA dataset for image classification, we evaluate the results of twelve convolutional neural network models and six visual transformer models using four commonly used evaluation metrics. Through a series of experimental analyses and comparisons in this paper, it can be concluded that the proposed dataset can evaluate not only the functions of object detection, image segmentation, object tracking, image denoising, and image classification but also the robustness of object detection and image classification models. Therefore, the SVIA dataset can fill the gap of the lack of large-scale public datasets in CASA and promote the development of CASA. The dataset is available at: https://github.com/Demozsj/Detection-Sperm.
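The mean structural similarity used for subset-C can be illustrated with a minimal sketch; the test image, noise type and median filter below are arbitrary stand-ins, not the dataset's actual evaluation protocol.

```python
# Sketch: scoring one denoising filter against one noise type with SSIM,
# mirroring how a subset-C style evaluation could be run (grayscale image assumed).
from skimage import data, util
from skimage.metrics import structural_similarity
from scipy.ndimage import median_filter

clean = util.img_as_float(data.camera())                   # reference image
noisy = util.random_noise(clean, mode="s&p", amount=0.05)  # salt & pepper noise
denoised = median_filter(noisy, size=3)                    # candidate filter

ssim_noisy = structural_similarity(clean, noisy, data_range=1.0)
ssim_denoised = structural_similarity(clean, denoised, data_range=1.0)
print(f"SSIM noisy={ssim_noisy:.3f}  denoised={ssim_denoised:.3f}")
```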
4
Content available Big Data i Data Mining w polskim budownictwie
PL
The article discusses the presence in the Polish construction sector of very large data resources, referred to as Big Data. In other sectors, e.g. finance or services, large databases are available and are used to improve the quality of services, to adapt better to customer requirements, or to improve competitiveness in the market. The construction sector, and above all its product, is specific compared with other branches of the economy. Does this specificity mean that Big Data resources are absent? The article points to the presence of Big Data resources in Polish construction and to the possibilities and ways of using them.
EN
The article discusses the presence in the Polish construction sector of very large data resources, referred to as Big Data. In other sectors, e.g. financial or services, the availability of large databases and their use to improve the quality of services, to adapt better to customer requirements, or to improve market competitiveness can be observed. The construction sector, and above all its product, is specific compared to other sectors of the economy. Does this specificity result in a lack of Big Data resources? The article points to the occurrence of Big Data resources in Polish construction and to the possibilities and ways of using them.
EN
Background: This paper has the central aim of providing an analysis of increases of system complexity in the context of modern industrial information systems. An investigation and exploration of relevant theoretical frameworks is conducted and culminates in the proposition of a set of hypotheses as an explanatory approach for a possible definition of system complexity based on information growth in industrial information systems. Several interconnected sources of technological information are investigated and explored in the given context in their functionality as information-transferring agents, and their practical relevance is underlined by the application of the concepts of Big Data and cyber-physical, cyber-human and cyber-physical-cyber-human systems. Methods: A systematic review of relevant literature was conducted for this paper; in total 85 sources matching the scope of this article, in the form of academic journals and academic books of the mentioned academic fields, published between 2012 and 2019, were selected, individually read and reviewed by the authors, and reduced by careful selection to 17 key sources which served as the basis for theory synthesis. Results: Four hypotheses (H1-H4) concerning exponential surges of system complexity in industrial information systems are introduced. Furthermore, first foundational ideas for a possible approach to describe, model and simulate complex industrial information systems based on network and agent-based approaches and the concept of Shannon entropy are introduced. Conclusion: Based on the introduced hypotheses it can be theoretically indicated that the amount of information aggregated and transferred in a system can serve as an indicator for the development of system complexity and as a possible explanatory concept for the exponential surges of system complexity in industrial information systems.
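Since the hypotheses rest on the concept of Shannon entropy, a minimal sketch of how entropy could quantify the information carried by a stream of messages may help; the message frequencies below are invented for illustration.

```python
# Minimal sketch: Shannon entropy of an empirical message distribution,
# as one possible proxy for the information content discussed above.
import numpy as np

def shannon_entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # ignore zero-probability symbols
    return float(-(p * np.log2(p)).sum())

# e.g. message frequencies observed on a factory data bus (illustrative numbers)
print(shannon_entropy([50, 30, 15, 5]))   # about 1.648 bits per message
```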
EN
Big data, artificial intelligence and the Internet of Things (IoT) are still very popular areas in current research and industrial applications. Processing massive amounts of data generated by the IoT and stored in distributed space is not a straightforward task and may cause many problems. During the last few decades, scientists have proposed many interesting approaches to extract information and discover knowledge from data collected in database systems or other sources. We observe a permanent development of machine learning algorithms that support each phase of the data mining process, ensuring achievement of better results than before. Rough set theory (RST) delivers a formal insight into information, knowledge, data reduction, uncertainty, and missing values. This formalism, formulated in the 1980s and developed by several researchers, can serve as a theoretical basis and practical background for dealing with ambiguities, data reduction, building ontologies, etc. Moreover, as a mature theory, it has evolved into numerous extensions and has been transformed through various incarnations, which have enriched the expressiveness and applicability of the related tools. The main aim of this article is to present an overview of selected applications of RST in big data analysis and processing. Thousands of publications on rough sets have appeared; therefore, we focus on papers published in the last few years. The applications of RST are considered from two main perspectives: direct use of the RST concepts and tools, and use jointly with other approaches, i.e., fuzzy sets, probabilistic concepts, and deep learning. The latter hybrid idea seems to be very promising for developing new methods and related tools as well as extensions of the application area.
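A toy sketch of the core RST notions, the lower and upper approximations of a concept under an indiscernibility relation, is given below; the objects, attributes and target concept are invented for illustration only.

```python
# Toy sketch of the core RST operation: lower and upper approximations of a
# target concept X under the indiscernibility relation induced by attribute
# values. Objects and attributes are invented.
from collections import defaultdict

objects = {1: ("high", "yes"), 2: ("high", "yes"), 3: ("low", "no"),
           4: ("low", "no"), 5: ("high", "no")}
X = {1, 3, 5}                              # target concept, e.g. "faulty"

classes = defaultdict(set)                 # indiscernibility classes
for obj, attrs in objects.items():
    classes[attrs].add(obj)

lower, upper = set(), set()
for c in classes.values():
    if c <= X:
        lower |= c                         # class certainly inside X
    if c & X:
        upper |= c                         # class possibly inside X

print("lower approximation:", lower)       # {5}
print("upper approximation:", upper)       # {1, 2, 3, 4, 5}
print("boundary region:", upper - lower)   # ambiguous objects
```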
EN
This work presents an original model for detecting machine tool anomalies and emergency states through operation data processing. The paper is focused on an elastic hierarchical system for effective data reduction and classification, which encompasses several modules. Firstly, principal component analysis (PCA) is used to perform data reduction of many input signals from big data tree topology structures into two signals representing all of them. Then the technique for segmentation of operating machine data based on dynamic time distortion and hierarchical clustering is used to calculate signal accident characteristics using classifiers such as the maximum level change, a signal trend, the variance of residuals, and others. Data segmentation and analysis techniques enable effective and robust detection of operating machine tool anomalies and emergency states due to almost real-time data collection from strategically placed sensors and results collected from previous production cycles. The emergency state detection model described in this paper could be beneficial for improving the production process, increasing production efficiency by detecting and minimizing machine tool error conditions, as well as improving product quality and overall equipment productivity. The proposed model was tested on H-630 and H-50 machine tools in a real production environment of the Tajmac-ZPS company.
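The first stage, PCA-based reduction of many input signals into two representative components, can be sketched as follows; the twelve synthetic sensor channels merely stand in for the machine-tool signals described above.

```python
# Sketch: reducing many correlated sensor channels to two principal
# components, as a stand-in for the PCA stage described above.
# Signals are synthetic; a real system would stream them from the machine.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
base = np.sin(2 * np.pi * 0.5 * t)
# 12 sensor channels = scaled copies of a common trend plus noise
signals = np.stack([a * base + 0.1 * rng.standard_normal(t.size)
                    for a in rng.uniform(0.5, 2.0, size=12)], axis=1)

pca = PCA(n_components=2)
reduced = pca.fit_transform(signals)          # shape (1000, 2)
print("explained variance:", pca.explained_variance_ratio_.round(3))
```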
EN
The process of garment production has always been a black box. Production times differ greatly between garments and vary considerably, so managers cannot plan production accurately. With the world entering the era of Industry 4.0 and the accumulation of big data, machine learning can provide services for the garment manufacturing industry. The production cycle time is the key to controlling the production process. In order to predict the production cycle time more accurately and master the production process in garment manufacturing, a neural network model for production cycle time prediction is established in this paper. When the trained neural network is used to predict the production cycle time, the overall error is within 5% for 6 groups and between 5% and 10% for 3 groups. Therefore, this neural network can be used to predict future production cycle times and the overall production time of clothing.
PL
The production time of different garments differs and varies greatly, so managers cannot plan production accurately. With the world entering the era of Industry 4.0 and the accumulation of large data sets, machine learning is a good solution for the clothing industry. The production cycle time is the key to controlling the production process. In order to predict the production cycle time more accurately and to master the production process in garment manufacturing, a neural network model for predicting the production cycle time was developed in the article. A neural network was used to predict the production cycle time; the overall error for 6 groups was within 5%, and for 3 groups between 5% and 10%. The presented neural network can therefore be used to predict the production cycle time and the overall production time of clothing.
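A hedged sketch of a comparable cycle-time regressor is shown below; the features, data and network size are invented, since the paper's dataset is not published, and the 5% threshold is reproduced only as an evaluation idea.

```python
# Hedged sketch: a small feed-forward regressor for cycle-time prediction.
# Feature names and data are hypothetical, not the paper's dataset.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# hypothetical features: number of operations, seam length, fabric class, order size
X = rng.uniform(0, 1, size=(500, 4))
y = 10 + 25 * X[:, 0] + 8 * X[:, 1] + rng.normal(0, 1, 500)   # minutes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=1)
model.fit(X_tr, y_tr)

rel_err = np.abs(model.predict(X_te) - y_te) / y_te * 100
print(f"share of test items within 5% error: {(rel_err < 5).mean():.0%}")
```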
EN
Current advances in high-throughput and imaging technologies are paving the way for next-generation healthcare, tailored to the clinical and molecular characteristics of each patient. The Big Data obtained from these technologies are of little value to society unless they can be analyzed, interpreted, and applied in a relatively customized and inexpensive way. We propose a flexible decision support system called IntelliOmics for multi-omics data analysis, built from well-designed and maintained components with open licenses for both personal and commercial use. Our proposal aims to offer some insight into how to build your own local end-to-end service towards personalized medicine: from raw data upload, intelligent integration and exploration to detailed analysis accompanying clinical medical reports. The high-throughput data is effectively collected and processed in a parallel and distributed manner using the Hadoop framework and user-defined scripts. Heterogeneous data transformation, performed mainly on Apache Hive, is then integrated into a so-called 'knowledge base'. On its basis, manual analysis in the form of hierarchical rules can be performed, as well as automatic data analysis with Apache Spark and the machine learning library MLlib. Finally, diagnostic and prognostic tools, charts, tables, statistical tests and print-ready clinical reports for an individual or group of patients are provided. The experimental evaluation was performed as part of the clinical decision support for targeted therapy in non-small cell lung cancer. The system successfully processed over a hundred multi-omic patient records and offers various functionalities for different types of users: researchers, bio-statisticians/bioinformaticians, clinicians and the medical board.
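The Spark stage of such a pipeline could look roughly like the sketch below; the toy rows, column names and classifier are placeholders, not the actual IntelliOmics schema or model.

```python
# Hedged sketch of the Spark/MLlib stage only: a tiny stand-in table for the
# integrated 'knowledge base' and a simple MLlib classifier fitted on it.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("omics-sketch").getOrCreate()
# Toy rows standing in for the integrated multi-omics table (placeholder columns).
df = spark.createDataFrame(
    [(2.3, 0.7, 12, 1.0), (1.1, 1.9, 4, 0.0), (3.0, 0.2, 20, 1.0), (0.8, 2.4, 3, 0.0)],
    ["expr_gene_a", "expr_gene_b", "mutation_count", "responder"],
)

assembler = VectorAssembler(
    inputCols=["expr_gene_a", "expr_gene_b", "mutation_count"], outputCol="features")
model = LogisticRegression(featuresCol="features", labelCol="responder").fit(
    assembler.transform(df))
print(model.coefficients)
```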
EN
Machine learning algorithms have become popular in diabetes research, especially within the scope of glucose prediction from continuous glucose monitoring (CGM) data. We investigated the design choices in a case-based reasoning (CBR) approach to glucose prediction from CGM data. Design choices were made with regard to the distance function (city-block, Euclidean, cosine, Pearson's correlation), the number of observations, and the adaptation of the solution (average, weighted average, linear regression) used in the model, and were evaluated using five-fold cross-validation to establish the impact of each choice on the prediction error. Our best models showed a mean absolute error of 13.35 ± 3.04 mg/dL for prediction horizon PH = 30 min, and 30.23 ± 6.50 mg/dL for PH = 60 min. The experiments were performed using the data of 20 subjects recorded in free-living conditions. The problem of using small datasets to test blood glucose prediction models and assess the prediction error of the model was also addressed in this paper. We proposed for the first time a methodology for estimating the impact of the number of subjects (i.e., dataset size) on the distribution of the prediction error of the model. The proposed methodology is based on Monte Carlo cross-validation with systematic reduction of the number of subjects in the dataset. The implementation of the methodology was used to gauge the change in the prediction error when the number of subjects in the dataset increases, and as such allows a projection of the prediction error in case the dataset is extended with new subjects.
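The CBR core, retrieving the most similar past CGM windows under a city-block distance and adapting their outcomes with a weighted average, can be sketched as follows; the data are synthetic and the window length is an assumption, not the study's configuration.

```python
# Minimal sketch of the CBR idea: find the most similar past CGM windows
# (city-block distance) and adapt their outcomes by a weighted average.
# Data are synthetic; units are mg/dL.
import numpy as np

def predict_cbr(case_base, outcomes, query, k=5):
    # case_base: (n_cases, window_len) past CGM windows
    # outcomes:  (n_cases,) glucose value PH minutes after each window
    d = np.abs(case_base - query).sum(axis=1)          # city-block distance
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-6)                      # closer cases weigh more
    return float(np.average(outcomes[nearest], weights=w))

rng = np.random.default_rng(0)
case_base = rng.uniform(70, 180, size=(500, 6))        # 6-sample history windows
outcomes = case_base[:, -1] + rng.normal(0, 10, 500)   # toy "future" values
query = np.array([110, 115, 121, 128, 133, 140.0])
print(f"predicted glucose: {predict_cbr(case_base, outcomes, query):.1f} mg/dL")
```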
11
Content available remote Query Specific Focused Summarization of Biomedical Journal Articles
EN
During COVID-19, a large repository of relevant literature, termed "CORD-19", was released by the Allen Institute for AI. Since the repository is very large and growing exponentially, users struggle to retrieve only the information they need from the documents. In this paper, we present a framework for generating focused summaries of journal articles. The summary is generated using a novel optimization mechanism to ensure that it contains all essential scientific content. The parameters for summarization are drawn from the variables that are used for reporting scientific studies. We have evaluated our results on the CORD-19 dataset. The approach, however, is generic.
12
Content available remote Wstępne przetwarzanie danych (Data pre-processing)
PL
Data sets can become a valuable source of knowledge. For this to happen, however, we must approach their analysis in the right way. The data analysis process consists of several stages (described in issue 1/2020 of "Utrzymanie Ruchu" in the article "Analiza dużych zbiorów danych" [Analysis of large data sets]). The key stage is data pre-processing, which directly precedes the exploration stage and is often the most labour-intensive one.
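A minimal sketch of typical pre-processing steps (duplicate removal, imputation of missing values, scaling) is given below on invented maintenance data; it illustrates the stage in general, not the article's specific procedure.

```python
# Illustrative pre-processing steps: handling missing values, removing
# duplicates, and scaling numeric features (toy maintenance data).
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "temperature": [71.2, None, 69.8, 69.8, 75.4],
    "vibration":   [0.11, 0.15, 0.09, 0.09, None],
    "failure":     [0, 0, 0, 0, 1],
})

df = df.drop_duplicates()                          # remove exact duplicates
df = df.fillna(df.median(numeric_only=True))       # impute missing values
df[["temperature", "vibration"]] = StandardScaler().fit_transform(
    df[["temperature", "vibration"]])              # zero mean, unit variance
print(df)
```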
EN
The current age, characterized by unstoppable progress and the rapid development of new technologies and methods such as the Internet of Things, machine learning and artificial intelligence, brings new requirements for enterprise information systems. Information systems ought to be a consistent set of elements that provide a basis for information that can be used in context to obtain knowledge. To generate valid knowledge, information must be based on objective and current data. Furthermore, due to Industry 4.0 trends such as digitalization and online process monitoring, the amount of data produced is constantly increasing; in this context the term Big Data is used. The aim of this article is to point out the role of Big Data within Industry 4.0. Nevertheless, Big Data could be used in a much wider range of business areas, not just in industry. The term Big Data encompasses issues related to the exponentially growing volume of produced data, their variety and the velocity of their origin. These characteristics of Big Data are also associated with possible processing problems. The article also focuses on the issue of ensuring and monitoring the quality of data. Reliable information cannot be inferred from poor-quality data, and the knowledge gained from such information is inaccurate. The expected results do not appear in such a case, and the ultimate consequence may be a loss of confidence in the information system used. On the other hand, it can be assumed that the acquisition, storage and use of Big Data will in the future become a key factor in maintaining competitiveness, business growth and further innovation. Thus, the organizations that systematically use Big Data in their decision-making processes and planning strategies will have a competitive advantage.
PL
The article describes the problem of distinguishing historical photographs that were originally in color from those that have been colorized. The problem of selecting photographs with respect to the technology in which they were taken is considered. Then, using neural networks already partly trained on other data sets, their effectiveness in solving the studied problem is examined. The influence of the input image size, the architecture of the network used, and the data set used for training the network and extracting features is considered. As a result, the usefulness of the developed data set for network training is confirmed, and it is observed that increasing the size of the network brings no additional benefit. The achieved classification accuracy exceeds 92%.
EN
The article describes a dataset designed to train neural networks to distinguish historical photographs with original historic color from those that were colorized later. The problem of choosing photos in terms of technology and content was considered. Using neural networks pre-trained on other collections, their effectiveness in solving the studied issue was checked. The influence of the input image size, the depth of the neural network used, as well as the data set used to train the network to extract features, was investigated. As a result, the usefulness of the developed set for network training was confirmed, and it was observed that increasing the network size did not bring any additional benefits. The achieved accuracy is up to 92.6%.
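A hedged sketch of such a transfer-learning setup is shown below: a CNN pre-trained on ImageNet is frozen and only a new two-class head (original color vs. colorized) is trained. The specific backbone (ResNet-18), the random batch and the layer sizes are assumptions, not the authors' exact configuration.

```python
# Hedged sketch of transfer learning: reuse a pre-trained CNN as a frozen
# feature extractor and train only a new 2-class head. Dataset loading is
# omitted; a random tensor stands in for a photo batch.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # weights learned on ImageNet
for p in model.parameters():
    p.requires_grad = False                        # freeze the feature extractor
model.fc = nn.Linear(model.fc.in_features, 2)      # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 3, 224, 224)                    # stand-in for a photo batch
y = torch.randint(0, 2, (8,))
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
print("batch loss:", float(loss))
```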
EN
This study presents a concise review of a novel subject: the use of the large data sets (Big Data) generated by the operation of the power system to improve the operation and economic benefits of Smart Grids. Thanks to smart metering, we have ongoing access to data on resource use, which can then be stored using SCADA systems and servers that support large data sets, such as Apache Hadoop or Spark. Afterwards, these data are used for predictive calculations that are extremely important from an economic point of view. At the end of the paper, the author gives an interesting research proposition, namely to use, as ancillary information, satellite data obtained from the Copernicus Programme provided by the European Space Agency (ESA), related for example to temperature, to forecast energy consumption in electricity transmission and distribution networks.
PL
This paper provides a concise review of a very recent topic concerning the use of the large data sets (Big Data) generated by the operation of the power system and their use to improve the operation and economic benefits of Smart Grid systems. Thanks to smart metering, we have ongoing access to data on resource use, which can then be stored using the SCADA system and servers handling large data sets, such as Apache Hadoop or Spark, and subsequently used for predictive calculations that are extremely important, not least from an economic point of view. In addition, an interesting proposal by the author is to use, as auxiliary information, satellite data from the Copernicus Programme made available by the European Space Agency (ESA), related for example to temperature, for forecasting energy consumption in the power grid.
EN
In this article, we discuss the implementation of a quantum recommendation system that uses a quantum variant of the k-nearest neighbours algorithm and the Grover algorithm to search for a specific element in an unstructured database. In addition to the presentation of the recommendation system as an algorithm, the article also shows the main steps in construction of a suitable quantum circuit for realisation of a given recommendation system. The computational complexity of individual calculation steps in the recommendation system is also indicated. The verification of the correctness of the proposed system is analysed as well, indicating an algebraic equation describing the probability of success of the recommendation. The article also shows numerical examples presenting the behaviour of the recommendation system for two selected cases.
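The Grover component can be illustrated with a plain state-vector simulation; the sketch below searches for one marked item among eight and is not the recommendation circuit constructed in the article.

```python
# Illustrative state-vector simulation of Grover search for one marked item
# in an unstructured list of N = 8 elements (3 qubits).
import numpy as np

n = 3                                   # qubits
N = 2 ** n
marked = 5                              # index of the item to find

state = np.full(N, 1 / np.sqrt(N))      # uniform superposition after Hadamards
oracle = np.ones(N)
oracle[marked] = -1                     # phase flip on the marked item

iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
for _ in range(iterations):
    state = oracle * state              # oracle step
    state = 2 * state.mean() - state    # diffusion (inversion about the mean)

print("success probability:", round(float(state[marked] ** 2), 3))  # about 0.945
```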
EN
Extracting useful information from astronomical observations represents one of the most challenging tasks of data exploration. This is largely due to the volume of the data acquired using advanced observational tools. While other challenges typical for the class of big data problems (like data variety) are also present, the size of datasets represents the most significant obstacle in visualization and subsequent analysis. This paper studies an efficient data condensation algorithm aimed at providing its compact representation. It is based on fast nearest neighbor calculation using tree structures and parallel processing. In addition to that, the possibility of using approximate identification of neighbors, to even further improve the algorithm time performance, is also evaluated. The properties of the proposed approach, both in terms of performance and condensation quality, are experimentally assessed on astronomical datasets related to the GAIA mission. It is concluded that the introduced technique might serve as a scalable method of alleviating the problem of the dataset size.
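One simple form of tree-based condensation, keeping a point and absorbing all neighbours within a radius, can be sketched as follows; the synthetic 2-D points and the radius are placeholders for the GAIA-related data, and the greedy rule is only one possible condensation strategy.

```python
# Sketch of tree-based condensation: greedily keep a point and drop all
# neighbours within radius r, so the kept subset approximates the original
# distribution with far fewer points.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.normal(size=(20_000, 2))
tree = cKDTree(points)

r = 0.05
alive = np.ones(len(points), dtype=bool)
kept = []
for i in range(len(points)):
    if alive[i]:
        kept.append(i)
        alive[tree.query_ball_point(points[i], r)] = False   # absorb neighbours

print(f"condensed {len(points)} points to {len(kept)} representatives")
```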
EN
The useful lifetime of equipment is an important variable related to system prognosis, and its accurate estimation leads to several competitive advantages in industry. In this paper, Remaining Useful Lifetime (RUL) prediction is estimated by Particle Swarm optimized Support Vector Machines (PSO+SVM) considering two possible pre-processing techniques to improve input quality: Empirical Mode Decomposition (EMD) and Wavelet Transforms (WT). Here, EMD and WT coupled with SVM are used to predict the RUL of bearings from the IEEE PHM Challenge 2012 big dataset. Specifically, two cases were analyzed: one considering the complete vibration dataset and one considering a truncated vibration dataset. Finally, predictions provided by models applying both pre-processing techniques are compared against results obtained from PSO+SVM without any pre-processing approach. In conclusion, EMD+SVM presented more accurate predictions and outperformed the other models.
PL
The useful life of equipment is an important variable related to system prognostics, and the ability to estimate it accurately gives industrial plants a significant competitive advantage. In this article, the Remaining Useful Life (RUL) was estimated using support vector machines optimized with particle swarm optimization (SVM+PSO), taking into account two pre-processing techniques that improve the quality of the input data: Empirical Mode Decomposition (EMD) and Wavelet Transforms (WT). In this work, EMD and wavelets combined with SVM were used to predict the RUL of a bearing from the IEEE PHM Challenge 2012 Big Dataset. In particular, two cases were analysed: one using the complete vibration dataset and one using a truncated version of this dataset. The predictions obtained from the models applying both pre-processing techniques were compared with the results obtained using PSO+SVM without any pre-processing. The results showed that the EMD+SVM model produced more accurate predictions and thus outperformed the other models examined.
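A minimal sketch of the WT+SVM branch is given below: wavelet-band RMS features extracted from a vibration window feed a support vector regressor. The PSO hyper-parameter search and the PHM 2012 data are omitted, and the degradation signal is synthetic.

```python
# Hedged sketch of one branch (WT + SVM): wavelet-based features from a
# vibration window feed a support vector regressor predicting RUL.
import numpy as np
import pywt
from sklearn.svm import SVR

def wavelet_features(window):
    coeffs = pywt.wavedec(window, "db4", level=3)      # multilevel decomposition
    return np.array([np.sqrt(np.mean(c ** 2)) for c in coeffs])  # RMS per band

rng = np.random.default_rng(0)
# toy degradation: noise amplitude grows as remaining life shrinks
windows = [rng.normal(0, 1 + k / 50, 2048) for k in range(100)]
rul = np.linspace(100, 1, 100)                         # remaining useful life

X = np.vstack([wavelet_features(w) for w in windows])
model = SVR(C=10.0, epsilon=0.1).fit(X, rul)
print("predicted RUL for last window:", round(float(model.predict(X[-1:])[0]), 1))
```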
19
Content available remote Urban sound classification using long short-term memory neural network
EN
Environmental sound classification has received more attention in recent years. Analysis of environmental sounds is difficult because of their unstructured nature. However, the presence of strong spectro-temporal patterns makes classification possible. Since LSTM neural networks are efficient at learning temporal dependencies, we propose and examine an LSTM model for urban sound classification. The model is trained on magnitude mel-spectrograms extracted from UrbanSound8K dataset audio. The proposed network is evaluated using 5-fold cross-validation and compared with the baseline CNN. It is shown that the LSTM model outperforms a set of existing solutions and is more accurate and confident than the CNN.
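The pipeline can be sketched as a log-mel-spectrogram fed to an LSTM with a softmax head over the ten UrbanSound8K classes; the audio below is a random placeholder clip, and the layer sizes are assumptions rather than the paper's architecture.

```python
# Hedged sketch: log-mel-spectrogram features feeding an LSTM classifier.
# A random clip stands in for a real UrbanSound8K WAV file.
import numpy as np
import librosa
import tensorflow as tf

sr = 22050
y = np.random.randn(4 * sr).astype(np.float32)          # 4 s placeholder clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel).T                    # (time_steps, 64)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=log_mel.shape),         # (time, n_mels)
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(10, activation="softmax"),    # 10 urban sound classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
print(model.predict(log_mel[np.newaxis, ...]).shape)    # (1, 10)
```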
20
EN
Medical data are used in a huge number of research works around the globe, each aiming to predict something novel from its case studies. The current research we are conducting concerns using EHR (Electronic Health Record) data in an efficient way, based on the cause-effect ratio and the variables available, for data manipulation, processing, and generating data suitable for designing efficient prediction models. In this research we focus on congenital tethered cord syndrome, for which many functional-outcome issues are recorded in different cases and there is a wide scope for research. We identify data from different EHR applications and design an architecture to gather a valuable data set from them for designing a prediction model of functional outcomes of health and life in patients with this congenital deformity. Through EHR applications, information is gathered and Big Data is being created in this sector. Data interrelations are explained in this survey article in an efficient way with respect to the medical domain. The EHR data will be hosted in the cloud and in public repositories; we will focus on these categories in an efficient manner.