Over the last two decades, the size and amount of data has increased enormously, which has changed traditional methods of data management and introduced two new technological terms: big data and cloud computing. Addressing big data, characterized by massive volume, high velocity and variety, is quite challenging as it requires large computational infrastructure to store, process and analyze it. A reliable technique to carry out sophisticated and enormous data processing has emerged in the form of cloud computing because it eliminates the need to manage advanced hardware and software, and offers various services to users. Presently, big data and cloud computing are gaining significant interest among academia as well as in industrial research. In this review, we introduce various characteristics, applications and challenges of big data and cloud computing. We provide a brief overview of different platforms that are available to handle big data, including their critical analysis based on different parameters. We also discuss the correlation between big data and cloud computing. We focus on the life cycle of big data and its vital analysis applications in various fields and domains. At the end, we present the open research issues that still need to be addressed and give some pointers to future scholars in the fields of big data and cloud computing.
Distresses are an integral part of pavements and occur throughout the service life of a road. Bitumen bleeding is known as one of the most important problems of Iran's roads, especially in tropical areas and on transit routes carrying heavy axle loads; identifying the factors responsible for the bleeding phenomenon is therefore necessary and important. This study was conducted to investigate the influence of mix design parameters on the occurrence of the bleeding phenomenon and its severity. The collected data were analyzed and grouped using Design Expert and SPSS software. The results show that all five parameters (optimal bitumen percentage, bitumen percentage in the asphalt mixture, void percentage of the Marshall sample, percentage of voids, and filler-to-bitumen ratio) affect bleeding and its intensity. Among these, the bitumen percentage relative to the asphalt mixture and the void percentage in the Marshall sample have the greatest effect on the severity of the bleeding phenomenon.
Data analysis is a continuously evolving process consisting of many stages. The article presents its main stage: data mining. How should one choose the appropriate data mining method?
Electricity theft is a problem for distribution system operators (DSOs) in Poland. DSOs use many ways to limit this unfavourable phenomenon. Within this paper, the author presents a new method to detect the location of illegal power consumption. The method is based on processing data from an advanced metering infrastructure (AMI). It builds on the observation that some consumers illegally consume energy mainly in the winter season and that the level of illegal energy consumption may depend on the level of energy consumption. The method searches for periods of temporary reduction of the balance difference accompanied by a simultaneous decrease in energy consumption by one of the consumers.
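A minimal sketch of the coincidence search described above, assuming daily AMI readings are available as time series; the data, consumer names and thresholds below are illustrative, not taken from the paper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", periods=180, freq="D")

# Illustrative per-consumer metered energy (kWh/day) from AMI readings.
usage = pd.DataFrame(
    {f"consumer_{i}": 20 + rng.normal(0, 1.5, len(days)) for i in range(5)},
    index=days,
)
# consumer_3 halves its activity (legal and illegal) for two weeks in February.
pause = (days >= "2023-02-10") & (days < "2023-02-24")
usage.loc[pause, "consumer_3"] -= 8

# Balance difference = energy delivered to the area minus the sum of metered usage;
# the illegal consumption of consumer_3 inflates it, except during the pause.
illegal = np.where(pause, 0.0, 6.0)
balance_diff = pd.Series(15 + rng.normal(0, 1.0, len(days)) + illegal, index=days)

def coincident_drop_score(balance: pd.Series, usage: pd.DataFrame,
                          window: int = 30, z: float = 1.5) -> pd.Series:
    """Count, per consumer, the days on which both the balance difference and
    that consumer's metered usage fall well below their rolling baselines."""
    def drops(s: pd.Series) -> pd.Series:
        base = s.rolling(window, min_periods=10).median()
        spread = s.rolling(window, min_periods=10).std()
        return (base - s) > z * spread

    return usage.apply(drops).loc[drops(balance)].sum().sort_values(ascending=False)

print(coincident_drop_score(balance_diff, usage))  # consumer_3 is expected to rank first
```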
Purpose: The aim of the article is to describe and forecast possible difficulties related to the development of cognitive technologies and the progressing algorithmization of HRM processes as part of Industry 4.0. Design/methodology/approach: While most studies to date on Industry 4.0 and Big Data concern the efficiency of cyber-physical systems and the improvement of algorithmic tools, this study proposes a different perspective. It attempts to foresee the possible difficulties connected with the algorithmization of HRM processes; understanding them could help to "prepare" for, or even eliminate, harmful effects that will influence decisions made in managing organizations, especially regarding human resource management, in the era of Industry 4.0. Findings: Research on cognitive technologies in the broadest sense is primarily focused on their effectiveness, which can result in a one-sided view and ultimately a lack of objective assessment of that effectiveness. Therefore, conducting a parallel critical reflection seems necessary. Such reflection has the potential to lead to a more balanced assessment of what is undoubtedly "for", but also of what may be "against". The proposed point of view may contribute to a more informed use of algorithm-based cognitive technologies in the human resource management process, and thus to an improvement of their real-world effectiveness. Social implications: The article can serve an educational function, helps to develop critical thinking about cognitive technologies, and directs attention to areas of knowledge by which future skills should be extended. Originality/value: This article is addressed to all those who use algorithms and data-driven decision-making processes in HRM. Crucial in these considerations is drawing attention to the dangers of unreflective use of technical solutions supporting HRM processes. The novelty of the proposed approach is the identification of three potential risk areas that may result in faulty HR decisions: the risk of a "technological proof of equity", overconfidence in the supposedly objective character of algorithms, and the real danger resulting from so-called algorithm overfitting. Recognizing these difficulties can ultimately contribute to real improvements in productivity by combining human performance with the effectiveness of technology.
The aluminum profile extrusion process is briefly characterized in the paper, together with the presentation of historical, automatically recorded data. The initial selection of the important, widely understood, process parameters was made using statistical methods such as correlation analysis for continuous and categorical (discrete) variables and ‘inverse’ ANOVA and Kruskal–Wallis methods. These selected process variables were used as inputs for MLP-type neural models with two main product defects as the numerical outputs with values 0 and 1. A multi-variant development program was applied for the neural networks and the best neural models were utilized for finding the characteristic influence of the process parameters on the product quality. The final result of the research is the basis of a recommendation system for the significant process parameters that uses a combination of information from previous cases and neural models.
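A hedged sketch of this kind of pipeline, screening parameters with the Kruskal-Wallis test and fitting an MLP-type model on a binary defect flag; the parameters and data are synthetic stand-ins, not the recorded extrusion data.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 600
# Illustrative process parameters (names and values are hypothetical).
X = rng.normal(size=(n, 6))
# Defect occurrence (0/1) driven mainly by two of the parameters.
y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(0, 0.7, n) > 0.8).astype(int)

# 'Inverse' screening: for each parameter, compare its distribution in
# defective vs. non-defective products with the Kruskal-Wallis test.
for j in range(X.shape[1]):
    stat, p = kruskal(X[y == 0, j], X[y == 1, j])
    print(f"param_{j}: H={stat:.1f}, p={p:.3g}")

# MLP-type neural model with the defect flag as the output.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                    random_state=0))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```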
Approximately 30 million tons of tailings are stored each year at KGHM's Zelazny Most Tailings Storage Facility (TSF). Covering an area of almost 1.6 thousand hectares and being surrounded by dams with a total length of 14 km and a height exceeding 70 m in some areas makes it the largest reservoir of post-flotation tailings in Europe and the second-largest in the world. With approximately 2900 monitoring instruments and measuring points surrounding the facility, Zelazny Most is subject to round-the-clock monitoring, which for safety and economic reasons is crucial not only for the immediate surroundings of the facility but for the entire region. The monitoring network can be divided into four main groups: (a) geotechnical, consisting mostly of inclinometers and VW pore pressure transducers, (b) hydrological, with piezometers and water level gauges, (c) geodetic survey, with laser and GPS measurements as well as surface and in-depth benchmarks, and (d) a seismic network, consisting primarily of accelerometer stations. Separately, a variety of chemical analyses are conducted, in parallel with spigotting processes and relief well monitoring. This leads to a large amount of data that is difficult to analyze with conventional methods. In this article, we discuss a machine learning-driven approach which should improve the quality of the monitoring and maintenance of such facilities. An overview of the main algorithms developed to determine stability parameters or to classify tailings is presented.
Measurements from CPTU tests were used for the analysis and classification of the tailings. The classification of natural soils based on CPT tests is commonly applied; the novelty is the application of a similar method to the classification of tailings, using the post-flotation facility as an example. Exploratory analysis made it possible to identify the most significant parameters for the model. Selected machine learning models were used for the classification: k-nearest neighbours, SVM, RBF SVM, decision tree, random forest, neural networks and QDA, which were compared in order to select the most effective one. The concepts described in this article will be further developed in the IlluMINEation project (H2020).
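A minimal sketch of such a classifier comparison in scikit-learn, using synthetic stand-ins for the CPTU-derived features; it illustrates the comparison workflow, not the project's actual models or data.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for CPTU-derived features (e.g. cone resistance, sleeve friction,
# pore pressure, depth); the real feature set comes from the exploratory analysis.
X, y = make_classification(n_samples=800, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=3, n_clusters_per_class=1,
                           random_state=0)

models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf", gamma="scale"),
    "decision tree": DecisionTreeClassifier(max_depth=6, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# Cross-validated accuracy as a simple basis for selecting the best model.
for name, clf in models.items():
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5)
    print(f"{name:>14}: {scores.mean():.3f} +/- {scores.std():.3f}")
```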
The problem of determining a decision recommendation according to examples of acceptable decisions and examples of unacceptable decisions indicated by the decision-maker is considered in the paper. The decision-maker's examples are the foundation for assessing his preferences. The essence of the presented solution consists in determining the preferences of the decision-maker as a cluster designated by supplementing the indicated examples. The paper proposes a procedure of successive approximations based on the classification task according to given examples.
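One possible reading of the successive-approximation idea is a self-training loop that repeatedly classifies candidate decisions and absorbs the confidently acceptable ones into the preference cluster; the sketch below, with illustrative data and a kNN classifier, is an assumption-laden illustration rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)

# Decision variants described by two criteria (illustrative values in [0, 1]).
candidates = rng.uniform(0, 1, size=(300, 2))
# Examples indicated by the decision-maker: acceptable (1) and unacceptable (0).
X_ex = np.array([[0.8, 0.7], [0.9, 0.8], [0.75, 0.9],   # acceptable
                 [0.2, 0.3], [0.1, 0.6], [0.4, 0.2]])   # unacceptable
y_ex = np.array([1, 1, 1, 0, 0, 0])

# Successive approximations: classify the remaining candidates, move the most
# confidently acceptable ones into the example set, and repeat until stable.
remaining = np.ones(len(candidates), dtype=bool)
for _ in range(10):
    clf = KNeighborsClassifier(n_neighbors=3).fit(X_ex, y_ex)
    proba = clf.predict_proba(candidates[remaining])[:, 1]
    accept = proba >= 0.99
    if not accept.any():
        break
    idx = np.flatnonzero(remaining)[accept]
    X_ex = np.vstack([X_ex, candidates[idx]])
    y_ex = np.concatenate([y_ex, np.ones(len(idx), dtype=int)])
    remaining[idx] = False

print("size of the approximated preference cluster:", int(y_ex.sum()))
```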
The article presents the disturbances encountered in the operation of a rotary sheeter and focuses on damage to electrical components, such as the encoder. Theoretical issues of diagnostic systems based on artificial intelligence (neural networks) are also presented. A simple diagnostic method based on statistics is presented for the corrugator application.
Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, as people with diabetes require continuous medical care. Several complications occur if the polygenic disorder remains untreated and unidentified. This condition leads patients to a diagnostic centre and to consulting a doctor. One of the essential real-world tasks is to detect the polygenic disorder at an early stage. This work is essentially a survey that analyses several parameters of polygenic disorder diagnosis. It shows that classification algorithms play an important role in the automation of polygenic disorder analysis, alongside other machine learning algorithms. Design/methodology/approach: This paper provides an extensive survey of different approaches that have been used for the analysis of medical data for the purpose of early detection of the polygenic disorder. It takes into consideration methods such as J48, CART, SVM and KNN, conducts a formal survey of the related studies, and provides a conclusion at the end. Findings: The survey analyses several parameters of polygenic disorder diagnosis. It shows that classification algorithms play an important role in the automation of polygenic disorder analysis, alongside other machine learning algorithms. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by going through the results and selecting the appropriate approach based on requirements.
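A brief sketch of how such a comparison can be run; J48 (C4.5) has no direct scikit-learn implementation, so an entropy-based decision tree stands in for it, and the screening data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a diabetes screening dataset (e.g. glucose, BMI, age, ...).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=0)

# 'entropy' approximates J48 (C4.5) splitting, 'gini' approximates CART.
models = {
    "J48-like tree": DecisionTreeClassifier(criterion="entropy", max_depth=5),
    "CART": DecisionTreeClassifier(criterion="gini", max_depth=5),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
}

# For a medical screening task, sensitivity (recall) matters alongside accuracy.
for name, clf in models.items():
    cv = cross_validate(clf, X, y, cv=5, scoring=("accuracy", "recall"))
    print(f"{name:>13}: accuracy={cv['test_accuracy'].mean():.3f}, "
          f"sensitivity={cv['test_recall'].mean():.3f}")
```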
Big data, artificial intelligence and the Internet of things (IoT) are still very popular areas in current research and industrial applications. Processing massive amounts of data generated by the IoT and stored in distributed space is not a straightforward task and may cause many problems. During the last few decades, scientists have proposed many interesting approaches to extract information and discover knowledge from data collected in database systems or other sources. We observe a permanent development of machine learning algorithms that support each phase of the data mining process, ensuring better results than before. Rough set theory (RST) delivers a formal insight into information, knowledge, data reduction, uncertainty, and missing values. This formalism, formulated in the 1980s and developed by several researchers, can serve as a theoretical basis and practical background for dealing with ambiguities, data reduction, building ontologies, etc. Moreover, as a mature theory, it has evolved into numerous extensions and has been transformed through various incarnations, which have enriched the expressiveness and applicability of the related tools. The main aim of this article is to present an overview of selected applications of RST in big data analysis and processing. Thousands of publications on rough sets have been published; therefore, we focus on papers published in the last few years. The applications of RST are considered from two main perspectives: direct use of the RST concepts and tools, and jointly with other approaches, i.e., fuzzy sets, probabilistic concepts, and deep learning. The latter hybrid idea seems to be very promising for developing new methods and related tools as well as extensions of the application area.
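For readers new to RST, a minimal worked example of its core constructs, computing the lower and upper approximations of a decision class on a toy, deliberately inconsistent decision table.

```python
from collections import defaultdict

# A tiny decision table: (condition attributes) -> decision (illustrative data).
table = [
    # (outlook, humidity) -> play
    (("sunny", "high"), "no"),
    (("sunny", "high"), "yes"),   # inconsistent with the row above
    (("rainy", "high"), "no"),
    (("sunny", "normal"), "yes"),
    (("rainy", "normal"), "yes"),
]

# Indiscernibility classes: objects with identical condition-attribute values.
ind = defaultdict(set)
for i, (cond, _) in enumerate(table):
    ind[cond].add(i)

target = {i for i, (_, dec) in enumerate(table) if dec == "yes"}

# Lower approximation: classes fully contained in the target set (certain members).
lower = set().union(*[c for c in ind.values() if c <= target])
# Upper approximation: classes that overlap the target set (possible members).
upper = set().union(*[c for c in ind.values() if c & target])

print("lower approximation of 'play = yes':", sorted(lower))
print("upper approximation of 'play = yes':", sorted(upper))
print("boundary region (the uncertain part):", sorted(upper - lower))
```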
Free-choice nets, a subclass of Petri nets, have been studied for decades. They are interesting because they have many desirable properties normal Petri nets do not have and can be analyzed efficiently. Although the majority of process models used in practice are inherently free-choice, most users (even modeling experts) are not aware of free-choice net theory and associated analysis techniques. This paper discusses free-choice nets in the context of process mining and business process management. For example, state-of-the-art process discovery algorithms like the inductive miner produce process models that are free-choice. Also, hand-made process models using languages like BPMN tend to be free-choice because choice and synchronization are separated in different modeling elements. Therefore, we introduce basic notions and results for this important class of process models. Moreover, we also present new results for free-choice nets particularly relevant for process mining. For example, we elaborate on home clusters and lucency as closely-related and desirable correctness notions. We also discuss the limitations of free-choice nets in process mining and business process management, and suggest research directions to extend free-choice nets with non-local dependencies.
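The defining (extended) free-choice condition, that transitions sharing an input place must share all input places, can be checked directly; the sketch below uses toy nets encoded only by their place-to-transition arcs.

```python
from itertools import combinations

def is_free_choice(transitions, arcs):
    """Extended free-choice check: any two transitions that share an input place
    must have identical presets (choice is never mixed with synchronisation).
    `arcs` contains only (place, transition) pairs, which is all the check needs."""
    preset = {t: {p for p, t2 in arcs if t2 == t} for t in transitions}
    return all(preset[t1] == preset[t2]
               for t1, t2 in combinations(transitions, 2)
               if preset[t1] & preset[t2])

# Free-choice fragment: place p1 offers a clean choice between t1 and t2.
print(is_free_choice({"t1", "t2"},
                     {("p1", "t1"), ("p1", "t2")}))                 # True
# Not free-choice: t3 and t4 share p2, but t4 additionally synchronises on p3.
print(is_free_choice({"t3", "t4"},
                     {("p2", "t3"), ("p2", "t4"), ("p3", "t4")}))   # False
```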
In the history of the world economy, the bankruptcy of some large companies has caused global financial crises. The study aimed to postulate a bankruptcy prediction model for companies listed on Vietnam's stock market. The research used six popular data mining algorithms to predict bankruptcy risk with data collected from 4693 observations in the period 2009-2020. The results showed that the Logistic Regression, Artificial Neural Network and Decision Tree algorithms predict bankruptcy with a high accuracy of 98%. The study identified the three indicators that most affect corporate bankruptcy prediction: the inventory turnover ratio, the debt-to-equity ratio, and the debt ratio. It also provided the threshold points of ten indicators for avoiding bankruptcy likelihood. These results suggest that the model could be applied in practice to reduce risks for businesses and investors in the Vietnamese market.
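A hedged sketch of how indicator importance and threshold points can be read off a shallow decision tree; the financial ratio names and data below are hypothetical stand-ins, and only the sample size mirrors the study.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical financial ratios; the paper's key indicators were inventory
# turnover, debt-to-equity and debt ratio, mimicked here by making the first
# three synthetic features the informative ones.
feature_names = ["inventory_turnover", "debt_to_equity", "debt_ratio",
                 "current_ratio", "roa", "asset_turnover"]
X, y = make_classification(n_samples=4693, n_features=6, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Importance ranking points to the most influential indicators ...
for name, imp in sorted(zip(feature_names, tree.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>18}: {imp:.2f}")

# ... and the split thresholds play the role of 'threshold points'
# separating lower-risk from higher-risk companies.
print(export_text(tree, feature_names=feature_names))
```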
Knowledge graphs have been shown to play an important role in recent knowledge mining and discovery, for example in the fields of life sciences or bioinformatics. Contextual information is widely used for NLP and knowledge discovery in life sciences, since it strongly influences the exact meaning of natural language as well as queries for data. The contributions of this paper are (1) an efficient approach towards interoperable data, (2) a runtime analysis of 14 real-world use cases represented by graph queries, and (3) a unique view on clinical data and its application, combining methods of algorithmic optimisation, graph theory and data science.
Potential seismic sources play an important role in seismic hazard analysis. Identification of seismic sources is generally carried out on the basis of expert judgement, and in most cases different and controversial results are obtained when several experts are consulted. In fact, the method of source identification is probably an important cause of uncertainty in seismic hazard analysis. The main objective of this research is to provide an algorithm that combines weighted K-means clustering and Particle Swarm Optimization in order to automatically identify globally optimal clusters by analysing seismic event data. These clusters, together with seismotectonic information, can be used to determine seismic sources. Two validity indexes, the Davies–Bouldin (DB) measure and the Chou–Su–Lai (CS) measure, are used to determine the optimum number of clusters. The study area is located at longitude 46°–48° E and latitude 34°–36° N, considered the most seismically active part of the Zagros continental collision zone, which has experienced large and destructive earthquakes due to movements of the Sahneh and Nahavand segments of the Zagros Main Recent Fault. As a result, the 7-cluster model identified on the basis of the DB validity index seems suitable for the considered earthquake catalogue, despite some limitations in partitioning.
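A simplified sketch of selecting the number of clusters with the Davies-Bouldin index, using plain K-means on synthetic epicentres instead of the weighted, PSO-optimised variant used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(3)
# Illustrative epicentres (longitude, latitude) scattered around a few sources.
centres = np.array([[46.3, 34.4], [47.1, 35.2], [47.8, 34.8], [46.8, 35.8]])
events = np.vstack([c + rng.normal(0, 0.08, size=(60, 2)) for c in centres])

# Scan candidate cluster counts and pick the one minimising the DB index.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(events)
    scores[k] = davies_bouldin_score(events, labels)

best_k = min(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()})
print("optimum number of clusters by DB index:", best_k)
```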
Purpose: The main purpose of the article was to present the results of an analysis of the after-sales service process using data mining, based on data gathered in an authorized car service station. As a result of the completed literature review and the identification of cognitive gaps, two research questions (RQ) were formulated. RQ1: Does after-sales service meet the parameters of the business process category? RQ2: Is after-sales service characterized by trends, or is it seasonal in nature? Design/methodology/approach: The following research methods were used in the study: quantitative bibliographic analysis, systematic literature review, participant observation and statistical methods. The theoretical and empirical study used the R programming language and Gretl software. Findings: Based on a relational database designed for the purpose of the research procedure, the presented results covered an analysis of the service sales structure and sales dynamics, as well as trend and seasonality analyses. As a result of the research procedure, the effects of the after-sales service process were presented in terms of quantity and value (amount). In addition, it has been shown that after-sales service should be identified within the business process category. Originality/value: The article uses data mining and the R programming language to analyze the effects generated in after-sales service on the example of a complete sample of 13,418 completed repairs carried out in 2013-2018. On the basis of the empirical proceedings carried out, the structure of the customer-supplier relationship was recreated in external and internal terms for the examined organization. In addition, the possibilities of using data generated from the domain system were characterized, and further research directions as well as application recommendations in the area of after-sales services were presented.
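The authors worked in R and Gretl; the following Python sketch shows the same kind of trend and seasonality check on illustrative monthly repair counts, not on the 13,418-repair dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
months = pd.date_range("2013-01", periods=72, freq="MS")  # 2013-2018

# Illustrative monthly counts of completed repairs: upward trend + yearly pattern.
trend = np.linspace(150, 220, len(months))
season = 20 * np.sin(2 * np.pi * (months.month - 3) / 12)
repairs = pd.Series(trend + season + rng.normal(0, 8, len(months)), index=months)

# Trend: centred 12-month moving average.
trend_est = repairs.rolling(12, center=True).mean()

# Seasonality: average deviation from the trend for each calendar month.
seasonal_est = (repairs - trend_est).groupby(repairs.index.month).mean()

print(seasonal_est.round(1))  # months with above/below-average workload
print("trend level, start vs end:",
      round(trend_est.dropna().iloc[0], 1), "->",
      round(trend_est.dropna().iloc[-1], 1))
```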
Currently, almost every process, including production processes, generates huge amounts of data. We collect this data, but can we use it properly? This is particularly important in the context of Industry 4.0, where data is the most important "raw material" and its effective use is crucial, mainly because of the knowledge that can be extracted from it.
The article discusses the possibilities of applying logic synthesis methods in data mining tasks. In particular, the method of attribute reduction and the method of decision rule induction are considered. It is shown that by applying specialized logic synthesis methods, these procedures can be effectively improved and successfully used for solving more general data mining tasks. To justify the advisability of such an approach, the diagnosis of patients with the possibility of eliminating troublesome tests is discussed.
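A minimal illustration of attribute reduction on a toy diagnostic table: it searches for the smallest subsets of tests that still discern all diagnoses, so the remaining tests could be skipped. This brute-force sketch only mirrors the idea, not the logic synthesis algorithms discussed in the article.

```python
from itertools import combinations

# Toy diagnostic table: each row is a patient, the tuple holds test results,
# the second element is the diagnosis (illustrative data).
attributes = ["test_A", "test_B", "test_C", "test_D"]
rows = [
    ((1, 0, 1, 0), "ill"),
    ((0, 1, 0, 1), "ill"),
    ((1, 1, 0, 0), "healthy"),
    ((0, 0, 1, 1), "healthy"),
]

def consistent(subset):
    """True if patients identical on `subset` never differ in the diagnosis."""
    seen = {}
    for values, decision in rows:
        key = tuple(values[i] for i in subset)
        if seen.setdefault(key, decision) != decision:
            return False
    return True

# Smallest attribute subsets that still discern all diagnoses (reducts):
# the tests outside such a subset can be eliminated.
for size in range(1, len(attributes) + 1):
    reducts = [c for c in combinations(range(len(attributes)), size)
               if consistent(c)]
    if reducts:
        for r in reducts:
            print("sufficient tests:", [attributes[i] for i in r])
        break
```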
Nuclear power plant process systems have developed greatly over the years. As a large amount of data is generated from Distributed Control Systems (DCS) with fast computational speed and large storage facilities, smart systems have taken over analysis of the process. These systems are built using data mining concepts to understand the various stable operating regimes of the processes, identify key performance factors, make estimates and suggest to operators how to optimize the process. Association rule mining is a data mining concept frequently used in e-commerce for suggesting closely related and frequently bought products to customers. It also has very wide application in industries such as bioinformatics, nuclear science, trading and marketing. This paper deals with the application of these techniques to the identification and estimation of key performance variables of a lubrication system designed for a 2.7 MW centrifugal pump used for reactor cooling in a typical 500 MWe nuclear power plant. The paper dwells in detail on predictive model building using three models based on association rules for steady-state estimation of the key performance indicators (KPIs) of the process. It also covers the evaluation of the prediction models with various metrics and the selection of the best model.
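A small self-contained sketch of mining association rules between discretised process states and a KPI state; the lubrication-system variables, ranges and thresholds are hypothetical, and the code implements a plain support/confidence count rather than the paper's models.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 2000

# Illustrative lubrication-system snapshots (names and ranges are hypothetical).
oil_flow = rng.normal(100, 10, n)                                   # l/min
pump_load = rng.normal(2.0, 0.3, n)                                 # MW
bearing_temp = 40 + 3 * pump_load - 0.1 * (oil_flow - 100) + rng.normal(0, 1.0, n)

# Discretise each variable into labelled states ("items" for rule mining).
states = pd.DataFrame({
    "oil_flow": pd.qcut(oil_flow, 3, labels=["low", "mid", "high"]),
    "pump_load": pd.qcut(pump_load, 3, labels=["low", "mid", "high"]),
    "bearing_temp": pd.qcut(bearing_temp, 3, labels=["low", "mid", "high"]),
}).astype(str)

# Mine rules {oil_flow state, pump_load state} -> {bearing_temp state}:
# support = how often the whole pattern occurs, confidence = how reliably the
# antecedent predicts the KPI state in steady operation.
counts = states.value_counts().rename("support_count").reset_index()
antecedent_counts = states.groupby(["oil_flow", "pump_load"]).size()

for _, row in counts.iterrows():
    support = row["support_count"] / n
    confidence = row["support_count"] / antecedent_counts[(row["oil_flow"],
                                                           row["pump_load"])]
    if support >= 0.05 and confidence >= 0.6:
        print(f"IF oil_flow={row['oil_flow']} AND pump_load={row['pump_load']} "
              f"THEN bearing_temp={row['bearing_temp']} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```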
In this paper we tackle the problem of vehicle re-identification in a camera network utilizing triplet embeddings. Re-identification is the problem of matching appearances of objects across different cameras. With the proliferation of surveillance cameras enabling smart and safer cities, there is an ever-increasing need to re-identify vehicles across cameras. Typical challenges arising in smart-city scenarios include variations of viewpoint, illumination and self-occlusion. The most successful approaches to re-identification involve (deep) learning an embedding space such that vehicles of the same identity are projected closer to one another than vehicles representing different identities. Popular loss functions for learning an embedding space include the contrastive and triplet losses. In this paper we provide an extensive evaluation of the triplet loss applied to vehicle re-identification and demonstrate that using recently proposed sampling approaches for mining informative data points outperforms most of the existing state-of-the-art approaches for vehicle re-identification. Compared to most existing state-of-the-art approaches, our approach is simpler and more straightforward to train, utilizing only identity-level annotations, and uses one of the smallest published embedding dimensions for efficient inference. Furthermore, in this work we introduce a formal evaluation of a triplet sampling variant (batch sample) into the re-identification literature. In addition to the conference version [24], this submission adds extensive experiments on newly released datasets, cross-domain evaluations and ablation studies.
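A minimal PyTorch sketch of batch-hard triplet mining, one of the sampling strategies in this family; the shapes, margin and PK batch layout are illustrative, and this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.3) -> torch.Tensor:
    """Batch-hard triplet loss: for every anchor, use its hardest (farthest)
    positive and hardest (closest) negative within the batch."""
    dist = torch.cdist(embeddings, embeddings, p=2)          # pairwise distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = same_id & ~torch.eye(len(labels), dtype=torch.bool)
    neg_mask = ~same_id

    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Push same-identity pairs to +inf so they are never picked as negatives.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

# Toy batch: 8 images of 4 vehicle identities embedded into 128-D
# (PK sampling: P identities x K images each, common in re-identification).
torch.manual_seed(0)
embeddings = F.normalize(torch.randn(8, 128), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(batch_hard_triplet_loss(embeddings, labels))
```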