Results found: 440

Search results
Searched for:
in keywords: data mining
EN
Feature Selection (FS) is an essential research topic in the area of machine learning. FS, which is the process of identifying the relevant features and removing the irrelevant and redundant ones, is meant to deal with the high-dimensionality problem for the sake of selecting the best-performing feature subset. In the literature, many feature selection techniques approach the task as a search problem, where each state in the search space is a possible feature subset. In this paper, we introduce a new feature selection method based on reinforcement learning. First, decision tree branches are used to traverse the search space. Second, a transition similarity measure is proposed to ensure the exploration-exploitation trade-off. Finally, the informative features are the ones most involved in constructing the best branches. The performance of the proposed approach is evaluated on nine standard benchmark datasets. The results using the AUC score show the effectiveness of the proposed system.
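The sketch below is not the authors' reinforcement-learning traversal; it only illustrates, on synthetic data, the wrapper-style evaluation the abstract relies on: each candidate feature subset is scored by cross-validated AUC, with a plain greedy forward search standing in for the decision-tree-branch exploration. All names and parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the paper uses nine benchmark datasets instead.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

def subset_auc(features):
    """Mean 5-fold ROC-AUC of a classifier restricted to the given feature indices."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, features], y, cv=5, scoring="roc_auc").mean()

# Greedy forward search: at each step, add the feature whose inclusion
# raises the cross-validated AUC the most.
selected, remaining = [], list(range(X.shape[1]))
for _ in range(5):
    best = max(remaining, key=lambda f: subset_auc(selected + [f]))
    selected.append(best)
    remaining.remove(best)
print("selected features:", selected, "AUC:", round(subset_auc(selected), 3))
```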
2
Content available A study of big data in cloud computing
EN
Over the last two decades, the size and amount of data has increased enormously, which has changed traditional methods of data management and introduced two new technological terms: big data and cloud computing. Addressing big data, characterized by massive volume, high velocity and variety, is quite challenging as it requires large computational infrastructure to store, process and analyze it. A reliable technique to carry out sophisticated and enormous data processing has emerged in the form of cloud computing because it eliminates the need to manage advanced hardware and software, and offers various services to users. Presently, big data and cloud computing are gaining significant interest among academia as well as in industrial research. In this review, we introduce various characteristics, applications and challenges of big data and cloud computing. We provide a brief overview of different platforms that are available to handle big data, including their critical analysis based on different parameters. We also discuss the correlation between big data and cloud computing. We focus on the life cycle of big data and its vital analysis applications in various fields and domains. At the end, we present the open research issues that still need to be addressed and give some pointers to future scholars in the fields of big data and cloud computing.
EN
Distresses are integral parts of pavement that occur during the life of the road. Bleeding is known as one of the most important problems of Iran's roads, especially in tropical areas and on transit routes carrying heavy axle loads; therefore, identifying the factors contributing to the bleeding phenomenon is very necessary and important. This study was conducted to investigate the influence of mix design parameters on the occurrence of the bleeding phenomenon and its severity. The collected data were analyzed and grouped using Design Expert and SPSS software. The results show that all five parameters (optimal bitumen percentage, bitumen percentage in the asphalt mixture, void percentage of the Marshall sample, void percentage, and the filler-to-bitumen ratio) affect bleeding and its intensity. Among these parameters, the bitumen percentage of the asphalt mixture and the void percentage of the Marshall sample have the greatest effect on the severity of the bleeding phenomenon.
EN
Electricity theft is a problem for distribution system operators (DSOs) in Poland. DSOs use many ways to limit this unfavourable phenomenon. Within this paper, the author presents a new method to detect the location of illegal power consumption. The method is based on processing data from an advanced metering infrastructure (AMI) and on the observation that some consumers illegally consume energy mainly in the winter season and that the level of illegal energy consumption may depend on the level of energy consumption. The method searches for periods of temporary reduction of the balance difference and a simultaneous decrease in energy consumption by one of the consumers.
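A minimal sketch, on synthetic AMI-like data with hypothetical column names, of the kind of screening the method describes: find the consumer whose changes in metered consumption coincide most strongly with changes in the area's balance difference. The real method's seasonal logic and thresholds are not reproduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", periods=120, freq="D")
balance_diff = rng.normal(50, 5, len(days))   # daily balance difference (kWh) in the area
consumers = pd.DataFrame({f"c{i}": rng.normal(10, 2, len(days)) for i in range(20)},
                         index=days)

# Inject a theft-like pattern: consumer c7 under-registers while the losses drop too.
theft_days = slice(30, 60)
consumers.iloc[theft_days, 7] -= 4
balance_diff[theft_days] -= 4

# Correlate day-to-day changes of the balance difference with each consumer's changes:
# a strong positive correlation marks simultaneous decreases.
d_balance = np.diff(balance_diff)
scores = {c: np.corrcoef(d_balance, np.diff(consumers[c].values))[0, 1]
          for c in consumers.columns}
suspect = max(scores, key=scores.get)
print("most suspicious consumer:", suspect, "corr =", round(scores[suspect], 2))
```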
EN
Production problems have a significant impact on the on-time delivery of orders, resulting in deviations from planned scenarios. Therefore, it is crucial to predict interruptions during scheduling and to find optimal production sequencing solutions. This paper introduces a self-learning framework that integrates association rules and optimisation techniques to develop a scheduling algorithm capable of learning from past production experiences and anticipating future problems. Association rules identify factors that hinder the production process, while optimisation techniques use mathematical models to optimise the sequence of tasks and minimise execution time. In addition, association rules establish correlations between production parameters and success rates, allowing corrective factors for production quantity to be calculated based on confidence values and success rates. The proposed solution demonstrates robustness and flexibility, providing efficient solutions for Flow-Shop and Job-Shop scheduling problems with reduced calculation times. The article includes two examples, a Flow-Shop and a Job-Shop case, where the framework is applied.
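A minimal sketch of the association-rule ingredient mentioned above, assuming hypothetical attribute labels: rule confidence is computed from past production records as the support of the antecedent together with the consequent divided by the support of the antecedent, which is the kind of statistic the framework feeds into its corrective factors.

```python
records = [  # each past run: the set of observed attributes (hypothetical labels)
    {"material_late", "delayed"},
    {"material_late", "machine_A", "delayed"},
    {"machine_A"},
    {"material_late"},
    {"material_late", "delayed"},
]

def support(itemset):
    """Fraction of past runs containing all items of the itemset."""
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent):
    """conf(A -> B) = support(A union B) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"material_late"}, {"delayed"})
print(f"conf(material_late -> delayed) = {rule_conf:.2f}")  # 0.75 for this toy data
```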
EN
This study used stick-model augmentation of single-camera motion video to create a markerless motion classification model of manual operations. All videos were augmented with a stick model composed of keypoints and lines using a programming model that incorporated the COCO dataset and the OpenCV and OpenPose modules to estimate the coordinates of body joints. The stick model data included the initial velocity, cumulative velocity, and acceleration for each body joint. The extracted motion vector data were normalized using three different techniques, and the resulting datasets were evaluated with eight classifiers. The experiment involved four distinct motion sequences performed by eight participants. The random forest classifier achieved the best classification accuracy on the min-max normalized dataset: 81.80% before random subsampling and 92.37% on the resampled dataset. The random subsampling method dramatically improved classification accuracy by removing noise data and replacing it with replicated instances to balance the classes. This research advances methodological and applied knowledge on the capture and classification of human motion using a single camera view.
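A hedged sketch of the classification stage only: min-max normalisation followed by a random forest, the combination the abstract reports as best. The OpenPose keypoint extraction and the real motion-vector features are replaced by synthetic stand-ins.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 30))    # stand-in for per-joint velocity/acceleration features
y = rng.integers(0, 4, size=400)  # four motion classes, as in the study

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
model = make_pipeline(MinMaxScaler(),
                      RandomForestClassifier(n_estimators=200, random_state=1))
model.fit(X_tr, y_tr)
print("accuracy:", round(model.score(X_te, y_te), 3))
```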
EN
Multiple linear regression and artificial neural network (ANN) models were utilized in this study to assess the influence of nanomaterial type on polluted water disinfection. This was accomplished by estimating E. coli (E.C) and total coliform (TC) concentrations in contaminated water to which nanoparticles were added at various concentrations, with water temperature, pH, and turbidity as further input variables. To achieve this objective, two approaches were implemented: data mining with two types of artificial neural networks (MLP and RBF), and multiple linear regression (MLR) models. The simulation was conducted using SPSS software. The data mining results were validated by checking the estimated findings against the measured data. It was found that MLP was the most promising model for predicting the TC and E.C concentrations, followed by the RBF and MLR models, respectively.
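A sketch, under the assumption of synthetic stand-in data, comparing a multilayer-perceptron regressor with multiple linear regression, the two model families evaluated in the study (the RBF network and the SPSS workflow are not reproduced).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 4))   # columns: nanoparticle dose, temperature, pH, turbidity
y = 5 - 3 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.2, 300)  # synthetic E. coli response

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
mlr = LinearRegression().fit(X_tr, y_tr)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                                 random_state=2)).fit(X_tr, y_tr)
print("MLR R2:", round(mlr.score(X_te, y_te), 3),
      " MLP R2:", round(mlp.score(X_te, y_te), 3))
```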
EN
The wastewater treatment landscape in Central Europe, particularly in Poland, has undergone a profound transformation due to European Union (EU) integration. Fueled by EU funding and rapid technological advancements, wastewater treatment plants (WWTPs) have adopted cutting-edge control methods to adhere to EU Water Framework Directive mandates. WWTPs contend with complexities such as variable flow rates, temperature fluctuations, and evolving influent compositions, necessitating advanced control systems and precise sensors to ensure water quality, enhance energy efficiency, and reduce operational costs. Wastewater mathematical modeling provides operational flexibility, acting as a virtual testing ground for process enhancements and resource optimization. Real-time sensors play a crucial role in creating these models by continuously monitoring key parameters and supplying data to predictive models. These models empower real-time decision-making, resulting in minimized downtime and reduced expenses, thus promoting the sustainability and efficiency of WWTPs while aligning with resource recovery and environmental stewardship goals. The evolution of WWTPs in Central Europe is driven by a range of factors. To optimize WWTPs, a multi-criteria approach is presented, integrating simulation models with data mining methods, while taking into account parameter interactions. This approach strikes a balance between the volume of data collected and the complexity of statistical analysis, employing machine learning techniques to cut costs for process optimization. The future of WWTP control systems lies in “smart process control systems”, which revolve around simulation models driven by real-time data, ultimately leading to optimal biochemical processes. In conclusion, Central Europe’s wastewater treatment sector has wholeheartedly embraced advanced control methods and mathematical modeling to comply with EU regulations and advance sustainability objectives. Real-time monitoring and sophisticated modeling are instrumental in driving efficient, resource-conscious operations. Challenges remain in terms of data accessibility and cost-effective online monitoring, especially for smaller WWTPs.
EN
This article presents a model based on machine learning for the selection of the characteristics that most influence the low industrial yield of cane sugar production in Cuba. The set of data used in this work corresponds to a period of ten years of sugar harvests from 2010 to 2019. A process of understanding the business and of understanding and preparing the data is carried out. The accuracy of six rule learning algorithms is evaluated: CONJUNCTIVERULE, DECISIONTABLE, RIDOR, FURIA, PART and JRIP. The results obtained allow us to identify R417, R379, R378, R419a, R410, R613, R1427 and R380 as the indicators that most influence low industrial performance.
EN
The paper describes one of the methods of data acquisition in data mining models used to support decision-making. The study presents the possibilities of data collection using the phases of the CRISP-DM model for an organization and presents the possibility of adapting the model for analysis and management in the decision-making process. The first three phases of implementing the CRISP-DM model are described using data from an enterprise with small-batch production as an example. The paper presents the CRISP-DM-based model for data mining in the process of predicting assembly cycle time. The developed solution has been evaluated using real industrial data and will be part of a methodology that allows the assembly time of a finished product to be estimated at the quotation stage, i.e., without the detailed technology of the product being known.
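A compact, hypothetical illustration of how the first CRISP-DM phases can be walked through in code for assembly-time prediction; the table columns and the regressor are placeholders, not the paper's actual data or model.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# 1. Business understanding: estimate assembly time at the quotation stage.
# 2. Data understanding: inspect historical orders (hypothetical columns below).
orders = pd.DataFrame({
    "n_parts":      [12, 30, 7, 22, 15, 40, 9, 27],
    "n_fasteners":  [40, 95, 20, 70, 55, 130, 25, 90],
    "batch_size":   [5, 2, 10, 3, 6, 1, 8, 2],
    "assembly_min": [35, 88, 21, 64, 47, 120, 26, 80],
})
print(orders.describe())

# 3. Data preparation, followed by a first modelling pass on the prepared table.
X, y = orders.drop(columns="assembly_min"), orders["assembly_min"]
model = GradientBoostingRegressor(random_state=0)
print("CV R2:", round(cross_val_score(model, X, y, cv=4).mean(), 2))
```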
EN
Purpose: The aim of the article is to describe and forecast possible difficulties related to the development of cognitive technologies and the progressing algorithmization of HRM processes as a part of Industry 4.0. Design/methodology/approach: While most of the studies to date related to the phenomenon of Industry 4.0 and Big Data are concerned with the efficiency of cyber-physical systems and the improvement of algorithmic tools, this study proposes a different perspective. It is an attempt to foresee the possible difficulties connected with the algorithmization of HRM processes, the understanding of which could help to "prepare for" or even eliminate the harmful effects that will affect decisions made in the field of managing organizations, especially human resources management, in the era of Industry 4.0. Findings: Research on cognitive technologies in the broadest sense is primarily associated with a focus on their effectiveness, which can result in a one-sided view and ultimately a lack of objective assessment of that effectiveness. Therefore, conducting a parallel critical reflection seems necessary. This reflection has the potential to lead to a more balanced assessment of what is undoubtedly "for", but also of what may be "against". The proposed point of view may contribute to a more informed use of algorithm-based cognitive technologies in the human resource management process, and thus to improving their real-world effectiveness. Social implications: The article can have an educational function, helps to develop critical thinking about cognitive technologies, and directs attention to areas of knowledge by which future skills should be extended. Originality/value: This article is addressed to all those who use algorithms and data-driven decision-making processes in HRM. Crucial in these considerations is to draw attention to the dangers of unreflective use of technical solutions supporting HRM processes. The novelty of the proposed approach is the identification of three potential risk areas that may result in faulty HR decisions: the risk of "technological proof of equity", overconfidence in the objective character of algorithms, and the real danger resulting from so-called algorithm overfitting. Recognition of these difficulties can ultimately contribute to real improvements in productivity by combining human performance with technological effectiveness.
EN
The aluminum profile extrusion process is briefly characterized in the paper, together with the presentation of historical, automatically recorded data. The initial selection of the important, widely understood, process parameters was made using statistical methods such as correlation analysis for continuous and categorical (discrete) variables and ‘inverse’ ANOVA and Kruskal–Wallis methods. These selected process variables were used as inputs for MLP-type neural models with two main product defects as the numerical outputs with values 0 and 1. A multi-variant development program was applied for the neural networks and the best neural models were utilized for finding the characteristic influence of the process parameters on the product quality. The final result of the research is the basis of a recommendation system for the significant process parameters that uses a combination of information from previous cases and neural models.
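A sketch with synthetic data of the two stages named above: Kruskal-Wallis screening of candidate process parameters against the binary defect flag, followed by an MLP classifier on the retained parameters. The real extrusion variables and the multi-variant network development program are not reproduced.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 10))  # stand-in for recorded extrusion process parameters
y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(0, 1, 600) > 0).astype(int)  # defect yes/no

# Keep parameters whose distributions differ between defective and good profiles.
keep = [j for j in range(X.shape[1])
        if kruskal(X[y == 0, j], X[y == 1, j]).pvalue < 0.05]
print("screened-in parameters:", keep)

mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=3))
print("CV accuracy:", round(cross_val_score(mlp, X[:, keep], y, cv=5).mean(), 3))
```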
EN
Objectives: To provide a clear literature review of state-of-the-art heart disease prediction models. Methods: The review covers 61 research papers and presents a significant analysis. Initially, the analysis addresses the contributions of each work, observes the simulation environment, and notes the different types of machine learning algorithms deployed in each contribution. In addition, the datasets utilized in existing heart disease prediction models are observed. Results: The performance measures computed across the papers, such as prediction accuracy, prediction error, specificity, sensitivity, f-measure, etc., are studied. Further, the best performance is also checked to confirm the effectiveness of all contributions. Conclusions: The comprehensive research challenges and gaps are portrayed based on the development of intelligent methods concerning the unresolved challenges in heart disease prediction using data mining techniques.
EN
Approximately 30 million tons of tailings are stored each year at KGHM's Zelazny Most Tailings Storage Facility (TSF). Covering an area of almost 1.6 thousand hectares, and being surrounded by dams of a total length of 14 km and a height of over 70 m in some areas, makes it the largest reservoir of post-flotation tailings in Europe and the second-largest in the world. With approximately 2900 monitoring instruments and measuring points surrounding the facility, Zelazny Most is a subject of round-the-clock monitoring, which for safety and economic reasons is crucial not only for the immediate surroundings of the facility but for the entire region. The monitoring network can be divided into four main groups: (a) geotechnical, consisting mostly of inclinometers and VW pore pressure transducers, (b) hydrological, with piezometers and water level gauges, (c) geodetic survey, with laser and GPS measurements as well as surface and in-depth benchmarks, (d) a seismic network, consisting primarily of accelerometer stations. Separately, a variety of chemical analyses are conducted, in parallel with spigotting processes and relief well monitoring. This leads to a large amount of data that is difficult to analyze with conventional methods. In this article, we discuss a machine learning-driven approach which should improve the quality of the monitoring and maintenance of such facilities. An overview of the main algorithms developed to determine the stability parameters or to classify tailings is presented. Measurements from CPTU tests were used for the analysis and classification of tailings; while the classification of natural soils using CPT soundings is common practice, the novelty lies in applying a similar approach to tailings, with this facility as the example. Exploratory analysis identified the parameters most relevant for the model, and selected machine learning models (k-nearest neighbours, SVM, RBF SVM, decision tree, random forest, neural networks, QDA) were compared to select the most effective one. The concepts described in this article will be further developed in the IlluMINEation project (H2020).
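An illustrative comparison loop over the classifier families listed in the abstract (kNN, linear and RBF SVM, decision tree, random forest, neural network, QDA), run here on synthetic stand-in features since the CPTU measurements themselves are not public.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for CPTU-derived features and tailings classes.
X, y = make_classification(n_samples=800, n_features=8, n_informative=5,
                           n_classes=3, random_state=4)
models = {
    "kNN": KNeighborsClassifier(),
    "SVM (linear)": SVC(kernel="linear"),
    "SVM (RBF)": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=4),
    "Random forest": RandomForestClassifier(random_state=4),
    "MLP": MLPClassifier(max_iter=2000, random_state=4),
    "QDA": QuadraticDiscriminantAnalysis(),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:14s} {acc:.3f}")
```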
15
Content available remote Network Intrusion Detection Using Machine Learning Techniques
EN
Intrusion detection systems (IDS) are essential for the protection of advanced communication networks. These systems were primarily designed to identify particular patterns, signatures, and rule violations. Machine Learning and Deep Learning approaches have been used in recent years in the field of network intrusion detection to provide promising alternatives; these approaches can discriminate between normal and anomalous patterns. In this paper, the NSL-KDD (Network Security Laboratory Knowledge Discovery and Data Mining) benchmark data set has been used to evaluate Network Intrusion Detection Systems (NIDS) using different machine learning algorithms such as Support Vector Machine, J48, Random Forest, and Naïve Bayes, with both binary and multi-class classification. The results of the application of those techniques are discussed in detail and outperform previous works.
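A hedged sketch of the binary normal-versus-attack setting: Naïve Bayes and a random forest trained and compared with a classification report. Synthetic features stand in for the preprocessed NSL-KDD records, and the paper's J48 and SVM runs are omitted for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for encoded, scaled NSL-KDD records; 0 = normal, 1 = attack.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.6, 0.4], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=5)

for clf in (GaussianNB(), RandomForestClassifier(random_state=5)):
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(type(clf).__name__)
    print(classification_report(y_te, y_pred, target_names=["normal", "attack"], digits=3))
```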
EN
Chronic kidney disease is a general term for kidney dysfunction that lasts more than 3 months. When chronic kidney disease is advanced, the kidneys are no longer able to cleanse the blood of toxins and harmful waste products and can no longer support the proper function of other organs. The disease can begin suddenly or develop latently over a long period of time without characteristic symptoms. The most common causes are other chronic diseases: diabetes and hypertension. Therefore, it is very important to diagnose the disease in its early stages and opt for suitable treatment (medication, diet and exercise) to reduce its side effects. The purpose of this paper is to analyse and select those patient characteristics that may influence the prevalence of chronic kidney disease, as well as to extract classification rules and action rules that can be useful to medical professionals in efficiently and accurately diagnosing patients with chronic kidney disease. The first step of the study was feature selection and evaluation of its effect on classification results. The study was repeated for four models: one containing all available patient data, one containing features identified by doctors as major factors in chronic kidney disease, and models containing features selected using Correlation-Based Feature Selection and the Chi-Square Test. Sequential Minimal Optimization (SMO) and the Multilayer Perceptron had the best performance in all four cases, with an average accuracy of 98.31% for SMO and 98.06% for the Multilayer Perceptron, results confirmed by the F1-score, which for both algorithms was above 0.98. For all these models the classification rules are extracted. The final step was action rule extraction. The paper shows that appropriate data analysis allows for building models that can support doctors in diagnosing a disease and support their decisions on treatment. Action rules can be important guidelines for doctors: they can reassure the doctor in his diagnosis or indicate new, previously unseen ways to cure the patient.
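A sketch only, with invented placeholder attributes: chi-square feature selection followed by an SVM (sklearn's SVC is used here in place of WEKA's SMO) and an F1 check, mirroring the evaluation pipeline the abstract reports.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.uniform(size=(400, 24))  # 24 anonymised stand-in patient attributes
y = (X[:, 0] + X[:, 5] + rng.normal(0, 0.3, 400) > 1.0).astype(int)  # 1 = CKD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=6)
model = make_pipeline(MinMaxScaler(),            # chi2 requires non-negative inputs
                      SelectKBest(chi2, k=8),
                      SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
print("F1:", round(f1_score(y_te, model.predict(X_te)), 3))
```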
EN
Purpose: The aim of the article is to describe and forecast possible dilemmas related to the development of cognitive technologies and the progressing algorithmization of social life. Design/methodology/approach: Most of the current studies related to the Big Data phenomenon concern improving the efficiency of algorithmic tools or protection against the autonomization of machines; in this analysis a different perspective is proposed, namely the thoughtless way of using data-driven instruments, termed technological proof of equity. This study tries to anticipate possible difficulties connected with algorithmization, the understanding of which could help to "prepare for" or even eliminate the harmful effects we may face, which will affect decisions made in the field of social organization and the management of organizations or cities. Findings: The proposed point of view may contribute to a more informed use of cognitive technologies, machine learning and artificial intelligence and to an understanding of their impact on social life, especially unintended consequences. Social implications: The article can have an educational function, helps to develop critical thinking about cognitive technologies and directs attention to areas of knowledge by which future skills should be extended. Originality/value: The article is addressed to data scientists and all those who use algorithms and data-driven decision-making processes in their actions. Crucial in these considerations is the introduction of the concept of technological proof of equity, which helps to name the real threat of the appearance of technologically grounded heuristic thinking and its social consequences.
EN
The problem of determining a decision recommendation from examples of acceptable decisions and examples of unacceptable decisions indicated by the decision-maker is considered in the paper. The decision-maker's examples are the foundation for assessing his preferences. The essence of the presented solution consists in representing the decision-maker's preferences as a cluster determined by supplementing the indicated examples. The paper proposes a procedure of successive approximations based on solving the classification task for the given examples.
EN
The article presents disturbances encountered in the operation of a rotary sheeter and focuses on damage to electrical components, such as an encoder. Theoretical issues of diagnostic systems based on artificial intelligence (neural networks) are also presented. A simple diagnostic method based on statistics is demonstrated in a corrugator application.
20
Content available Big Data i Data Mining w polskim budownictwie
EN
The article discusses the presence of very large data resources, referred to as Big Data, in the Polish construction sector. In other sectors, e.g. finance or services, large databases are available and are used to improve the quality of services, to better adapt to customer requirements, or to improve competitiveness on the market. The construction sector, and above all its product, is specific compared to other sectors of the economy. Does this specificity mean that Big Data resources are absent? The article points to the presence of Big Data resources in Polish construction and to the possibilities and ways of using them.