Search results
Searched in keywords: MapReduce
Results found: 9
1
EN
Increasing development in information and communication technology leads to the generation of large amounts of data from various sources. The data collected from multiple sources grow exponentially and may not be structurally uniform; in general, they are heterogeneous and distributed across multiple databases. Because of the large volume, high velocity, and variety of the data, mining knowledge in this environment becomes a big data challenge. Distributed Association Rule Mining (DARM) in these circumstances becomes a tedious task for an effective global Decision Support System (DSS). DARM algorithms generate a large number of association rules and frequent itemsets in the big data environment, and synthesizing high-frequency rules from the big database becomes even more challenging. Many algorithms for synthesizing association rules have been proposed for multi-database mining environments, but they face enormous challenges in terms of high availability, scalability, efficiency, the high cost of storing and processing large intermediate results, and multiple redundant rules. In this paper, we first propose a model to collect data from multiple sources into a big data storage framework based on HDFS. Secondly, we propose a weighted multi-partitioned method for synthesizing high-frequency rules using the MapReduce programming paradigm. Experiments have been conducted in a parallel and distributed environment using commodity hardware, and we demonstrate the efficiency, scalability, high availability, and cost-effectiveness of the proposed method.
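The paper's exact weighted multi-partitioned algorithm is not given in the abstract, so the following is only a minimal Python sketch of the underlying synthesis idea: association rules mined locally in several source databases are combined into weighted global supports with a map/reduce-style grouping. The rule sets, weights, and the MIN_SUPPORT threshold are all invented for illustration.

```python
from collections import defaultdict

# Invented example: association rules mined locally in three source
# databases, each as {rule: local support}, plus a weight per database
# (e.g., proportional to its size). Not the paper's exact algorithm.
local_rules = [
    {("bread", "butter"): 0.40, ("milk", "bread"): 0.30},
    {("bread", "butter"): 0.35, ("beer", "chips"): 0.25},
    {("milk", "bread"): 0.28, ("bread", "butter"): 0.45},
]
weights = [0.5, 0.2, 0.3]

# "Map": emit (rule, weighted support) pairs from each partition.
def map_phase(rules, weight):
    for rule, support in rules.items():
        yield rule, weight * support

# "Shuffle" and "reduce": group by rule and sum the weighted supports.
synthesized = defaultdict(float)
for rules, weight in zip(local_rules, weights):
    for rule, weighted_support in map_phase(rules, weight):
        synthesized[rule] += weighted_support

# Keep only the high-frequency rules above an assumed global threshold.
MIN_SUPPORT = 0.30
for rule, support in sorted(synthesized.items(), key=lambda kv: -kv[1]):
    if support >= MIN_SUPPORT:
        print(rule, round(support, 3))
```

In a real deployment, each map_phase call would correspond to a mapper running over one HDFS partition, and the summation would be performed by the reducers.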
2
Content available Big problems with Big Data
EN
The article presents an overview of the most important issues related to the phenomenon called big data. The characteristics of big data concerning the data itself and the data sources are presented. Then, the big data life cycle concept is formulated. The next sections focus on two big data technologies: MapReduce for big data processing and NoSQL databases for big data storage.
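To make the processing side concrete, here is a minimal single-process Python simulation of the MapReduce flow the article describes (map, then shuffle/group, then reduce), using the classic word-count example with invented sample documents:

```python
from itertools import groupby

documents = ["big data needs big tools", "map and reduce process big data"]

# Map: each input record yields intermediate (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: sort and group the intermediate pairs by key.
mapped.sort(key=lambda kv: kv[0])
grouped = {key: [v for _, v in group]
           for key, group in groupby(mapped, key=lambda kv: kv[0])}

# Reduce: aggregate the values of each key.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)  # e.g. {'big': 3, 'data': 2, ...}
```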
3
Content available A survey of big data classification strategies
EN
Big data nowadays plays a major role in finance, industry, medicine, and various other fields. In this survey, 50 research papers are reviewed with respect to the different big data classification techniques they present and/or use. The classification techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning, and so on. The research gaps and challenges of big data classification faced by the existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The research papers are analyzed with respect to software tools, datasets used, publication year, classification techniques, and performance metrics. It can be concluded from the survey presented here that the most frequently used big data classification methods are based on machine learning techniques, and the apparently most commonly used dataset for big data classification is the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time.
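The evaluation style the survey reports, accuracy together with execution time on a UCI dataset, can be illustrated with a short sketch. This is not any surveyed system; scikit-learn's bundled Iris data, which originates from the UCI repository, stands in for a big data workload:

```python
import time
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Iris (originally from the UCI repository) stands in for a big dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Score a machine-learning classifier by the survey's two most common
# metrics: accuracy and execution time.
start = time.perf_counter()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
elapsed = time.perf_counter() - start

print(f"accuracy: {accuracy_score(y_test, predictions):.3f}")
print(f"execution time: {elapsed:.3f} s")
```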
4
EN
Social media is playing an increasingly important role in reporting major events happening in the world. However, detecting events from social media is challenging due to the huge magnitude of the data and the complex semantics of the language being processed. This paper proposes MASEED (MapReduce and Semantics Enabled Event Detection), a novel event detection framework that effectively addresses the following problems: 1) traditional data mining paradigms cannot work for big data; 2) data preprocessing requires significant human effort; 3) domain knowledge must be gained before the detection; 4) the semantic interpretation of events is overlooked; 5) detection scenarios are limited to specific domains. In this work, we overcome these challenges by embedding semantic analysis into temporal analysis to capture the salient aspects of social media data, and by parallelizing the detection of potential events using the MapReduce methodology. We evaluate the performance of our method using real Twitter data. The results demonstrate that the proposed system outperforms most of the state-of-the-art methods in terms of accuracy and efficiency.
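MASEED's implementation is not given here, so the following is only a toy Python sketch of the parallelizable core the abstract describes: tweets are mapped into (time window, term) pairs, reduced to per-window counts, and terms that burst within a window are flagged as candidate events. The data, window size, and burst threshold are invented:

```python
from collections import Counter, defaultdict

# Invented sample tweets as (unix_timestamp, text) pairs.
tweets = [
    (1000, "earthquake downtown"),
    (1010, "earthquake felt here"),
    (1020, "coffee time"),
    (2000, "quiet morning"),
]
WINDOW = 600  # seconds per time window (assumption)

# "Map": emit one (time window, term) pair per token; in a MapReduce job
# this step runs in parallel over the tweet partitions.
pairs = [(ts // WINDOW, term) for ts, text in tweets for term in text.split()]

# "Reduce": count terms per window.
windows = defaultdict(Counter)
for window, term in pairs:
    windows[window][term] += 1

# Flag terms that repeat within a window as candidate events.
BURST_THRESHOLD = 2
for window, counts in sorted(windows.items()):
    bursts = [t for t, c in counts.items() if c >= BURST_THRESHOLD]
    if bursts:
        print(f"window {window}: candidate event terms {bursts}")
```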
5
Content available remote Pre-Processing and Modeling Tools for Big Data
EN
Modeling tools and operators help the user or developer to identify the processing field at the top of the sequence and to send into the computing module only the data related to the requested result; the remaining data is not relevant and would only slow down the processing. The biggest challenge nowadays is to obtain high-quality processing results with reduced computing time and costs. To do so, the processing sequence must be revised by adding several modeling tools. The existing processing models do not take this aspect into consideration and focus on achieving high calculation performance, which increases computing time and costs. In this paper we provide a study of the main modeling tools for Big Data and a new model based on pre-processing.
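The pre-processing idea can be shown with a small hedged sketch: a filter placed at the top of the sequence so that only records relevant to the requested result reach the expensive computing module. The record layout and the selection predicate below are illustrative assumptions:

```python
# Invented record layout: two million rows tagged by region.
records = [{"region": "EU", "value": v} for v in range(1_000_000)] \
        + [{"region": "US", "value": v} for v in range(1_000_000)]

def expensive_compute(rows):
    """Stand-in for a costly analytics job in the computing module."""
    return sum(r["value"] for r in rows)

# Pre-processing: select only the data related to the requested result,
# so the computing module never sees the irrelevant half.
requested = (r for r in records if r["region"] == "EU")
print(expensive_compute(requested))
```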
6
PL
Big Data is one of the most important challenges of modern computer science. Given the massive influx of huge amounts of information from various sources nowadays, it is necessary to introduce new data analysis techniques and technological solutions. An important tool for Big Data is the Hadoop software.
EN
Big Data is a term frequently used in the literature, but there is still no consensus on implementations of such environments. An important tool in Big Data is the Hadoop software. There are many tools and technologies in this area. This paper is a review of Big Data technologies.
7
Content available remote High-resolution scatter analysis using cloud computing
EN
Cloud computing is the newest approach to solving computationally challenging problems. It is oriented towards optimizing processing costs by using low-budget, standard computers. An algorithmic scheme for such problems is MapReduce. We show how to use the MapReduce architecture to efficiently run the high number of independent analyses needed for scatter plots. The presented case study is based on a simple student problem solved using FEM. A high-resolution scatter plot image introduces a new quality in the visualization of results.
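The pattern described here, many independent analyses feeding one scatter plot, maps naturally onto a parallel map phase. Below is a hedged Python sketch in which a trivial function stands in for the FEM solver; the function, sample size, and parameter range are assumptions:

```python
import random
from multiprocessing import Pool

def analysis(load):
    """Stand-in for one independent FEM run: returns (input, response)."""
    deflection = 0.01 * load ** 2 + random.gauss(0, 0.5)
    return load, deflection

if __name__ == "__main__":
    # Sample the input parameter, run the analyses in parallel (the "map"
    # phase), and collect the (x, y) points for the scatter plot.
    samples = [random.uniform(0, 100) for _ in range(10_000)]
    with Pool() as pool:
        points = pool.map(analysis, samples)
    xs, ys = zip(*points)
    print(len(xs), min(ys), max(ys))
```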
8
EN
The purpose of this paper is to present the critical points in MapReduce that lead to a loss or gain in performance. It is an introduction to a more detailed analysis of the individual execution phases found in most implementations that use this paradigm. General issues related to data, mapping, reduction, partitioning, and several others are discussed.
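One of the critical points such an analysis covers is the shuffle between the map and reduce phases, where a map-side combiner is the standard remedy. The illustrative Python snippet below, with invented mapper outputs, shows how combining shrinks the number of intermediate pairs that must be partitioned and transferred:

```python
from collections import Counter

# Invented mapper outputs: (key, count) pairs.
mapper_outputs = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],  # mapper 0
    [("b", 1), ("b", 1), ("a", 1)],            # mapper 1
]

# Without a combiner, every intermediate pair crosses the network.
pairs_without = sum(len(out) for out in mapper_outputs)

# With a combiner, each mapper pre-sums its own pairs and ships only
# one (key, partial_sum) pair per distinct key.
combined = []
for out in mapper_outputs:
    partial = Counter()
    for key, count in out:
        partial[key] += count
    combined.append(list(partial.items()))
pairs_with = sum(len(out) for out in combined)

print(pairs_without, "->", pairs_with)  # 7 -> 4

# The reduce phase is unchanged: sum the partial counts per key.
totals = Counter()
for out in combined:
    for key, partial_sum in out:
        totals[key] += partial_sum
print(dict(totals))  # {'a': 4, 'b': 3}
```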
9
EN
This work analyses the performance of Hadoop, an implementation of the MapReduce programming model for distributed parallel computing, executing in a virtualisation environment comprising 1+16 nodes running the VMware Workstation software. A set of experiments using the standard Hadoop benchmarks was designed to determine whether significant reductions in the execution time of computations are obtained when using Hadoop on this virtualisation platform in a departmental cloud. Our findings indicate that a significant decrease in computing times is observed under these conditions. They also highlight how overheads and virtualisation in a distributed environment limit the possibility of achieving the maximum (peak) performance.
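The abstract does not include the measurement harness, but a sketch of one plausible approach is shown below: timing repeated runs of a standard Hadoop example job from Python. The jar path and job arguments are assumptions and vary by installation:

```python
import statistics
import subprocess
import time

# Assumed location of the standard Hadoop examples jar; adjust per cluster.
EXAMPLES_JAR = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar"

def run_once() -> float:
    """Run one standard example job (pi estimator) and return wall time."""
    start = time.perf_counter()
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "pi", "8", "1000"], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    times = [run_once() for _ in range(5)]
    print(f"mean {statistics.mean(times):.1f} s, stdev {statistics.stdev(times):.1f} s")
```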