Search results - BazTech

1

Efektywne przetwarzanie i integracja dużych zbiorów danych w środowisku Hadoop [Efficient processing and integration of large data sets in the Hadoop environment]

Drzymała Paweł, Welfle Henryk, Drzymała Agnieszka

Przegląd Elektrotechniczny | 2019 | Vol. 95, No. 1 | 29--32

EN

The development of new channels of electronic information exchange is producing ever more data. These data are often diverse, heterogeneous, and stored without a strictly defined structure. Of all the data generated since the beginning of human civilization, 90% was created in the past two years. The article presents the architecture and capabilities of the Hadoop environment, which was created for the effective processing and integration of large data sets. It describes the features of this platform and its scalability, and discusses how the HDFS file system operates and how it tolerates storage faults. It also presents how the nodes of a Hadoop cluster cooperate and how MapReduce operations are executed.
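The MapReduce idea named in this abstract can be illustrated with a minimal, single-process sketch. It is not code from the article: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the word-count task are assumptions chosen for illustration, and a real Hadoop job would distribute these phases across cluster nodes reading blocks from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over an iterable of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `map_reduce` on a list of text lines returns a dictionary of word counts. Because each record is mapped independently and each key is reduced independently, both phases can in principle run on different nodes.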

2

Decision-making enhancement in a big data environment : application of the K-means algorithm to mixed data

Koren Oded, Hallin Carina Antonia, Perel Nir, Bendet Dror

Journal of Artificial Intelligence and Soft Computing Research | 2019 | Vol. 9, No. 4 | 293--302

EN

Big data research has become an important discipline in information systems research. However, the flood of data being generated on the Internet is increasingly unstructured and non-numeric, taking the form of images and texts. Research therefore indicates an increasing need for more efficient algorithms that treat mixed data in big data for effective decision making. In this paper, we apply the classical K-means algorithm to both numeric and categorical attributes on big data platforms. We first present an algorithm that handles the problem of mixed data. We then use big data platforms to implement the algorithm, demonstrating its functionality in a detailed case study. This provides a solid basis for performing more targeted profiling for decision making and research using big data. Consequently, decision makers will be able to treat mixed (numerical and categorical) data to explain and predict phenomena in the big data ecosystem. Our research includes a detailed end-to-end case study that presents an implementation of the suggested procedure. It demonstrates the procedure's capabilities and the advantages that allow it to improve the decision-making process by targeting organizations' business requirements to specific cluster(s)/profile(s) based on the enhancement outcomes.
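The abstract does not say how the authors adapt K-means to categorical attributes. One common option, shown here purely as an assumption, is a k-prototypes-style variant: squared Euclidean distance on the numeric part plus a penalty `gamma` per categorical mismatch, with centroids built from numeric means and categorical modes. All names below are hypothetical, and the initialisation with the first `k` points is a simplification.

```python
from collections import Counter


def mixed_distance(a, b, gamma=1.0):
    """Squared Euclidean distance on numeric parts plus gamma per categorical mismatch."""
    (a_num, a_cat), (b_num, b_cat) = a, b
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    mismatches = sum(x != y for x, y in zip(a_cat, b_cat))
    return numeric + gamma * mismatches


def update_centroid(members):
    """Mean of each numeric attribute, most frequent value of each categorical one."""
    mean = tuple(sum(col) / len(col) for col in zip(*(m[0] for m in members)))
    mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*(m[1] for m in members)))
    return mean, mode


def nearest(point, centroids, gamma):
    """Index of the centroid closest to the point under the mixed distance."""
    return min(range(len(centroids)), key=lambda c: mixed_distance(point, centroids[c], gamma))


def cluster(points, k, gamma=1.0, iterations=10):
    """Alternate assignment and centroid update for a fixed number of iterations."""
    centroids = list(points[:k])  # assumption: naive initialisation with the first k points
    labels = []
    for _ in range(iterations):
        labels = [nearest(p, centroids, gamma) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = update_centroid(members)
    return labels, centroids
```

Each point is a pair `(numeric_tuple, categorical_tuple)`, for example `((34.0, 52000.0), ("retail", "urban"))`. The weight `gamma` controls how much a categorical mismatch counts relative to numeric distance, so numeric attributes would normally be scaled first.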

3

A Contemplating approach for Hive and Map reduce for efficient Big Data Implementation

Sasubilli G., Sekhar U. S., Sharma S., Sharma S.

Annals of Computer Science and Information Systems | 2018 | Vol. 14 | 131--135

EN

In the current scenario, data grows exponentially and accrues at a rate of petabytes. Big Data describes the amount of data available across different media and over the internet. The term refers to the explosion in the quantity (and quality) of available and potentially relevant data. The quantity of data is very large and has so far been handled by conventional database systems and data warehouses, but as the amount of data increases, its complexity increases as well. Many areas are involved in the production, generation, and implementation of Big Data, such as news media, social networking sites, business applications, the industrial community, and many more. Handling Big Data raises concerns such as efficient management, proper storage, availability, scalability, and processing. New techniques, tools, and architectures are therefore required. In this paper, we discuss the different technologies available for the implementation and management of Big Data. The paper considers the tools and techniques used to solve the major difficulties with Big Data. It evaluates the covariance factor for stock exchange data from different industries, shows the significance of the data through a positive covariance result obtained with the Hive approach, and assesses how efficient the Hive approach is in terms of HDFS and Hive queries. It also evaluates the covariance factors after applying the Hive and MapReduce approaches to a stock exchange dataset of around 3500. After processing the data with the Hive approach, we conclude that it is better than MapReduce and big table in terms of storage and processing of Big Data.
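The abstract does not show the queries or jobs used to compute covariance. As a hedged illustration of why the calculation fits a MapReduce pattern, the sketch below reduces each data partition to additive statistics (count, sums, and sum of products) and merges them. The function names and the partition layout are assumptions, not the paper's implementation. Hive itself offers `covar_pop` and `covar_samp` aggregate functions for the same quantity.

```python
from functools import reduce


def map_partition(rows):
    """Map: reduce one partition of (x, y) rows to additive statistics."""
    n = sx = sy = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxy += x * y
    return n, sx, sy, sxy


def combine(a, b):
    """Reduce: statistics from two partitions add component-wise."""
    return tuple(u + v for u, v in zip(a, b))


def covariance(partitions):
    """Population covariance E[xy] - E[x]E[y] from the merged statistics."""
    n, sx, sy, sxy = reduce(combine, map(map_partition, partitions))
    return sxy / n - (sx / n) * (sy / n)
```

A positive result means the two series tend to move in the same direction. This one-pass formula is simple to distribute but can lose precision when the means are large relative to the spread, so a production job might prefer a numerically stable update.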

4

Big Data w inżynierii [Big Data in engineering]

Raczko R.

Mechanik | 2016 | Vol. 89, No. 7 | 806--807

EN

This paper presents issues related to the possibilities and prospects of using Big Data technology in engineering. The concept of Big Data is defined. A selected method of processing data in Big Data technology is discussed. The possibilities of using Big Data in engineering are shown.

5

Massive simulations using MapReduce model

Krupa A., Sawicki B.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2015 | No. 4 | 45--47

EN

In the last few years, cloud computing has become a dominant solution for large-scale numerical problems. It is most often based on the MapReduce programming model, which provides high scalability and flexibility and also optimizes the cost of computing infrastructure. This paper studies the feasibility of the MapReduce model for scientific problems consisting of many independent simulations. An experiment based on variability analysis of a simple electromagnetic problem with over 10,000 scenarios shows that the platform scales nearly linearly, reaching over 80% of the theoretical maximum performance.
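The pattern described in this abstract, many independent simulations followed by an aggregate analysis, can be sketched as a map step and a reduce step. Everything below is an assumption for illustration: `simulate` is a hypothetical stand-in rather than the electromagnetic model from the paper, and `parallel_efficiency` uses the usual definition of speedup divided by worker count.

```python
from statistics import mean, pstdev


def simulate(parameter):
    """Hypothetical stand-in for one independent simulation run."""
    return parameter ** 2


def run_scenarios(parameters, mapper=map):
    """Map step: scenarios are independent, so any parallel mapper can be swapped in."""
    return list(mapper(simulate, parameters))


def summarize(results):
    """Reduce step: collapse all results into variability statistics."""
    return {"mean": mean(results), "std": pstdev(results)}


def parallel_efficiency(t_serial, t_parallel, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return (t_serial / t_parallel) / workers
```

With made-up timings of 120 s on one worker and 20 s on eight workers, `parallel_efficiency(120.0, 20.0, 8)` gives 0.75. To run the map step in parallel, `mapper` could be the `map` method of a `concurrent.futures.ProcessPoolExecutor`.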