Search results - Biblioteka Nauki

1

Apache Hadoop, platforma do gromadzenia, przetwarzania i analizy dużych zbiorów danych

100%

Gil M.

Journal of Computer Sciences Institute | 2017 | Vol. 4 | pp. 70-75

PL

The article presents the possibilities of using the Hadoop platform to manage very large data sets. Drawing on available sources, it traces how application performance has developed. It also describes organizations that have succeeded on the Internet thanks to deploying this software.

EN

The article presents the possibilities of using the Hadoop platform to manage large data sets. The development of application performance is shown based on available sources. Additionally, the article describes organizations that have been successful on the Internet thanks to implementing this software.

2

Towards Finding Scholarly Articles in Internet Using Hadoop MapReduce with Oozie Workflow

94%

Jurkiewicz J. , Nowiński A.

2013 | Vol. 4, no. 4 | pp. 3-6

EN

This article focuses on new methods for the automatic processing and analysis of scientific papers. It covers the very first part of this task: the discovery and harvesting of scientific publications from the Internet, and in particular the discovery and analysis of HTML documents to identify publication resources. Using data from the Common Crawl project makes it possible to operate on a large subset of web pages without performing an expensive crawl of the WWW. We present methods for automatically identifying pages that describe scholarly documents on the WWW using HTML meta headers. The presented set of rules, applied to the data, achieves reasonable quality. A system based on these tools is also presented. It allows easy operation and transfer of output to the COntent ANalysis SYStem (CoAnSys), a processing and analysis system developed at ICM. To achieve this goal, a set of MapReduce tasks running on Hadoop and Oozie was used. The quality and efficiency of the described rules are discussed. Finally, future challenges for our system are presented.
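The idea of spotting scholarly pages by their HTML meta headers can be illustrated with a minimal sketch. The abstract does not list the actual rules, so everything here is an assumption: the `citation_` and `dc.` name prefixes (conventions commonly used for bibliographic meta tags), the `min_hits` threshold, and the helper names `MetaCollector` and `looks_scholarly` are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Assumed prefixes of bibliographic <meta name="..."> tags (not from the paper).
SCHOLARLY_PREFIXES = ("citation_", "dc.")


class MetaCollector(HTMLParser):
    """Collects the lower-cased name attribute of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name.lower())


def looks_scholarly(html, min_hits=2):
    """Hypothetical rule: at least min_hits bibliographic meta names."""
    collector = MetaCollector()
    collector.feed(html)
    hits = [n for n in collector.names if n.startswith(SCHOLARLY_PREFIXES)]
    return len(hits) >= min_hits
```

In a MapReduce setting, a check of this kind would plausibly run in the map step, once per crawled page.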

3

Implementacja cyfrowego filtru Savitzky'ego-Golaya w środowisku chmurowym

84%

Czerwiński D.

Zeszyty Naukowe. Elektryka / Politechnika Łódzka | 2015 | no. 126 | pp. 35-49

PL

The article presents the results of experimental research on implementing the digital Savitzky-Golay smoothing filter in a cloud environment using the R programming language. The results are compared for the filter implemented in the cloud environment and in a commercial enterprise-class solution. The filter was applied to measurement data from a system containing a high-temperature superconductor tape, which generates a number of measurement points exceeding the capabilities of commercial data mining environments.

EN

The article presents the results of an experimental implementation of a digital Savitzky-Golay smoothing filter in a cloud environment using the R programming language. Test results are compared for the implementation of the filter in the cloud environment and in a commercial enterprise-class data mining system. The filter was applied to measurement data from a system containing a high-temperature superconductor tape, which generates a number of measurement points that exceeds the capabilities of commercial data mining environments.
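For readers unfamiliar with the filter, a minimal sketch of Savitzky-Golay smoothing follows. It is written in Python rather than the R used in the article, and it assumes the classic 5-point quadratic window with weights (-3, 12, 17, 12, -3)/35. The window size, the edge handling, and the function name `savgol5` are illustrative choices, not the article's configuration.

```python
# Classic 5-point quadratic Savitzky-Golay smoothing weights (they sum to 35).
COEFFS = (-3, 12, 17, 12, -3)
NORM = 35


def savgol5(values):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay filter.

    The two samples at each end are copied unchanged (a simplification;
    production implementations usually fit the edges separately).
    """
    values = list(values)
    half = len(COEFFS) // 2
    smoothed = values[:]
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(COEFFS, window)) / NORM
    return smoothed
```

Because each output sample depends only on its local window, the computation splits naturally across chunks of a long measurement series, provided neighbouring chunks overlap by two samples.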

4

Big Data – znaczenie, zastosowania i rozwiązania technologiczne

84%

Racka K.

Zeszyty Naukowe PWSZ w Płocku. Nauki Ekonomiczne | 2016 | Vol. 1(23) | pp. 311-323

EN

Big Data technologies and their application to business processes are growing rapidly. Analytical and consulting enterprises specializing in the strategic use of IT indicate that the number of companies implementing or planning to implement technological solutions related to Big Data is increasing annually. Many companies believe that the analysis of unstructured data will be the key to a deeper understanding of customer behavior. They believe that analytics is absolutely essential or very important for conducting the overall business strategy and improving operational results. The purpose of the article is to define Big Data and to explain what unstructured data are and how they can be applied. Furthermore, in the article I present the results of reports on the implementation of Big Data technologies and discuss the associated technologies.

PL

Big Data technologies and their application to business processes are developing dynamically. Analytical and consulting firms specializing in the strategic use of IT report that the number of enterprises implementing or planning to implement Big Data solutions grows from year to year. Many enterprises consider the analysis of unstructured data to be the key to a deeper understanding of customer behavior. They regard analytics as absolutely essential or very important for running the overall business strategy and for improving operational results. The aim of this article is to explain what exactly the term Big Data means, what unstructured data are, and what they can be used for. In addition, I present the results of reports on the implementation of Big Data technologies and discuss example technologies related to Big Data.

5

Enhancing approach using hybrid pailler and RSA for information security in bigdata

84%

Abdalwahid S. M. J. , Yousif R. Z. , Kareem S. W.

Applied Computer Science | 2019 | Vol. 15, no. 4 | pp. 63-74

EN

The amount of data processed and stored in the cloud is growing dramatically. Traditional storage devices, at both the hardware and software levels, cannot meet the requirements of the cloud. This fact motivates the need for a platform that can handle this problem. Hadoop is a deployed platform proposed to overcome this big data problem; it often uses the MapReduce architecture to process the vast amounts of data in the cloud system. Hadoop has no strategy to assure the safety and confidentiality of the files saved inside the Hadoop Distributed File System (HDFS). In the cloud, the protection of sensitive data is a critical issue in which data encryption schemes play a vital role. This research proposes a hybrid system combining two well-known asymmetric key cryptosystems (RSA and Paillier) to encrypt the files stored in HDFS. Thus, before data is saved in HDFS, the proposed cryptosystem is used to encrypt it. Each cloud user may upload files in one of two ways, non-safe or secure. The hybrid system shows higher computational complexity and less latency in comparison to the RSA cryptosystem alone.

6

Hybrid encryption algorithm for big data security in the Hadoop distributed file system

84%

Mohanraj T. , Santhosh R.

Computer Assisted Methods in Engineering and Science | 2022 | Vol. 29, no. 1-2 spec. | pp. 33-48

EN

A large amount of structured and unstructured data is collectively termed big data. The recent technological development streamlined several companies to handle massive data and interpret future trends and requirements. The Hadoop distributed file system (HDFS) is an application introduced for efficient big data processing. However, HDFS does not have built-in data encryption methodologies, which leads to serious security threats. Encryption algorithms are introduced to enhance data security; however, conventional algorithms lag in performance while handling larger files. This research aims to secure big data using a novel hybrid encryption algorithm combining cipher-text policy attribute-based encryption (CP-ABE) and advanced encryption standard (AES) algorithms. The performance of the proposed model is compared with traditional encryption algorithms such as DES, 3DES, and Blowfish to validate superior performance in terms of throughput, encryption time, decryption time, and efficiency. Maximum efficiency of 96.5% with 7.12 min encryption time and 6.51 min decryption time of the proposed model outperforms conventional encryption algorithms.

7

Hadoop, narzędzie technologii Big Data i jego aplikacje

84%

Nowakowski K. , Nowakowski W.

Elektronika : konstrukcje, technologie, zastosowania | 2016 | Vol. 57, no. 3 | pp. 33-36

PL

Big Data is one of the most important challenges of modern computer science. Given the massive inflow of large amounts of information from various sources nowadays, new data analysis techniques and technological solutions need to be introduced. Hadoop software is an important tool in Big Data.

EN

Big Data is a term frequently used in the literature, but there is still no consensus on how to implement such environments. An important tool in Big Data is the Hadoop software. There are many tools and technologies in this area. This paper is a review of Big Data technologies.

8

Application of Big Data in Poland and in the world

84%

Woźniak A.

Vol. 37, no. 1 | pp. 33-40

EN

The results of a survey conducted by the author are presented, in order to compare the Big Data tools currently used for the analysis of distributed data about the consumer between Polish and foreign companies, and to check what data is being analysed. Enterprises in Poland usually analyse data coming from their internal systems, while foreign companies examine data from mobile applications and geographical location.

PL

The results of a survey conducted by the author are presented. Its aims included comparing the Big Data tools currently used by Polish and foreign companies to analyse distributed consumer data, and checking what data are currently being analysed. Companies in Poland most often analyse data from their internal systems, while foreign companies analyse data from mobile applications and geographical location.

9

Big Data – definicje, wyzwania i technologie informatyczne

71%

Tabakow M. , Korczak J. , Franczyk B.

Informatyka Ekonomiczna | 2014 | no. 1(31) | pp. 138-153

XX

Big Data, as a complex IT issue, is one of the most important challenges of the modern digital world. At present, the continuous inflow of large amounts of information from different sources, and thus with different characteristics, requires the introduction of new data analysis techniques and technologies. In particular, Big Data requires the use of parallel processing and a departure from the classical scheme of data storage. Thus, in this paper we review the basic issues related to Big Data: the different definitions of "Big Data", and the research and technological problems and challenges in terms of data volume, data diversity, dimensionality reduction, data quality, and inference capabilities. We also consider future directions of work on exploring the possibilities of Big Data in various areas of management.

10

Internet jako pramen výzkumu: přístup k archivovaným webovým zdrojům a možnosti jejich zpracování

67%

Vozár Z. , Haškovcová M. , Prokopová A.

Teorie vědy (Theory of Science) | 2022 | Vol. 44, no. 1 | pp. 59-87

EN

The Internet has become a natural communication platform for modern society. Web archives, which began in the 1990s to capture and preserve changing web content, have thus become key sources for research on the recent past. The analysis of their data is complicated by, for example, insufficient competencies of researchers, the need for computing resources, or legislation. One way to meet the needs of users is to develop tools and research interfaces that allow working with data without technological knowledge of advanced extraction and thus open the data to researchers. The study addresses the issue of access to archival web data, outlines efforts to formulate a theoretical and methodological framework, and proposes a design for access and further data processing. This design is applied in a unique research interface for extracting large data from web archives using advanced machine learning to generate and categorize text outputs.

CS

The Internet has become a natural communication platform of contemporary society. Web archives, which began to emerge in the 1990s with the aim of capturing and preserving changeable web content, have thus become key sources for research on the recent past. Analysing their data is complicated by, for example, insufficient competencies of researchers, the need for powerful computing resources, or legislation. One way to meet users' needs is to develop tools and research interfaces that make it possible to work with the data without technological knowledge of advanced extraction, and so open the data up for use by researchers. The study addresses the problem of providing access to archived web data, outlines efforts to formulate a theoretical and methodological framework, and proposes a design for access and further data processing, which is applied in a unique research interface for mining large data from web archives using advanced machine processing methods to generate and categorize text outputs.

11

Massive simulations using MapReduce model

59%

Krupa A. , Sawicki B.

no. 4 | pp. 45-47

EN

In the last few years, cloud computing has been growing into a dominant solution for large-scale numerical problems. It is most often based on the MapReduce programming model, which provides high scalability and flexibility and also optimizes the cost of computing infrastructure. This paper studies the feasibility of the MapReduce model for scientific problems consisting of many independent simulations. An experiment based on variability analysis of a simple electromagnetic problem with over 10,000 scenarios shows that the platform has nearly linear scalability, reaching over 80% of theoretical maximum performance.

PL

In recent years, computing clouds have become the dominant solution for large-scale numerical computations. They are most often based on the MapReduce programming model, which provides high scalability, flexibility, and optimization of infrastructure costs. The article presents an analysis of the use of MapReduce for solving scientific problems composed of many independent simulations. The experiment, comprising over 10,000 cases and based on variability analysis of an electromagnetic field, shows nearly linear scalability of the platform and over 80% of its theoretical maximum performance.
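The pattern of many independent simulations maps onto MapReduce in a straightforward way, as the following single-process sketch suggests. It is an illustration under stated assumptions, not the authors' setup: `simulate` is a hypothetical stand-in for one simulation run, and the `param` field and the summary it produces are invented for the example. On a real cluster the map calls would be distributed across nodes.

```python
from functools import reduce


def simulate(scenario):
    """Hypothetical stand-in for one independent simulation run."""
    param = scenario["param"]
    return {"count": 1, "total": param * param}


def merge(a, b):
    """Reduce step: combine two partial summaries.

    The operation is associative, so partial results can be merged
    in any grouping, which is what lets the work scale out.
    """
    return {"count": a["count"] + b["count"], "total": a["total"] + b["total"]}


def run_all(scenarios):
    # Map phase: every scenario is processed independently of the others.
    partials = map(simulate, scenarios)
    # Reduce phase: fold the partial summaries into one result.
    return reduce(merge, partials, {"count": 0, "total": 0})


scenarios = [{"param": p} for p in range(1, 5)]
summary = run_all(scenarios)
mean_result = summary["total"] / summary["count"]
```

Since no scenario depends on another, the map phase needs no communication between workers, which is consistent with the nearly linear scalability the abstract reports.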

12

An algorithm for vehicle identification by on-board Bluetooth devices exploiting Big-Data tools

59%

Bazan M. , Janiczek T. , Kurda R. , Matusiak K. , Sak Ł.

Zeszyty Naukowe Wyższej Szkoły Technicznej w Katowicach | 2017 | no. 9 | pp. 7-21

EN

Nowadays, vehicles are equipped with various on-board devices that use Bluetooth technology and log on to the ITS infrastructure whenever they pass Bluetooth readers. The location of Bluetooth readers is an important issue for travel time prediction in urban areas. Bluetooth technology is used to enhance travel time prediction accuracy and complements vehicle license number identification. The algorithms for travel time prediction are used by technologies such as TRAX to offer road users an alternative route for traversing the most congested regions of the city in the most efficient way. In this paper we present the implementation of an algorithm that enables us to match Bluetooth on-board devices, as well as cell phones that are mounted in or simply present in vehicles, with the vehicles of road users. Since the ITS is a source of an enormous and increasing amount of data, we employ Big Data tools such as Apache Hadoop and Apache Spark for this purpose. To build MapReduce tasks we use Hive-SQL. The algorithm is tested on ITS data from the city of Wroclaw. The results of the algorithm may be used to locate stolen vehicles.

PL

Modern vehicles are equipped with many different Bluetooth devices that log on to the ITS infrastructure every time they pass within range of Bluetooth readers. The location of Bluetooth readers is an important issue for travel time prediction methods in urbanized regions. Bluetooth technology is used to improve the accuracy of travel time and complements the identification of vehicles by their registration numbers. Travel time prediction algorithms are used to propose alternative routes to users so that they can pass through the most congested regions of the city in the most efficient way. The article presents the implementation of an algorithm that makes it possible to match Bluetooth devices and phones located in vehicles with the vehicles themselves. Big Data tools such as Apache Hadoop and Apache Spark are employed for this purpose. Hive-SQL is used to build the Map-Reduce tasks. The algorithm was tested on data from the Wrocław ITS. The results of the algorithm can be used to locate stolen vehicles.

13

A Contemplating approach for Hive and Map reduce for efficient Big Data Implementation

48%

Sasubilli G. , Sekhar U. S. , Sharma S. , Sharma S.

Annals of Computer Science and Information Systems | 2018 | Vol. 14 | pp. 131-135

EN

In the current scenario, data is growing exponentially, accruing at a rate of petabytes. Big Data denotes the amount of data available across different media, including the Internet as a wide communication medium. The term Big Data refers to the explosion in the quantity (and quality) of available and potentially relevant data. In terms of quantity, the amount of data is very large; it has been handled by conventional database systems and data warehouses, and as the amount of data increases, the complexity increases with it. Multiple areas are involved in the production, generation, and implementation of Big Data, such as news media, social networking sites, business applications, the industrial community, and many more. Several parameters matter in handling Big Data, such as efficient management, proper storage, availability, scalability, and processing. Thus, to handle Big Data, new techniques, tools, and architectures are required. In the present paper, we discuss the different technologies available for the implementation and management of Big Data. The paper considers the tools and techniques used to solve the major difficulties with Big Data. It evaluates stock exchange data from different industries with respect to the covariance factor, shows the significance of the data through positive covariance results obtained using the Hive approach, and examines how efficient the Hive approach is in terms of HDFS and Hive queries. It also evaluates the covariance factors after applying the Hive and MapReduce approaches to a stock exchange dataset of around 3500. After processing the data with the Hive approach, we conclude that the Hive approach is better than MapReduce and big table in terms of the storage and processing of Big Data.

14

Efektywne przetwarzanie i integracja dużych zbiorów danych w środowisku Hadoop

48%

Drzymała P. , Welfle H. , Drzymała A.

2019 | Vol. 95, no. 1 | pp. 29-32

PL

The development of new channels for electronic information exchange contributes to the creation of ever more data. These data are often diverse, heterogeneous, and stored without a strictly defined structure. In the last 2 years, 90% of all the data generated since the beginning of humanity has been added. The article presents the architecture and capabilities of the Hadoop environment, created for the efficient processing and integration of large data sets. The features of this platform and its scalability are presented. The way the HDFS file system works and its resistance to storage errors are discussed. The idea of cooperation between the nodes of a Hadoop cluster and of performing Map-Reduce operations is presented.

EN

The development of new channels of electronic information exchange contributes to the emergence of more and more data. These data are often diverse, heterogeneous, and stored without a strictly defined structure. Of all the data generated since the beginning of human civilization, 90% was produced in the past two years. The article presents the architecture and capabilities of the Hadoop environment for the effective processing and integration of large data sets. It also presents the features of this platform and its scalability, and discusses how the HDFS file system operates and how it resists storage errors. The scheme of cooperation between Hadoop cluster nodes in performing MapReduce operations is presented.