Wyniki wyszukiwania - Biblioteka Nauki

1

KDS-transformation for Data Compression

100%

Popa I.

Fundamenta Informaticae

|

2005

|

tom Vol. 67, nr 4

371--375

EN

In this paper we present a new simple method to rewrite a given input string into another one which can be more efficiently compressed (in some cases) by traditional compression methods.

2

An efficient eigenspace updating scheme for high-dimensional systems

80%

Gangl S. , Mongus D. , Žalik B.

International Journal of Applied Mathematics and Computer Science

|

2014

|

tom 24

|

nr 1

123-131

EN

Systems based on principal component analysis have developed from exploratory data analysis in the past to current data processing applications which encode and decode vectors of data using a changing projection space (eigenspace). Linear systems, which need to be solved to obtain a constantly updated eigenspace, have increased significantly in their dimensions during this evolution. The basic scheme used for updating the eigenspace, however, has remained basically the same: (re)computing the eigenspace whenever the error exceeds a predefined threshold. In this paper we propose a computationally efficient eigenspace updating scheme, which specifically supports high-dimensional systems from any domain. The key principle is a prior selection of the vectors used to update the eigenspace in combination with an optimized eigenspace computation. The presented theoretical analysis proves the superior reconstruction capability of the introduced scheme, and further provides an estimate of the achievable compression ratios.

3

Wykorzystanie technologii CUDA do kompresji w czasie rzeczywistym danych pochodzących z sonarów wielowiązkowych

80%

Chybicki A. , Laskowski K. , Muszyński M.

Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne

|

2010

|

tom T. 19

127-132

PL

W pracy przedstawiono projekt oraz implementację systemu przeznaczonego do kompresji danych z sonarów wielowiązkowych działającego z wykorzystaniem technologii CUDA. Omówiono oraz zastosowano metody bezstratnej kompresji danych oraz techniki przetwarzania równoległego. Stworzoną aplikację przetestowano pod kątem szybkości i stopnia kompresji oraz porównano z innymi rozwiązaniami umożliwiającymi kompresję tego typu informacji.

EN

Recently, multibeam echosounders capable of logging, not only bathymetry focused data, but also the full water-column information have become available. Unlike using bathymetric multibeam sonars, which only capture the seafloor, utilizing full water-column multibeam systems capabilites results in acquiring very large data sets during hydrographic or scientific cruises. The paper presents the concept of algorithms dedicated for reduction of multibeam sonar datasets based on aplying multi-threaded architecture implemented in Graphical Processing Units (GPU). We presented the advantages of utilizing nVdia CUDA technology in the context of efficiency of compression and obtained data reduction ratio.

4

Discovery of significant intervals in time-series data

80%

Srinivasan A. , Chakravarthy S. , Sheshta S.

Foundations of Computing and Decision Sciences

|

2006

|

tom Vol. 31, No. 1

87-102

EN

This paper deals with time-series sensor data to discover significant intervals for prediction. In our project on intelligent home environment, we need to predict agent's actions (tum on/off the appliances) using previously collected sensor data. We propose an approach that replaces the time-series point values with time-series intervals, which represent the characteristics of the data. Time-series data is folded over a periodicity (day, week, etc.) to form intervals and significant intervals are discovered from them that satisfy minimum support and maximum interval-length criteria. By compressing the time-series data and working with intervals, discovery of significant intervals (for prediction) is efficient. Also, sequential mining algorithms perform better if they are modified to work with reduced dataset of significant intervals. In this paper, we present a suite of algorithms for detecting significant intervals, discuss their characteristics, advantages and disadvantages and analyze their performance.

5

PQ data compression algorithm with modified quantizer and adaptive band logic using DTCWT

80%

Prathibha E. , Manjunatha A. , Raj C. P.

Archives of Electrical Engineering

|

2018

|

tom Vol. 67, nr 1

207--223

EN

Abstract:With growing demand for energy, power generated in renewable sources at various locations are distributed throughout the power grid. The power grid known as the smart grid needs to monitor power generation and its smart distribution. Smart meters provide solutions for monitoring power over smart grids. Smart meters need to continuously log data and at every source there is a large amount of data generated that needs to be compressed for both storage and transmission over the smart grid. In this paper, a novel algorithm for PQ data compression is proposed that uses the Dual Tree Complex Wavelet Transform (DTCWT) for sub-band computation and a modified quantizer is designed to reduce subband coefficient limits to less than 4 bits. The Run Length Encoding (RLC) and Huffman Coding algorithm encode the data further to achieve compression. The performance metrics such as a peak-signal-to-noise ratio (PSNR) and compression ratio (CR) are used for evaluation and it is found that the modified DTCWT (MDTCWT) improves PSNR by a factor of 3% and the mean squared error (MSE) by a factor of 16% as compared with the DTCWT based PQ compression algorithm.

6

Optimal physical primaries of spectral color information

80%

Farajikhah S. , Madanchi F. , Amirshahi S. H.

Optica Applicata

|

2012

|

tom Vol. 42, nr 4

913--924

EN

A new method for selection of optimal physical primaries is introduced. The reflectance spectra of 1269 matt Munsell color chips are used as a universal physical dataset and the most independent samples are extracted and used as actual primaries. The efficiency of selected primaries is compared with those obtained from principal component analysis (PCA), non-negative matrix factorization (NNMF) and non-linear principal component analysis (NLPCA) techniques. The performances of chosen primaries are evaluated by calculation of root mean square (RMS) errors, the goodness of fit coefficient (GFC) and the color difference (ΔE) values between the original and the reconstructed spectra.

7

Google Books Ngrams Recompressed and Searchable

80%

Grabowski S. , Swacha J.

Foundations of Computing and Decision Sciences

|

2012

|

tom Vol. 37, No. 4

273--283

EN

One of the research fields significantly affected by the emergence of “big data” is computational linguistics. A prominent example of a large dataset targeting this domain is the collection of Google Books Ngrams, made freely available, for several languages, in July 2009. There are two problems with Google Books Ngrams the textual format (compressed with Deflate) in which they are distributed is highly inefficient we are not aware of any tool facilitating search over those data, apart from the Google viewer, which, as a Web tool, has seriously limited use. In this paper we present a simple preprocessing scheme for Google Books Ngrams, enabling also search for an arbitrary n gram (i.e., its associated statistics) in average time below 0.2 ms. The obtained compression ratio, with Deflate (zip) left as the backend coder, is over 3 times higher than in the original distribution.

8

PQ data compression algorithm with modified quantizer and adaptive band logic using DTCWT

70%

PRATHIBHA E. , MANJUNATHA A. , RAJ C. P.

|

nr 1

9

Encrypted prefix tree for pattern mining

70%

Bhattacharyya R. , Bhattacharyya B. , Chaudhuri A.

Foundations of Computing and Decision Sciences

|

2009

|

tom Vol. 34, No. 1

3-22

EN

Data influx at large volumes is welcome for quality outcome in knowledge discovery, but it causes concern for scalability of mining algorithms. We introduce three measures for scalable mining - bit-vector coding, data-partitioning and Transaction Prefix (TP)-tree. Following encryption with bit-vector coding, transaction records are partitioned with notion of common prefixes. A TP-tree structure is devised for arranging the data parts such that multiple records share common storage. Advantage is two-fold: additional storage reduction over bit-vector coding and mining common prefixes together. These altogether improve space-time requirement in frequent pattern mining. Experiments on dense datasets show significant improvements in performance and scalability of both candidate generation and pattern-growth algorithms.

10

Data Compressor for VQ Index Tables

70%

Lu T.-C. , Chang C. C.

Fundamenta Informaticae

|

2005

|

tom Vol. 65, nr 4

353--371

EN

The vector quantization (VQ) compression scheme has been well accepted as an efficient image compression technique. However, the compression bit rate of the VQ scheme is limited. In order to improve its efficiency, in this paper, we shall propose a new lossless data compression scheme to further condense the VQ index table. The proposed scheme exploits the inter-block correlations in the index table to re-encode the indices. Unlike the well known existing re-encoding schemes such as SOC and STC, the proposed scheme uses a smaller number of compression codes to encode every index that coincides with another on the predefined path. Compared with VQ, SOC and STC, the proposed scheme performs better in terms of compression bit rate.

11

Parallelization of an adaptive compression algorithm using the reduced model update frequency method

70%

Starosolski R.

Theoretical and Applied Informatics

|

2007

|

tom Vol. 19, No. 2

103-115

EN

Modeling and coding is the most complex part of many adaptive compression algorithms; it is also an inherently serial process. The method of reduced model update frequency is a modification of the typical adaptive scheme, which was originally employed in order to improve the speed of modeling at the cost of a negligible worsening of the modeling quality. In this paper, we notice that the above method permits to parallelize the compression process. We find that for the Itanium 2 processor the speed of coding and modeling in a SFALIC image compression algorithm may be improved by about 50% through exploiting the fine-grained parallelism, also the medium-grained parallelism may be exploited in a significantly larger extent.

PL

Modelowanie i kodowanie to najbardziej złożone elementy wielu adaptacyjnych algorytmów kompresji, przy czym sam proces kompresji oparty o modelowanie i kodowanie musi być realizowany w sposób sekwencyjny. Metoda zmniejszonej częstości aktualizacji modelu danych to modyfikacja zastosowana do adaptacyjnego algorytmu kompresji, aby poprawić prędkość modelowania kosztem nieznacznego, z praktycznego punktu widzenia, pogorszenia jakości modelowania. W niniejszej pracy zauważono, iż zastosowanie tej metody umożliwia zrównoleglenie algorytmu kompresji. Badania eksperymentalne wykazały, ˙ze dzięki wykorzystaniu równoległości drobnoziarnistej dla procesora Itanium 2 i algorytmu SFALIC metoda zmniejszonej częstości aktualizacji modelu danych pozwala na zwiększenie prędkości kodowania i modelowania o około 50%. Przeprowadzone szacunki pokazały, że metoda ta pozwala również na wykorzystanie równoległości średnioziarnistej w znacznie większym stopniu.

12

Realizacja kompresji danych metodą Huffmana z ograniczeniem długości słów kodowych

60%

Rybak K. , Jamro E. , Wiatr K.

Pomiary Automatyka Kontrola

|

2012

|

tom R. 58, nr 7

662-664

PL

Praca opisuje zmodyfikowany sposób budowania książki kodowej kodu Huffmana. Książka kodowa została zoptymalizowana pod kątem implementacji sprzętowej kodera i dekodera Huffmana w układach programowalnych FPGA. Opisano dynamiczną metodę kodowania - książka kodowa może się zmieniać w zależności od zmiennego formatu kompresowanych danych, ponadto musi być przesłana z kodera do dekodera. Sprzętowa implementacja kodeka Huffmana wymusza ograniczenie maksymalnej długości słowa, w przyjętym założeniu do 12 bitów, co pociąga za sobą konieczność modyfikacji algorytmu budowy drzewa Huffmana.

EN

This paper presents a modified algorithm for constructing Huffman codeword book. Huffman coder, decoder and histogram calculations are implemented in FPGA similarly like in [2, 3]. In order to reduce the hardware resources the maximum codeword is limited to 12 bit. It reduces insignificantly the compression ratio [2, 3]. The key problem solved in this paper is how to reduce the maximum codeword length while constructing the Huffman tree [1]. A standard solution is to use a prefix coding, like in the JPEG standard. In this paper alternative solutions are presented: modification of the histogram or modification of the Huffman tree. Modification of the histogram is based on incrementing (disrupting) the histogram values for an input codeword for which the codeword length is greater than 12 bit and then constructing the Huffman tree from the very beginning. Unfortunately, this algorithm is not deterministic, i.e. it is not known how much the histogram should be disrupted in order to obtain the maximum codeword length limited by 12 bit. Therefore several iterations might be required. Another solution is to modify the Huffman tree (see Fig. 2). This algorithm is more complicated (when designing), but its execution time is more deterministic. Implementation results (see Tab. 1) show that modifi-cation of the Huffman tree results in a slightly better compression ratio.

13

Implementacja w układach FPGA dekompresji danych zgodnie ze standardem Deflate

60%

Jamro E. , Wiatr K.

Pomiary Automatyka Kontrola

|

2013

|

tom R. 59, nr 8

739--741

PL

Otwarty standard kompresji danych, Deflate, jest szeroko stosowanym standardem w plikach .gz / .zip i stanowi kombinację kompresji metodą LZ77 / LZSS oraz kodowania Huffmana. Niniejszy artykuł opisuje implementację w układach FPGA dekompresji danych według tego standardu. Niniejszy moduł jest w stanie dokonać dekompresji co najmniej 1B na takt zegara, co przy zegarze 100MHz daje 100MB/s. Aby zwiększyć szybkość, możliwa jest praca wielu równoległych modułów dla różnych strumieni danych wejściowych.

EN

This paper describes FPGA implementation of the Deflate standard decoder. Deflate [1] is a commonly used compression standard employed e.g. in zip and gz files. It is based on dictionary compression (LZ77 / LZSS) [4] and Huffman coding [5]. The proposed Huffman decoded is similar to [9], nevertheless several improvements are proposed. Instead of employing barrel shifter a different translation function is proposed (see Tab. 1). This is a very important modification as the barrel shifter is a part of the time-critical feedback loop (see Fig. 1). Besides, the Deflate standard specifies extra bits, which causes that a single input word might be up to 15+13=28 bits wide, but this width is very rare. Consequently, as the input buffer might not feed the decoder width such wide input date, a conditional decoding is proposed, for which the validity of the input data is checked after decoding the input symbol, thus when the actual input symbol bit widths is known. The implementation results (Tab. 2) show that the occupied hardware resources are mostly defined by the number of BRAM modules, which are mostly required by the 32kB dictionary memory. For example, comparable logic (LUT / FF) resources to the Deflate standard decoder are required by the AXI DMA module which transfers data to / from the decoder.

14

Klasyczna i neuronowa analiza głównych składowych na przykładzie zadania kompresji obrazu

60%

Bartecki K.

Pomiary Automatyka Kontrola

|

2013

|

tom R. 59, nr 1

34--37

PL

W artykule omówiono zastosowanie analizy składników głównych (PCA) w zadaniu kompresji stratnej sygnału na przykładzie kompresji obrazu. Zadanie zrealizowano z wykorzystaniem klasycznej metody PCA oraz dwóch rodzajów sieci neuronowych: jednokierunkowej, dwuwarstwowej sieci z uczeniem nadzorowanym i jednowarstwowej sieci z uczeniem nienadzorowanym. W każdym z przypadków przeanalizowano wpływ struktury modelu PCA na wartości współczynnika kompresji oraz średniokwadratowego błędu kompresji.

EN

In the paper, lossy data compression techniques based on the principal component analysis (PCA) are considered on the example of image compression. The presented task is performed using the classical PCA method based on the eigen-decomposition of the image covari-ance matrix as well as two different kinds of artificial neural networks. The first neural structure used is a two-layer feed-forward network with supervised learning shown in Fig.1, while the second one is a single-layered network with unsupervised Hebbian learning. In each case considered, the effect of the PCA model structure on the data compression ratio and the mean square reconstruction error is analysed. The compression results for a Hebbian neural network with K=4 PCA units are presented in Figs. 2, 3 and 4. They show that only 4 eigenvectors are able to capture the main features of the processed image, giving as a result high value of the data compression ratio. However, the reconstructed image quality is not sufficient from a practical point of view. Therefore, selection of the appropriate value for K should take into account the tradeoff between a sufficiently high value for the compression ratio and a reasonably low value for the image reconstruction error. The summary results for both classical and neural PCA compression approaches obtained for different number of eigenvectors (neurons) are compared in Fig. 5. The author concludes that a positive aspect of using neural networks as a tool for extracting principal components from the image data is that they do not require calculating the correlation matrix explicitly, as in the case of the classical PCA-based approach.

15

Average convergence rate of the first return time

60%

Choe G. H. , Kim D. H.

Colloquium Mathematicum

|

2000

|

tom 84/85

|

nr 1

159-171

EN

The convergence rate of the expectation of the logarithm of the first return time $R_{n}$, after being properly normalized, is investigated for ergodic Markov chains. I. Kontoyiannis showed that for any β > 0 we have $log[R_{n}(x)P_{n}(x)] =o(n^{β})$ a.s. for aperiodic cases and A. J. Wyner proved that for any ε >0 we have $-(1 + ε)log n ≤ log[R_{n}(x)P_{n}(x)] ≤ loglog n$ eventually, a.s., where $P_{n}(x)$ is the probability of the initial n-block in x. In this paper we prove that $ E[log R_{(L,S)} - (L-1)h]$ converges to a constant depending only on the process where $R_{(L,S)}$ is the modified first return time with block length L and gap size S. In the last section a formula is proposed for measuring entropy sharply; it may detect periodicity of the process.

16

Autoasocjacyjna sieć neuronowa jako narzędzie do nieliniowej kompresji danych

60%

Boniecki P. , Przybył J.

Journal of Research and Applications in Agricultural Engineering

|

2006

|

tom Vol. 51, nr 1

37-40

PL

Sieci autoasocjacyjne to sieci, które odtwarzają wartości wejściowe na swoich wyjściach. Działanie takie zdecydowanie ma sens, ponieważ rozważana sieć autoasocjacyjna posiada w warstwie środkowej (ukrytej) zdecydowanie mniejszą liczbą neuronów niż w warstwie wejściowej czy wyjściowej. Dzięki takiej budowie dane wejściowe muszą przecisnąć się przez swojego rodzaju zwężenie w warstwie ukrytej sieci, kierując się w do wyjścia. Dlatego też, w celu realizacji stawianego jej zadania reprodukcji informacji wejściowej na wyjściu, sieć musi się najpierw nauczyć reprezentacji obszernych danych wejściowych za pomocą mniejszej liczby sygnałów produkowanych przez neurony warstwy ukrytej, a potem musi opanować umiejętność rekonstrukcji pełnych danych wejściowych z tej "skompresowanej" informacji. Oznacza to, że sieć autoasocjacyjna w trakcie uczenia zdobywa umiejętność redukcji wymiaru wejściowych danych.

EN

An autoassociative network is one which reproduces its inputs as outputs. Autoassociative networks have at least one hidden layer with less units than the input and output layers (which obviously have the same number of layers as each other). Hence, autoassociative networks perform some sort of dimensionality reduction or compression on the cases. Dimensionality reduction can be used to pre-process the input data to encode Information in a smaller number of variables. This approach recognizes that the intrinsic dimensionality of the data may be lower than the number of variables. In other words, the data can be adequately described by a smaller number of variables, if the right transformation can be found.

17

Eliminating Switching Components in Binary Matrices by 0-1 Flips and Column Permutations

60%

Hantos N. , Balázs P.

Fundamenta Informaticae

|

2015

|

tom Vol. 141, nr 2/3

135--150

EN

Analysis of patterns in binary matrices plays a vital role in numerous applications of computer science. One of the most essential patterns of such matrices are the so called switching components, where the number and location of the components gives valuable information about the binary matrix. One way to measure the effect of switching components in a binary matrix is counting the number of 0-s which have to be replaced with 1-s in order to eliminate the switching components. However, finding the minimal number of 0-1 flips is generally an NP-complete problem. We present two novel-type heuristics for the above problem and show via experiments that they outperform the formerly proposed ones, both in optimality and in running time. We also show how to use those heuristics for determining the so-called nestedness level of a matrix, and how to use the flips for binary image compression.

18

Symulacja przebiegu drgań fundamentu budynku wzbudzanych wstrząsami górniczymi z zastosowaniem SSN

60%

Kuźniar K. , Maciąg E.

Czasopismo Techniczne. Budownictwo

|

2007

|

tom R. 104, z. 2-B

83-90

PL

Głównym celem pracy jest określenie przebiegów przyspieszeń drgań fundamentu budynku na podstawie pomierzonych przyspieszeń drgań gruntu. Pomiary drgań na gruncie i fundamencie przeprowadzone in situ posłużyły jako wzorce w próbie zastosowania techniki neuronowej do tego celu. Dane doświadczalne wstępnie przetworzono, dokonując ich kompresji poprzez dekompozycję według składników głównych.

EN

The main goal of this paper is the simulation of building foundation vibrations on the basis of ground vibrations taken from measurements. Using the results from measurements in situ on the ground and on the foundation, the neural technique is applied. The experimental data were pre-processed (compressed) with the application of the Principal Component Analysis.

19

An efficient eigenspace updating scheme for high-dimensional systems

60%

Gangl S. , Mongus D. , Žalik B.

International Journal of Applied Mathematics and Computer Science

|

2014

|

tom Vol. 24, no. 1

123--131

EN

Systems based on principal component analysis have developed from exploratory data analysis in the past to current data processing applications which encode and decode vectors of data using a changing projection space (eigenspace). Linear systems, which need to be solved to obtain a constantly updated eigenspace, have increased significantly in their dimensions during this evolution. The basic scheme used for updating the eigenspace, however, has remained basically the same: (re)computing the eigenspace whenever the error exceeds a predefined threshold. In this paper we propose a computationally efficient eigenspace updating scheme, which specifically supports high-dimensional systems from any domain. The key principle is a prior selection of the vectors used to update the eigenspace in combination with an optimized eigenspace computation. The presented theoretical analysis proves the superior reconstruction capability of the introduced scheme, and further provides an estimate of the achievable compression ratios.

20

Efficient Approaches to Compute Longest Previous Non-overlapping Factor Array

60%

Chairungsee S.

Fundamenta Informaticae

|

2018

|

tom Vol. 163, nr 3

291--304

EN

In this article, we introduce new methods to compute the Longest Previous nonoverlapping Factor (LPnF) table. The LPnF table is the table that stores the maximal length of factors re-occurring at each position of a string without overlapping and this table is related to Ziv-Lempel factorization of a text which is useful for text compression and data compression. The LPnF table has the important role for data compression, string algorithms and computational biology. In this paper, we present three approaches to produce the LPnF table of a string from its augmented position heap, from its position heap, and from its suffix heap. We also present the experimental results from these three solutions. The algorithms run in linear time with linear memory space.