Wyniki wyszukiwania - BazTech

1

A note on the complexity of scheduling of communication-aware directed acyclic graph

Musial J., Guzek M., Bouvry P., Blazewicz J.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2018

|

Vol. 66, nr 2

187--191

EN

The most recent incarnation of distributed paradigm is cloud computing. It can be seen as the first widely accepted business model of mass consumption of the distributed computing resources. Despite the differences in business models and technical details regarding cloud platforms, the distributed computing underlies cloud. Communications in cloud systems include transmissions of the results of cloud applications, users interactions, and exchange of data between different services that compose applications. The latter becomes more critical as applications become richer as well as more complex, and may consist of services operated by various providers. The effective communication between components of cloud systems is thus critical to the end user satisfaction and to the success of cloud services. We will discuss different cloud computing models (communication aware and unaware). Main focus will be placed on communication-aware directed acyclic graph (CA-DAG), which extends the classical DAG model by explicitly modeling communication tasks. Moreover, we will analyze and consult computational complexity of this innovative distributed computation model inspired by the characteristics of cloud computing. Providing a proof of strong NP-hardness of the problem allows for a future implementation and evolution of the communication-aware DAG models.

2

Tabu search for the RNA partial degradation problem

Rybarczyk A., Hertz A., Kasprzak M., Blazewicz J.

International Journal of Applied Mathematics and Computer Science

|

2017

|

Vol. 27, no. 2

401--415

EN

In recent years, a growing interest has been observed in research on RNA (ribonucleic acid), primarily due to the discovery of the role of RNA molecules in biological systems. They not only serve as templates in protein synthesis or as adapters in the translation process, but also influence and are involved in the regulation of gene expression. The RNA degradation process is now heavily studied as a potential source of such riboregulators. In this paper, we consider the so-called RNA partial degradation problem (RNA PDP). By solving this combinatorial problem, one can reconstruct a given RNA molecule, having as input the results of the biochemical analysis of its degradation, which possibly contain errors (false negatives or false positives). From the computational point of view the RNA PDP is strongly NP-hard. Hence, there is a need for developing algorithms that construct good suboptimal solutions. We propose a heuristic approach, in which two tabu search algorithms cooperate, in order to reconstruct an RNA molecule. Computational tests clearly demonstrate that the proposed approach fits well the biological problem and allows to achieve near-optimal results. The algorithm is freely available at http://www.cs.put.poznan.pl/arybarczyk/tabusearch.php.

3

Exact and heuristic approaches to solve the Internet shopping optimization problem with delivery costs

Lopez-Loces M. C., Musial J., Pecero J. E., Fraire-Huacuja H. J., Blazewicz J., Bouvry P.

International Journal of Applied Mathematics and Computer Science

|

2016

|

Vol. 26, no. 2

391--406

EN

Internet shopping has been one of the most common online activities, carried out by millions of users every day. As the number of available offers grows, the difficulty in getting the best one among all the shops increases as well. In this paper we propose an integer linear programming (ILP) model and two heuristic solutions, the MinMin algorithm and the cellular processing algorithm, to tackle the Internet shopping optimization problem with delivery costs. The obtained results improve those achieved by the state-of-the-art heuristics, and for small real case scenarios ILP delivers exact solutions in a reasonable amount of time.

4

G-MAPSEQ – a new method for mapping reads to a reference genome

Wojciechowski P., Frohmberg W., Kierzynka M., Zurkowski P., Blazewicz J.

Foundations of Computing and Decision Sciences

|

2016

|

Vol. 41, No. 2

123--142

EN

The problem of reads mapping to a reference genome is one of the most essential problems in modern computational biology. The most popular algorithms used to solve this problem are based on the Burrows-Wheeler transform and the FM-index. However, this causes some issues with highly mutated sequences due to a limited number of mutations allowed. G-MAPSEQ is a novel, hybrid algorithm combining two interesting methods: alignment-free sequence comparison and an ultra fast sequence alignment. The former is a fast heuristic algorithm which uses k-mer characteristics of nucleotide sequences to ﬁnd potential mapping places. The latter is a very fast GPU implementation of sequence alignment used to verify the correctness of these mapping positions. The source code of G-MAPSEQ along with other bioinformatic software is available at: http://gpualign.cs.put.poznan.pl.

5

Algorithms solving the Internet shopping optimization problem with price discounts

Musial J., Pecero J. E., Lopez-Loces M. C., Fraire-Huacuja H. J., Bouvry P., Blazewicz J.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2016

|

Vol. 64, nr 3

505--516

EN

The Internet shopping optimization problem arises when a customer aims to purchase a list of goods from a set of web-stores with a minimum total cost. This problem is NP-hard in the strong sense. We are interested in solving the Internet shopping optimization problem with additional delivery costs associated to the web-stores where the goods are bought. It is of interest to extend the model including price discounts of goods. The aim of this paper is to present a set of optimization algorithms to solve the problem. Our purpose is to find a compromise solution between computational time and results close to the optimum value. The performance of the set of algorithms is evaluated through simulations using real world data collected from 32 web-stores. The quality of the results provided by the set of algorithms is compared to the optimal solutions for small-size instances of the problem. The optimization algorithms are also evaluated regarding scalability when the size of the instances increases. The set of results revealed that the algorithms are able to compute good quality solutions close to the optimum in a reasonable time with very good scalability demonstrating their practicability.

6

Building the library of RNA 3D nucleotide conformations using the clustering approach

Zok T., Antczak M., Riedel M., Nebel D., Villmann T., Lukasiak P., Blazewicz J., Szachniuk M.

International Journal of Applied Mathematics and Computer Science

|

2015

|

Vol. 25, no. 3

689--700

EN

An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. On the other hand, the knowledge of native nucleotide conformations is crucial for structure prediction and understanding of RNA folding. However, this knowledge is stored in structural databases in a rather distributed form. Therefore, only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach to inspect the dataset of RNA three-dimensional structures and to create a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures upon their trigonometric description. The clustering procedure is two-stage: (i) backbone- and (ii) ribose-driven. We show the resulting library that contains RNA nucleotide representatives over the entire data, and we evaluate its quality by computing normal distribution measures and average RMSD between data points as well as the prototype within each cluster.

7

DNA sequence assembly involving an acyclic graph model

Blazewicz J., Frohmberg W., Gawron P., Kasprzak M., Kierzynka M., Swiercz A, Wojciechowski P.

Foundations of Computing and Decision Sciences

|

2013

|

Vol. 38, No. 1

25--34

EN

The problem of DNA sequence assembly is well known for its high complexity. Experimental errors of different kinds present in data and huge sizes of the problem instances make this problem very hard to solve. In order to deal with such data, advanced efficient heuristics must be constructed. Here, we propose a new approach to the sequence assembly problem, modeled as the problem of searching for paths in an acyclic digraph. Since the graph representing an assembly instance is not acyclic in general, it is heuristically transformed into the acyclic form. This approach reduces the time of computations significantly and allows to maintain high quality of produced solutions.

8

G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads

Frohmberg W., Kierzynka M., Blazewicz J., Gawron P., Wojciechowski P.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2013

|

Vol. 61, nr 4

989--992

EN

DNA/RNA sequencing has recently become a primary way researchers generate biological data for further analysis. Assembling algorithms are an integral part of this process. However, some of them require pairwise alignment to be applied to a great deal of reads. Although several efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units), none of them directly targets high-throughput sequencing data. As a result, a need arose to create software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution that has been optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (Giga Cell Updates Per Second) on a single GPU and as a result it is the fastest tool in its class. Moreover, it scales up well on multiple GPUs systems, including MPI-based computational clusters, where its performance is counted in TCUPS (Tera CUPS).

9

Reduced-by-matching Graphs: Toward Simplifying Hamiltonian Circuit Problem

Blazewicz J., Kasprzak M.

Fundamenta Informaticae

|

2012

|

Vol. 118, nr 3

225-244

EN

The results presented in the paper are threefold. Firstly, a new class of reduced-by-matching directed graphs is defined and its properties studied. The graphs are output from the algorithm which, for a given 1-graph, removes arcs which are unnecessary from the point of view of searching for a Hamiltonian circuit. In the best case, the graph is reduced to a quasi-adjoint graph, what results in polynomial-time solution of the Hamiltonian circuit problem. Secondly, the systematization of several classes of digraphs, known from the literature and referring to directed line graphs, is provided together with the proof of its correctness. Finally, computational experiments are presented in order to verify the effectiveness of the reduction algorithm.

10

Complexity Issues in Computational Biology

Blazewicz J., Kasprzak M.

Fundamenta Informaticae

|

2012

|

Vol. 118, nr 4

385-401

EN

The progress of research in the area of computational biology, visible in last decades, brought, among others, a new insight into the complexity issues. The latter, previously studied mainly on the ground of computer science or operational research, gained by a confrontation with problems from the new area. In the paper, several complexity issues inspired by computational biology are presented.

11

G-PAS 2.0 - an improved version of protein alignment tool with an efficient backtracking routine on multiple GPUs

Frohmberg W., Kierzynka M., Blazewicz J., Wojciechowski P.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2012

|

Vol. 60, nr 3

491-494

EN

Several highly efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units). G-PAS (GPU-based Pairwise Alignment Software) was one of them, however, with a couple of interesting features that made it unique. Nevertheless, in order to adapt it to a new computational architecture some changes had to be introduced. In this paper we present G-PAS 2.0 - a new version of the software for performing high-throughput alignment. Results show, that the new version is faster nearly by a fourth on the same hardware, reaching over 20 GCUPS (Giga Cell Updates Per Second).

12

DomAns - pattern based method for protein domain boundaries prediction and analysis

Antczak M., Blazewicz J., Lukasiak P., Milostan M., Krasnogor N., Palik G.

Foundations of Computing and Decision Sciences

|

2011

|

Vol. 36, No. 1

35-56

EN

Determination of the native folded structure for a particular protein is a milestone towards understanding its function, and in most cases, can be done experimentally. However, the ability to predict in silico protein structure and related features would represent a fundamental breakthough in structural biology. The ability to predict domains in proteins is amongst the most important tasks needed for efective functional classification, homology-based structure prediction, structural genomics, as it makes function prediction easier. In this paper, we present the DomAnS, protein domain prediction approach, that is based on pattern alignment. DomAnS allows rapid screening for potential domain regions with the ability to recognize the most promising regions where domains might exists. The combination of the DomAnS algorithm with specialized databases that contains all known domains, allows us to find domain regions without solving 3D structure. Our approach has been tested on CASP7 data, and for 28 targets gave the best overall score.

13

Sequence similarity based method for protein function prediction

Blazewicz J., Lukasiak P., Milostan M., Krasnogor N., Jackowiak P.

Foundations of Computing and Decision Sciences

|

2009

|

Vol. 34, No. 3

173-191

EN

Motivation: Proteins are the main building blocks of life. They catalyze biological processes in living cells to sustain life and improve metabolism. They also act as biological scaffolds and are cell's workhorses. As a matter of fact, knowing their function is one of the most important milestones for understanding life.The function depends on the tertiary structure of the protein, but only for a fraction of amino acid sequences gathered in databases the structure is known. Thus, creation of efficient and accurate methods that predict function from sequences, based on already known function-sequence assignments, is a fundamental challenge in computational biology. Results: First, we show a detailed analysis of a usability of similarity search engines in the context of function prediction. Then we propose a simple and effective method for assigning function to sequences based on the results of similarity searches and information gathered from gene ontology annotation graphs. Availability: All data used for the analysis presented in this paper as well as raw result are available at the site: http://bio.cs.put.poznan.pl/funcpred/data/ Suplementary Material: Suplementary materials with additional charts are available at: http://bio.es.put.poznan.pl/funcpred/suplement/ Contact: protbio@cs.put.poznan.pl