Publication detail

Prokaryotic DNA Signal Downsampling for Fast Whole Genome Comparison

SEDLÁŘ, K.; ŠKUTKOVÁ, H.; VÍTEK, M.; PROVAZNÍK, I.

Original title

Prokaryotic DNA Signal Downsampling for Fast Whole Genome Comparison

Czech title

Downsampling of Prokaryotic DNA Signals for Fast Whole-Genome Classification

English title

Prokaryotic DNA Signal Downsampling for Fast Whole Genome Comparison

Type

journal article

Language

en

Original abstract

Classification of prokaryotes is mainly based on molecular data, since next-generation sequencing platforms provide a fast and effective way to capture prokaryotic characteristics. However, two different bacterial strains of the same genus can differ in specific parts of their genomes due to copious amounts of repetitive and transposable parts. Thus, finding an ideal segment of the genome for comparison is difficult. Conventional character-based methods rely on multiple sequence alignment, rendering them extremely computationally demanding, so only small parts of genomes can be compared in a reasonable time. In this paper, we present a novel algorithm based on the conversion of whole genome sequences to cumulative phase signals. Dyadic wavelet transform (DWT) is used for lossy compression of the phase signals by eliminating redundant frequency bands. Signal classification is then performed as cluster analysis using the Euclidean metric, where sequence alignment is replaced by dynamic time warping (DTW).
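The pipeline named in the abstract (cumulative phase signal, dyadic downsampling, DTW distance) can be illustrated with a minimal sketch. It is not the authors' implementation. The nucleotide-to-phase mapping (A = 1+j, C = -1-j, G = -1+j, T = 1-j), the use of Haar-style pairwise means as the low-frequency band, the absolute-difference DTW cost, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

# Assumed mapping: phase of A=1+j, C=-1-j, G=-1+j, T=1-j (not stated in the abstract).
PHASE = {"A": math.pi / 4, "C": -3 * math.pi / 4,
         "G": 3 * math.pi / 4, "T": -math.pi / 4}


def cumulative_phase(seq):
    """Running sum of per-nucleotide phases; unknown symbols contribute 0."""
    total, signal = 0.0, []
    for base in seq.upper():
        total += PHASE.get(base, 0.0)
        signal.append(total)
    return signal


def haar_downsample(signal, levels):
    """Keep only the low-frequency band: pairwise means, repeated `levels` times.

    Each level halves the length; a trailing odd sample is dropped.
    """
    for _ in range(levels):
        if len(signal) < 2:
            break
        signal = [(signal[i] + signal[i + 1]) / 2.0
                  for i in range(0, len(signal) - 1, 2)]
    return signal


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping, absolute-difference cost."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)          # row for the empty prefix of `a`
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)]


# Toy usage with made-up sequences: pairwise DTW distances of downsampled signals.
genomes = {"g1": "ACGTACGTACGTACGT", "g2": "ACGTACGGTACGTACG", "g3": "TTTTGGGGCCCCAAAA"}
signals = {name: haar_downsample(cumulative_phase(s), levels=2)
           for name, s in genomes.items()}
distances = {(p, q): dtw_distance(signals[p], signals[q])
             for p in signals for q in signals if p < q}
```

Each downsampling level halves the signal length, so the quadratic DTW cost drops by roughly a factor of four per level. The resulting pairwise distances could then be passed to any standard hierarchical clustering routine for the cluster analysis step.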

Czech abstract

Classification of prokaryotes is based mainly on molecular characters, thanks to next-generation sequencing techniques, which are a fast and effective tool for capturing prokaryotic characteristics. However, the genomes of two different bacterial strains of the same species can differ in specific parts of the genome because of the number of repetitive segments. Finding an ideal segment for classifying the whole genome is therefore very problematic. Conventional character-based methods rely on multiple alignment, which makes them extremely computationally demanding, so they can compare only short stretches of the genome. In this article, we present a new algorithm based on converting the whole-genome sequence into a cumulative phase genomic signal. The dyadic wavelet transform is used to compress these signals by eliminating redundant frequency bands. Signal classification is then performed as cluster analysis based on the Euclidean metric, where multiple alignment is replaced by dynamic time warping.

Keywords

prokaryotes, genomic signal, cumulative phase, compression, classification, DWT, DTW

RIV year

2014

Published

1 June 2014

Publisher

Springer International Publishing

Place

Germany

Pages from

373

Pages to

383

Number of pages

11

BibTeX


@article{BUT107893,
  author="Karel {Sedlář} and Helena {Škutková} and Martin {Vítek} and Ivo {Provazník}",
  title="Prokaryotic DNA Signal Downsampling for Fast Whole Genome Comparison",
  annote="Classification of prokaryotes is mainly based on molecular data, since next-generation sequencing platforms provide a fast and effective way to capture prokaryotic characteristics. However, two different bacterial strains of the same genus can differ in specific parts of their genomes due to copious amounts of repetitive and transposable parts. Thus, finding an ideal segment of the genome for comparison is difficult. Conventional character-based methods rely on multiple sequence alignment, rendering them extremely computationally demanding, so only small parts of genomes can be compared in a reasonable time. In this paper, we present a novel algorithm based on the conversion of whole genome sequences to cumulative phase signals. Dyadic wavelet transform (DWT) is used for lossy compression of the phase signals by eliminating redundant frequency bands. Signal classification is then performed as cluster analysis using the Euclidean metric, where sequence alignment is replaced by dynamic time warping (DTW).",
  address="Springer International Publishing",
  chapter="107893",
  doi="10.1007/978-3-319-06593-9_33",
  institution="Springer International Publishing",
  number="6",
  volume="283",
  year="2014",
  month="june",
  pages="373--383",
  publisher="Springer International Publishing",
  type="journal article"
}