DiProGB: The Dinucleotide Properties Genome Browser
The dinucleotide properties genome browser is part of my PhD project.
Over the last 10 years, a large number of complete genomes have been sequenced. With these data at hand,
the basic aim is now to convert this information into biological knowledge. This requires the identification of biologically meaningful motifs in genomic data.
Computational motif discovery has been used with some success in simple organisms such as yeast. For higher organisms with more complex genomes, more sensitive methods are required.
There is also a growing awareness that not single motifs but combinations of motifs, usually called modules, may be relevant to biological function.
In this project we developed a new type of genome browser that offers user-friendly genome analysis tools for the statistical analysis of single and multiple sequences
as well as for the visual exploration of single sequences. A distinctive feature is that the sequence can be represented not only in the standard way, by the bases A, T, G and C,
but also in reduced form, by purine/pyrimidine or AT/GC characteristics, and finally in terms of a large number of dinucleotide parameters that
can encode, for example, geometrical information on DNA structure. Each of these coding schemes can be converted into a signal representation that allows for very effective
visual motif discovery. Analyses can be performed for the + strand, the – strand, and the double strand. Combining these sequence- and signal-based representations offers a new approach
for the detection of new regulatory elements. The functionalities described make DiProGB a unique tool for the identification and analysis of functional motifs in genomes.
From an algorithmic point of view, standard sequence-based algorithms are combined with signal-based pattern recognition algorithms.
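The coding schemes described above can be illustrated with a minimal Python sketch. It is not taken from DiProGB itself (which is written in VC++), and all function names are hypothetical. Since no real dinucleotide parameter table is reproduced here, the sketch uses the GC fraction of each dinucleotide as a stand-in property, because it can be computed rather than looked up. Input is assumed to contain only the bases A, C, G and T.

```python
from itertools import product

# Hypothetical helper names; this is an illustration, not DiProGB code.
_PUR_PYR = {"A": "R", "G": "R", "C": "Y", "T": "Y"}   # purine / pyrimidine
_AT_GC = {"A": "W", "T": "W", "G": "S", "C": "S"}     # weak (AT) / strong (GC)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_purine_pyrimidine(seq):
    """Reduce a DNA sequence to R (purine) and Y (pyrimidine)."""
    return "".join(_PUR_PYR[base] for base in seq.upper())


def to_at_gc(seq):
    """Reduce a DNA sequence to W (A or T) and S (G or C)."""
    return "".join(_AT_GC[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the minus strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Stand-in dinucleotide property: fraction of G/C in each of the 16 dinucleotides.
# A real analysis would substitute a measured parameter table here.
GC_FRACTION = {
    a + b: ((a in "GC") + (b in "GC")) / 2
    for a, b in product("ACGT", repeat=2)
}


def dinucleotide_signal(seq, table):
    """Map each overlapping dinucleotide to its property value."""
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def smooth(signal, window):
    """Moving average, which makes longer-range trends easier to see."""
    return [
        sum(signal[i:i + window]) / window
        for i in range(len(signal) - window + 1)
    ]
```

For the sequence ATGC, the purine/pyrimidine coding is RYRY, the AT/GC coding is WWSS, and the stand-in dinucleotide signal has one value per overlapping pair (AT, TG, GC). Passing the reverse complement to the same functions gives the corresponding minus-strand representation.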
DiProGB is a standalone program written in Microsoft Visual C++ (VC++) and optimized to cope with large genomes. It was developed
under the Microsoft Windows operating system. It can, however, also be run under Linux, Mac, BSD, and Solaris with the help of a
compatibility layer such as Wine (http://winehq.org).
A more detailed description and the freely available program can be found at http://diprogb.fli-leibniz.de.
DiProGB is published in Bioinformatics 2009; doi: 10.1093/bioinformatics/btp436