OP-5 Predicting Post-Synaptic Activity in Proteins with Data Mining
Alex A. Freitas (1), Anthony Baines (1), Gisele Pappa (1)
1) University of Kent
The bioinformatics problem addressed in this paper is to predict whether or not a protein has post-synaptic activity. This problem is of great intrinsic interest because proteins with post-synaptic activity are connected with the functioning of the nervous system. Indeed, many proteins having post-synaptic activity have been functionally characterized by biochemical, immunological and proteomic studies. They represent a wide variety of proteins with functions in extracellular signal reception and propagation through intracellular apparatuses, cell adhesion molecules and scaffolding proteins that link them in a web. The challenge is to automatically discover features of proteins' primary sequences that typically occur in proteins with post-synaptic activity but rarely (or never) occur in proteins without post-synaptic activity, and vice versa. In this context, we used data mining to automatically discover classification rules that predict whether or not a protein has post-synaptic activity. The discovered rules were analyzed with respect to their predictive accuracy (generalization ability) and with respect to their interestingness to biologists (in the sense of representing novel, unexpected knowledge).
OP-6 DPDB: a database for the storage, representation and analysis of polymorphism in the Drosophila genus
Sònia Casillas (1), Natalia Petit (1), Antonio Barbadilla (1)
1) Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona).
Motivation: Polymorphism studies are one of the main research areas of this genomic era. To date, however, no comprehensive secondary databases have been designed to provide searchable collections of polymorphic sequences with their associated diversity measures.
Results: We define a data model for the storage, representation and analysis of genotypic and haplotypic data. Under this model we have created DPDB, the Drosophila Polymorphism Database, a web site that provides a daily updated repository of all well-annotated polymorphic sequences in the Drosophila genus. It allows the search for any polymorphic set according to different parameter values of nucleotide diversity, linkage disequilibrium and codon bias. For data collection, analysis and updating we use PDA, a pipeline that automates sequence retrieval, grouping, alignment and the estimation of nucleotide diversity from GenBank sequences in different functional regions. The web site also includes analysis tools for sequence comparison and the estimation of genetic diversity, a page with real-time statistics of the database contents, a help section and a collection of selected links. DPDB is freely available at http://dpdb.uab.es and can be downloaded via FTP.
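As an illustration of the kind of diversity measure mentioned above, the following minimal sketch computes nucleotide diversity as the average number of pairwise differences per site over a set of aligned sequences. It is not the PDA implementation: the function name `nucleotide_diversity` is hypothetical, and the sketch assumes equal-length, gap-free alignments and treats every mismatch as one difference.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average number of pairwise differences per site (pi).

    Assumption: sequences are already aligned, have equal length,
    and contain no gaps or ambiguity codes.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment of three sequences, four sites each.
pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```

In the toy alignment the three pairs differ at 1, 2 and 1 sites, so the estimate is 4 differences over 3 pairs and 4 sites.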
OP-7 A Query Language for Biological Networks
Ulf Leser (1)
1) Humboldt-Universität zu Berlin
Motivation: Many areas of modern biology are concerned with the management, storage, visualization, comparison, and analysis of networks, yet no appropriate query language for such complex data structures exists.
Results: We have designed and implemented the pathway query language (PQL) for querying large protein interaction or pathway databases. PQL is based on a simple graph data model with extensions reflecting properties of biological objects. Queries match subgraphs in the database based on node properties and paths between nodes. The syntax is easy to learn for anybody familiar with SQL. As an important feature, a query may require a certain structure in the database to exist, but return a different subgraph. We have tested PQL queries on networks of up to 16,000 nodes and found it to scale very well.
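The idea of matching subgraphs by node properties and paths can be sketched in Python as follows. This is not PQL syntax and not the authors' implementation: the function `find_paths`, the toy network and the `type` property are all hypothetical, and edges are assumed to be directed. The sketch also shows how a query can require a structure (a bounded path) while returning a different result (only the path endpoints).

```python
from collections import deque


def find_paths(nodes, edges, start_pred, end_pred, max_len):
    """Return every simple path of at most max_len edges that starts at a
    node satisfying start_pred and ends at a node satisfying end_pred."""
    results = []
    for start, props in nodes.items():
        if not start_pred(props):
            continue
        queue = deque([[start]])  # breadth-first search over partial paths
        while queue:
            path = queue.popleft()
            last = path[-1]
            if len(path) > 1 and end_pred(nodes[last]):
                results.append(path)
            if len(path) - 1 < max_len:
                for nxt in edges.get(last, ()):
                    if nxt not in path:  # keep paths simple (no cycles)
                        queue.append(path + [nxt])
    return results


# Hypothetical toy network: node properties and directed edges.
nodes = {
    "A": {"type": "kinase"},
    "B": {"type": "adaptor"},
    "C": {"type": "transcription_factor"},
    "D": {"type": "transcription_factor"},
}
edges = {"A": ["B"], "B": ["C"], "C": ["D"]}

# Require a path of at most two edges from a kinase to a transcription factor...
paths = find_paths(
    nodes,
    edges,
    lambda p: p["type"] == "kinase",
    lambda p: p["type"] == "transcription_factor",
    max_len=2,
)
# ...but return only the endpoints, not the whole matched structure.
endpoints = [(p[0], p[-1]) for p in paths]
```

Enumerating simple paths grows quickly with path length, so a real system would need indexing or pruning to reach networks of the size reported above.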
OP-8 SIMAP - The similarity matrix of proteins
Roland Arnold (1), Thomas Rattei (2), Patrick Tischler (1), Volker Stümpflen (1), Werner Mewes (1)
1) Institute for Bioinformatics, GSF-National Research Center for Environment and Health, Ingolstädter Landstr. 1, 85764 Neuherberg, 2) Department of Genome Oriented Bioinformatics, Technical University of Munich, Wissenschaftszentrum Weihenstephan, 85350 Freising
Motivation: Sequence similarity searches are of great importance in bioinformatics. Exhaustive searches for homologous proteins in databases are computationally expensive and may be replaced by a database of pre-calculated homologies in many cases. Retrieving similarities from an incrementally updated database instead of repeatedly recalculating them should provide homologs much faster and free computational resources for other purposes.
Results: We have implemented SIMAP - a database containing the similarity space formed by nearly all amino-acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and allows incremental updates. We implemented a powerful backbone for similarity computation, which is based on FASTA heuristics. By providing WWW interfaces as well as web services, we make our data accessible to the worldwide community. We also adapted procedures to detect putative orthologs as example applications.
Availability: The SIMAP portal page providing links to SIMAP services is publicly available: http://mips.gsf.de/services/analyses/simap/ The web services can be accessed under: http://mips.gsf.de/proj/hobitws/services/RPCSimapService?wsdl and http://mips.gsf.de/proj/hobitws/services/DocSimapService?wsdl
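The principle of a pre-calculated, incrementally updated similarity database described above can be sketched as follows. This is an illustration only, not the SIMAP implementation: the class `SimilarityStore`, the score threshold and the sequences are hypothetical, and a simple k-mer Jaccard index stands in for the FASTA-based similarity computation.

```python
def kmer_set(seq, k=3):
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a, b, k=3):
    """Jaccard index of k-mer sets. Stand-in for a real alignment score."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


class SimilarityStore:
    """Keeps pairwise hits so that lookups need no recomputation."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # hypothetical cut-off for storing a hit
        self.sequences = {}         # seq_id -> sequence
        self.hits = {}              # seq_id -> {other_id: score}

    def add(self, seq_id, seq):
        """Incremental update: compare the new sequence only against the
        sequences already stored; existing pairs are not recalculated."""
        if seq_id in self.sequences:
            raise ValueError(f"duplicate id: {seq_id}")
        self.hits[seq_id] = {}
        for other_id, other in self.sequences.items():
            score = toy_similarity(seq, other)
            if score >= self.threshold:
                self.hits[seq_id][other_id] = score
                self.hits[other_id][seq_id] = score
        self.sequences[seq_id] = seq

    def homologs(self, seq_id):
        """Stored hits for seq_id, best score first."""
        return sorted(self.hits[seq_id].items(), key=lambda kv: -kv[1])


store = SimilarityStore()
store.add("p1", "MKTAYIAKQR")
store.add("p2", "MKTAYIAKQL")
store.add("p3", "GGGGSGGGGS")
best_for_p1 = store.homologs("p1")
```

Adding a sequence costs one comparison per stored sequence, whereas a lookup only reads the stored hits, which is the trade-off that makes pre-calculation attractive when queries greatly outnumber updates.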
OP-9 Design of a Description Language for Generating Wrapper to Collect Biological Data
Myungeun Lim (1), Myungguen Chung (1), Myungnam Bae (1), Sunhee Park (1)
1) Electronics and Telecommunications Research Institute
Because biological data are scattered across many sources in many formats and change continuously, data integration has become an important issue in giving researchers dynamic access to the data. In the data integration process, a method for dynamically extracting heterogeneous data from the data sources is essential. Wrapper-based data extraction can give the integration system flexibility and extensibility.
OP-10 Adding Some SPICE to DAS
Andreas Prlic (1), Thomas Down (1), Tim Hubbard (1)
1) The Wellcome Trust Sanger Institute.
The Distributed Annotation System (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases, but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is done by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations.