Mining Structured Data in Bioinformatics
Prof. Stefan Kramer

This tutorial covers mining structured data. The topic is particularly relevant for data mining applications in bioinformatics, since most biological data is not kept in databases consisting of a single, flat table. Instead, we frequently deal with databases of structured and linked objects. In other words, the "objects" in bioinformatics databases often have a rich internal structure and are connected to one another by relations. (Consider, for instance, databases of proteins, small molecules, metabolic and regulatory networks, text databases, etc.) The tutorial will give an overview of data mining techniques for sequences, trees, graphs and relational databases. We will present techniques for both descriptive and predictive data mining in this context. In descriptive mining, we look for local patterns that characterize the data. In predictive mining, we look for models that can be used to make predictions for new, unseen cases. Along these two dimensions (type of data and descriptive/predictive), the tutorial is organized as follows: the first two parts are devoted to descriptive mining in databases of itemsets, strings and sequences, trees, graphs and relational databases. The third part deals with predictive mining based on propositionalization (i.e., feature construction using patterns), instance-based learning and kernel methods for graph and relational databases.
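As a rough illustration of propositionalization on sequence data, the sketch below first mines frequent substrings (the descriptive step) and then turns each one into a binary feature, so that a structured database becomes a flat table any standard learner can use. It is not taken from the tutorial material: the function names, the choice of fixed-length k-mers as the pattern language, and the toy DNA sequences are all assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Return all contiguous substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def frequent_kmers(sequences, k, min_support):
    """Descriptive step: k-mers that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # Use a set so each sequence contributes at most once per pattern.
        support.update(set(kmers(seq, k)))
    return sorted(p for p, n in support.items() if n >= min_support)


def propositionalize(sequences, patterns):
    """Build a flat table: one row per sequence, one binary column per pattern."""
    return [[int(p in seq) for p in patterns] for seq in sequences]


# Toy data (hypothetical, for illustration only).
sequences = ["ACGTAC", "ACGGAC", "TTGTAC"]
patterns = frequent_kmers(sequences, k=3, min_support=2)
table = propositionalize(sequences, patterns)

for seq, row in zip(sequences, table):
    print(seq, dict(zip(patterns, row)))
```

The same two-step scheme carries over to trees and graphs in principle, with subtree or subgraph patterns replacing k-mers and a containment test replacing the substring check.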



GEPAS: New challenges in microarray data analysis
Dr. Joaquin Dopazo

DNA microarray technology is an essential tool for studying biological processes at the genomic level. Nevertheless, with the advent of genome-wide methodologies, new challenges have arisen in analysing the huge amounts of data being produced. Important topics in microarray data analysis include data processing, normalization and data transformation procedures; clustering; supervised classification and gene selection; and functional annotation. The proposed tutorial will address these issues by combining theoretical lectures with practical sessions. The practical sessions will demonstrate the use of GEPAS, our web-based suite of tools for microarray data analysis.



InterPro, exploring a powerful protein diagnostic tool
Dr. Jennifer McDowall

InterPro is an integrated protein resource that provides protein annotation and classification at family and domain levels. InterPro combines the major signature databases, PROSITE, PRINTS, PFAM, PRODOM, SMART, TIGRFAM, PIR Superfamily, GENE3D, SUPERFAMILY, and PANTHER, as well as structural information from PDB, MSD, CATH, SCOP and SWISS-MODEL, into a unified database. This tutorial is designed to allow users to get the most value out of the database, and will focus on the type and organisation of annotated data, the different query methods possible, understanding the different visualisations of the data, as well as exploring the multiple external links and cross-references available.



Computational proteomics
Prof. Jacques Colinge

Proteomics has become an important approach to analyzing biological samples, and it relies extensively on mass spectrometry to identify and characterize proteins. This tutorial will introduce the audience to the central problem of searching mass spectrometry data against a database of proteins. The presentation should stimulate the interest of bioinformatics researchers from other fields and provide life scientists with a concise yet accessible introduction. The last part of the tutorial will briefly cover other important problems in mass spectrometry data analysis, such as de novo peptide sequencing, searching eukaryotic genomes, and protein quantification and characterization.
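To give a feel for what searching mass spectrometry data against a protein database involves, here is a minimal sketch of one simple variant, peptide mass fingerprinting: each database protein is digested in silico with trypsin, and proteins are ranked by how many observed peptide masses they explain. This is an assumption-laden toy, not the method presented in the tutorial: the function names, the two-entry database, the 0.01 Da tolerance and the match-count score are all hypothetical, and real search engines use fragment spectra and statistical scoring.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def score(protein, observed, tol=0.01):
    """Number of observed masses explained by some tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


# Hypothetical two-protein database and simulated observations.
database = {"protA": "AAKGGR", "protB": "SSKVVR"}
observed = [peptide_mass(p) for p in tryptic_digest(database["protA"])]

ranking = sorted(database, key=lambda name: score(database[name], observed),
                 reverse=True)
print(ranking[0])
```

Because the observed masses here are simulated from one database entry, that entry ranks first by construction; with real spectra, measurement error, missed cleavages and modifications make the matching step considerably harder.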

