Research

Our research interests lie in developing and applying novel graph-theoretic, statistical and machine learning techniques to solve problems in computational biology. These techniques are well suited to many challenges in the field because they offer a natural way to integrate different types of data and to handle large amounts of noisy information.

Our research has mainly focused on four areas:

  1. Inference and analysis of large-scale protein-protein interaction networks.
  2. Protein function prediction.
  3. Inferring relationships between genotype, phenotype and environment.
  4. Analysis and detection of biological processes from co-expression networks.

Inference and analysis of large-scale protein-protein interaction networks

Proteins carry out their molecular functions by interacting with other molecules, mainly other proteins. For this reason, mapping protein interactions is an important step toward understanding protein function and cell behaviour. Systematically mapping the set of all protein-protein interactions within an organism – the interactome – has therefore become a major challenge in post-genomic biology. Recent developments in experimental procedures (e.g. co-affinity purification followed by mass spectrometry, AP-MS) have resulted in the publication of many high-quality protein-protein interaction datasets for organisms ranging from the yeast Saccharomyces cerevisiae to Homo sapiens.

An interactome has a natural representation as an undirected graph, often called a protein-protein interaction (PPI) network, where nodes represent proteins and edges represent interactions between pairs of proteins. An estimate of the reliability of each interaction is often available and is included as an edge weight. Interactomes have a modular structure, meaning that there are sets of proteins that interact with each other more frequently than with the rest of the network. These densely connected regions are typically interpreted as protein complexes, and identifying them is crucial to deepening our understanding of cellular processes. The problem of identifying protein complexes from PPI data is therefore equivalent to detecting dense regions, containing many connections, in PPI networks (or regions with large total weight if the networks are weighted).
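As a small illustration of the representation described above, the following Python sketch stores a weighted PPI network as a dictionary of protein pairs and scores a candidate set of proteins by its weighted density. The protein names, the weights and the `weighted_density` helper are hypothetical and chosen only for illustration; this is one simple way to quantify "dense regions", not a description of any particular complex-detection method.

```python
def weighted_density(edges, proteins):
    """Sum of edge weights inside `proteins`, divided by the number of possible pairs.

    `edges` maps a frozenset of two protein names to a reliability weight.
    """
    members = set(proteins)
    n = len(members)
    if n < 2:
        return 0.0
    # An edge lies inside the candidate set when both endpoints are members.
    inside = sum(w for pair, w in edges.items() if pair <= members)
    return inside / (n * (n - 1) / 2)


# Toy undirected, weighted network (hypothetical proteins and weights).
ppi = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.8,
    frozenset(("B", "C")): 0.7,
    frozenset(("C", "D")): 0.2,
    frozenset(("D", "E")): 0.3,
}

dense_region = weighted_density(ppi, ["A", "B", "C"])   # tightly connected triangle
sparse_region = weighted_density(ppi, ["C", "D", "E"])  # loosely connected chain
print(dense_region > sparse_region)
```

Under this score, the triangle A, B, C ranks well above the chain C, D, E, which matches the intuition that a candidate complex should have many reliable interactions among its members.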

In our lab, research on large-scale PPI networks has been funded by the BBSRC (grant BB/F00964X/1) and the Royal Society (grant NF080750). We have worked on methods for:

Protein function prediction

In recent years, numerous large-scale sequencing projects have generated enormous amounts of sequence data. This has led to the identification of thousands of previously unknown genes whose function remains to be characterized. A precise definition of protein function is difficult, because the meaning of the term “function” generally depends on the context being considered. The dominant current solution to this problem is to use ontologies, which consist of terms in a controlled vocabulary organized in a hierarchical structure through a set of well-defined relationships.

Standard ontologies usually have a structure that can be modeled by a rooted and oriented tree or, more generally, by a directed acyclic graph, as in the Gene Ontology, which is becoming the standard. Even with function defined through ontologies, about a third of the proteins have unknown function, even in the best-characterized model organisms. A fundamental goal is therefore to identify the function of uncharacterized genes on a genomic scale. Because it is difficult to design functional assays for uncharacterized genes, a major challenge in bioinformatics is to devise algorithmic methods that, given a gene, propose a hypothesis for its function that can then be validated experimentally.
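To make the directed acyclic graph structure concrete, the sketch below stores a toy ontology as a mapping from each term to its parent terms and collects all ancestors of a term. The term names (`root`, `T1`, `T2`, `T3`) and the `ancestors` helper are hypothetical; they are not real Gene Ontology identifiers. The point shown is only that, unlike in a tree, a term may have more than one parent.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy ontology (hypothetical terms): T3 has two parents, so this is a DAG, not a tree.
toy_ontology = {
    "T1": ["root"],
    "T2": ["root"],
    "T3": ["T1", "T2"],
}

t3_ancestors = ancestors("T3", toy_ontology)
print(sorted(t3_ancestors))
```

In a structure like this, a protein annotated with `T3` can also be regarded as associated with each of the more general terms returned by `ancestors`.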

In our lab, research in protein function prediction has been funded by the BBSRC (grant BB/F00964X/1). We have worked on methods for:

Inferring relationships between genotype, phenotype and environment

An important problem in biology is to uncover the links between the genetic makeup of an organism (genotype) and its observable physical or biochemical characteristics (phenotype). Understanding these links would, for example, improve our ability to rapidly characterize an unknown microorganism, which is critical both in responding to infectious disease and in biodefense. To do this, we need some way of anticipating an organism’s phenotype based on the molecules encoded by its genome.

At the same time, the means by which specific sequences link distinct environmental conditions to specific biological processes are not well understood. Thus, another important challenge is to understand how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats, i.e. how network dynamics relate to environmental features. We have worked on methods for:

Analysis and detection of biological processes from co-expression networks

Gene expression experiments measure the activity of thousands of genes in response to different conditions. Generally, genes involved in a particular biological mechanism tend to exhibit similar expression patterns and form groups. An important question in this area is how to detect, from transcriptomics data, which biological processes are activated in a given condition.
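The idea that genes with similar expression patterns form groups can be sketched as follows. This example assumes Pearson correlation as the similarity measure and a fixed threshold for linking genes, which is one common way of building a co-expression network and not necessarily the approach used in our work. The gene names, expression values, threshold and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def coexpression_edges(expression, threshold=0.9):
    """Link every pair of genes whose profiles correlate at or above `threshold`."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data: expression of three hypothetical genes across four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],  # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],  # falls as g1 rises
}

edges = coexpression_edges(expression)
print(sorted(edges))
```

Here only `g1` and `g2` are linked, since `g3` is anti-correlated with both; on real data the choice of similarity measure and threshold strongly affects which groups emerge.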

Another problem is that of selecting marker genes that can represent such specific mechanisms. These markers can be used as readouts to help understand the mechanisms, monitor the interactions between them and track the physiological effects they may exert. For example, as yeast cells grow, genes involved in various hormone pathways exhibit distinctly similar expression patterns and form groups. Sensitive and specific markers that can track and report the dynamics of each group are important for investigating the mechanisms of response to each hormone, cross-talk between hormone pathways and the relationship between hormones and phenotypic effects.

In our lab, research on the analysis of transcriptomics data has been funded by the BBSRC (grant BB/F00964X/1) and by Royal Holloway, through the Agnes Grace Ellen Endowment. We have developed methods for: