Recent Progress 2015

The RNA Biology Group is interested in how cells regulate gene expression, and how these processes are disrupted in tumours. We have a particular focus on noncoding RNAs (ncRNAs). These are transcripts that are expressed, but never translated into proteins. Recent work investigating individual loci has shown that ncRNAs can be functional in their own right, raising the prospect that they provide an additional layer of cellular machinery that acts alongside protein-based mechanisms, and suggesting that they may act as significant players in the processes that regulate tumours.

The RNA Biology Group is highly interdisciplinary, comprising a mix of computer scientists, computational biologists and bench biologists. We make extensive use of deep sequencing and mass spectrometry to explore changes in gene expression in both clinical and in vitro derived datasets, using computational approaches to identify candidate regulatory interactions involving important cancer-associated genes. We then pursue individual loci identified from these analyses at the bench using conventional molecular biology. We rely heavily on high performance computing to analyse the large volumes of genomics data generated by our deep sequencing and mass spectrometry platforms, and we write novel software tools to support data analysis, many of which we contribute to R/Bioconductor.

A global non-coding RNA system modulates fission yeast protein levels in response to stress
Fission yeast is a model eukaryote in which many regulatory pathways are conserved with human cells. These include the RNAi machinery, MAPK signalling pathways, and chromatin modifiers. This conservation makes it an excellent system in which to explore the basic biology of regulatory noncoding RNAs. In previous studies we demonstrated that adjacent gene pairs that overlap at their 3′ ends can co-regulate one another through the generation of sense-antisense pairs, and that this is used to control the expression of the MAPK Spk1, a key kinase in the pheromone signalling cascade. This process was dependent on components of the RNAi machinery. These initial observations also raised the question of whether these mechanisms might be employed more generally across the genome.

Since we had observed substantial expression of ncRNAs and antisense transcripts that were under the control of the ATF/CREB family transcription factor Atf21, we speculated that they might act to regulate protein expression in a stress-dependent manner. In a collaboration with the Cell Regulation group, we therefore combined tandem mass spectrometry with RNA sequencing to examine transcript and protein expression through a time course of osmotic stress. We found substantial changes in sense/antisense pairs, and multiple loci where the 3′ transcriptional extent of the gene was altered in a stress-dependent manner. By integrating these RNA sequencing data with label-free quantitative proteomics data, we were able to identify a global system in which changes to ncRNAs and 3′ overlaps between adjacent genes were associated with modulations in the corresponding protein levels, and to identify ncRNAs with expression patterns strongly correlated to those of functionally related sets of protein-coding genes (Figure 1; Leong et al., Nature Communications 2014). These data indicate that noncoding RNAs are an important component of the stress response.

Figure 1.
Columns represent individual ncRNAs changing over a time course of osmotic stress. Rows represent functionally related sets of protein-coding genes, as defined by GOSlim category. Cells are coloured when the expression profile of a noncoding RNA was highly correlated (purple) or anti-correlated (green) with a significant number of members of the given gene set.
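
The colouring rule described in the caption can be illustrated with a minimal Python sketch. This is not the analysis used in the published work: the function names, the Pearson cutoff (r_cutoff) and the simple fraction threshold (min_fraction) are all assumptions made for illustration, and the fraction threshold stands in for whatever significance test was actually applied.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def classify_cell(ncrna_profile, gene_profiles, r_cutoff=0.9, min_fraction=0.5):
    """Label one heatmap cell (one ncRNA against one gene set).

    Hypothetical rule: the cell is 'correlated' (or 'anti-correlated') when
    at least min_fraction of the genes in the set have r >= r_cutoff
    (or r <= -r_cutoff) against the ncRNA profile. Both thresholds are
    illustrative assumptions, not values taken from the study.
    """
    rs = [pearson(ncrna_profile, profile) for profile in gene_profiles]
    n_pos = sum(r >= r_cutoff for r in rs)
    n_neg = sum(r <= -r_cutoff for r in rs)
    needed = min_fraction * len(rs)
    if n_pos >= needed and n_pos > n_neg:
        return "correlated"
    if n_neg >= needed and n_neg > n_pos:
        return "anti-correlated"
    return "none"


# Toy time course with four time points (made-up numbers).
ncrna = [1, 2, 3, 4]
gene_set = [[2, 4, 6, 8], [1, 2, 3, 5], [5, 1, 4, 2]]
label = classify_cell(ncrna, gene_set)
```

Applying classify_cell to every ncRNA and gene set pair would give the grid of coloured and uncoloured cells; a real analysis would replace the fraction threshold with a proper enrichment test and correct for multiple testing.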

The stress response is complex, temporally staged, and modulated according to the level of stress, and can include changes within the cell that prepare it for future insults that often follow the current condition. Overall this requires substantial regulatory control, and alterations to gene expression must be achieved within physiological timescales. The ubiquity of noncoding RNAs, combined with the speed at which cells can effect a change in transcript levels, suggests that ncRNAs have the potential to add substantial information processing capacity to the signalling systems within a cell. Since many of the pathways involved are conserved between fission yeast and humans, it is likely that similar processes will occur in higher eukaryotes. Work is currently under way in the group to explore these phenomena in human cells.

Computational Analysis
While less than 2% of the genome encodes proteins, as much as 90% is now known to be transcriptionally active. Our work in fission yeast has demonstrated that the boundaries of the expressed portion of a gene can change according to context. In addition, numerous noncoding loci are present that have yet to find their way into the reference genome databases. In previous work we were also able to identify hundreds of novel protein-coding genes by analysing tandem mass spectrometry data to identify peptides originating from unannotated regions of the genome. Together, these data indicate that important features may be missed if analysis is restricted to those parts of a dataset that map directly to a static representation of the consensus genome annotation. We therefore perform de novo re-annotation of the genome for each individual sample in an RNA sequencing experiment, and have developed computational strategies in R/Bioconductor to facilitate this.

A further challenge with deep sequencing data is that the ability to routinely profile every single nucleotide of the diploid genome across hundreds of samples generates large datasets that demand substantial computational power to analyse.

We are also interested in the consequences of mutations for the proteome and have been collaborating with the Signalling Networks in Cancer Group to identify regions of the exome that have been missed by existing sequencing studies. We identified hundreds of DNA ‘blind spots’ with the potential to harbour cancer-causing mutations that current approaches have yet to detect.

Computational Biology Support
Bioinformatics analysts:
Yaoyong Li
Hui Sun Leong
Chris Wirth

A separate computational biology support team provides access to pre-processing and data analysis expertise across the Institute. This includes genome aligners, annotation databases and analysis software. In addition, the team provides analytical support to numerous groups within the CRUK MI, and acts as a hub around which informaticians from other groups can embed. The team works closely with the Molecular Biology Core Facility and Scientific Computing Teams led by Wei Xing to ensure the timely and efficient analysis of deep sequencing data. The majority of the group’s work is performed in R/Bioconductor.