Scientific Computing Team

The Scientific Computing core facility (SciCom) supports translational cancer research by providing storage and compute services for scientists and core facilities at the Cancer Research UK Manchester Institute.

Our goal is to find and provide the computing tools and resources our scientists need to carry out their outstanding cancer research. To this end, SciCom operates a highly integrated data analysis platform consisting of a High-Performance Computing (HPC) system, a Linux virtualisation platform, bare-metal servers and cloud services. The platform aims to cover the entire data analysis lifecycle, from data generation, processing and analysis through to publication and archiving, and allows the secure processing and analysis of sensitive high-throughput data. In addition, we offer application and software development support to enable our scientists to use the latest bioinformatics methods and technologies in their research.

The Institute operates a multitude of cutting-edge instruments for conducting genomics, proteomics and imaging experiments. The large volumes of data these instruments produce are stored on a 2.2 PB storage system and can be analysed either on an oVirt-based virtualisation platform or on “Phoenix”, a heterogeneous 1,500-core High-Performance Computing Linux cluster consisting of standard, high-memory and GPU nodes with a 1 PB parallel file system. With its tightly integrated hardware and cloud infrastructure, SciCom operates Linux- and Windows-based high-throughput data analysis and management services.

[Figure: Submitting MPI jobs to Phoenix using RStudio Server Pro]
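RStudio Server Pro can hand interactive sessions and jobs off to an HPC scheduler, so an analysis started in RStudio can ultimately run as a batch job on the cluster. As an illustration only, the minimal script below sketches what such an MPI submission might look like; the scheduler (SLURM is assumed), the resource figures, the module name and the program name are all hypothetical and are not documented here.

```shell
#!/bin/bash
# Hypothetical SLURM batch script for an MPI job on a cluster like Phoenix.
# Partition, resource requests, module and binary names are illustrative
# assumptions -- the facility's actual configuration is not stated here.
#SBATCH --job-name=mpi-example
#SBATCH --nodes=2                 # spread the job across two nodes
#SBATCH --ntasks-per-node=16      # 16 MPI ranks per node
#SBATCH --mem=64G                 # memory per node
#SBATCH --time=02:00:00           # wall-clock limit

module load openmpi               # assumed module name

# srun launches one MPI rank per allocated task
srun ./my_mpi_analysis input.dat
```

Submitted with `sbatch`, the scheduler allocates the requested nodes and `srun` starts the MPI ranks across them; the same script can serve as the backend for a job launched from an RStudio session.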

Combining virtual machines hosted on powerful computing hardware with remote visualisation also enables interactive processing of compute-, data- and memory-intensive workloads, which is especially helpful for proteomics data. Special data protection arrangements on Phoenix allow the processing and analysis of access-controlled (e.g. dbGaP, ICGC) and clinical trial data.

Scientific Computing has a strong focus on automating data processing to increase throughput, accelerate analysis and ease the burden on scientists. This is achieved through common workflow management systems for building standardised bioinformatics analysis pipelines, LIMS integration, and the introduction of new technologies such as machine learning for the automated analysis of imaging and CyTOF data.