The advent of next-generation sequencing has transformed our ability to generate genomic data. Today, cancer researchers have access to petabytes of multi-dimensional information from thousands of patients. However, analyzing this information becomes more challenging as the volume of data continues to grow. The data generated by The Cancer Genome Atlas (TCGA) network exemplify this difficulty. Simply downloading the complete TCGA repository would require several weeks, even with a highly optimized network connection. Once downloaded, integrated analysis of these data remains out of reach for any researcher without access to the largest institutional compute clusters. The Cancer Genomics Cloud (CGC) Pilots project seeks to address these challenges directly by co-localizing the data with the computational resources needed to analyze it, without the need to wait in a queue. The CGC will enable researchers to securely leverage the power of cloud computing to gain biologically relevant and actionable insights from massive public datasets, including TCGA. Reproducible analysis of public data (both controlled and open-access) as well as private data can be performed through either an application programming interface (API) or a graphical user interface. The CGC features Rabix, a powerful implementation of the Common Workflow Language, which enables developers to easily create and share portable, reproducible tools and pipelines. The CGC will be released to the community for evaluation and feedback by the end of 2015. Pre-register at cancergenomicscloud.org.
Presented at the TCGA Conference, May 11, 2015.
This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.
Cite this work
Researchers should cite this work as follows: