Topic: Application of Genomics Big Data on the Cancer Cloud: Making use of difficult data

Presenter: Dr. John Torcivia, Director of AI Deployment, Clarifai, Inc., Department of Biochemistry, George Washington University

In many genomics applications, large data sets are created and lightly used before being shared with other researchers (ideally) or simply left unused on hard drives. The Cancer Cloud project has enabled some of these very large data sets to be shared among qualified researchers in order to facilitate a greater understanding of oncogenesis. One issue that comes up repeatedly, however, is that simply using the data requires specialized skills outside the biological realm. A blend of computer science and biology is required to access the data and run computations on it properly once it grows too large for conventional lab tools. This presentation covers an application on the ISB Cancer Cloud in which whole genome sequencing (WGS) data was used to generate variant calls for downstream research. Because of the size of the whole genome sequences, running this analysis on lab computers was cost prohibitive, so it had to be done in the cloud. The size of the data also meant that custom processes had to be put in place to manage and queue the computations, as well as to parallelize them and reconstruct the results properly. This workflow has been made available as open source for adaptation to other pipelines, and the WGS variant data is being made available to qualified researchers in the Cancer Cloud.
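
The "parallelize and reconstruct" step described above is commonly handled with a scatter-gather pattern. The sketch below is only an illustration of that general idea, not the presenter's actual workflow: the function names, the fixed-size chunking, and the `call_region` callback (a stand-in for whatever variant caller and job queue a real pipeline would use) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Region = Tuple[str, int, int]         # (chromosome, start, end), 1-based inclusive
Variant = Tuple[str, int, str, str]   # (chromosome, position, ref, alt)


def split_into_regions(chrom_lengths: Dict[str, int], chunk_size: int) -> List[Region]:
    """Scatter step: cut each chromosome into consecutive, non-overlapping chunks."""
    regions: List[Region] = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, chunk_size):
            regions.append((chrom, start, min(start + chunk_size - 1, length)))
    return regions


def run_scatter_gather(
    chrom_lengths: Dict[str, int],
    chunk_size: int,
    call_region: Callable[[Region], List[Variant]],  # hypothetical per-region caller
    max_workers: int = 4,
) -> List[Variant]:
    """Run call_region on every chunk in parallel, then merge the results."""
    regions = split_into_regions(chrom_lengths, chunk_size)

    # The executor queues the regions and hands them to idle workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_region = list(pool.map(call_region, regions))

    # Gather step: flatten and restore genome order
    # (chromosomes in the order given, then by position).
    chrom_rank = {chrom: i for i, chrom in enumerate(chrom_lengths)}
    merged = [variant for calls in per_region for variant in calls]
    merged.sort(key=lambda v: (chrom_rank[v[0]], v[1]))
    return merged
```

In a real cloud pipeline, `call_region` would presumably submit a job that runs a variant caller restricted to that region, and the merge would operate on per-region output files rather than in-memory tuples.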

Created by Durga Addepalli. Last modified Mon November 16, 2020 10:13 am by Durga Addepalli.