
October 11th, 2019 

Topic: Reproducible data analysis with Snakemake

Presenter: Johannes Köster, Ph.D., Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen





Data analyses usually entail the application of many command-line tools or scripts to transform, filter, aggregate or plot data and results. With ever-increasing amounts of data being collected in science, reproducible and scalable automatic workflow management becomes increasingly important. Snakemake is a workflow management system consisting of a clean, human-readable, text-based workflow specification language and a scalable execution environment that allows the parallelized execution of workflows on workstations, compute servers, clusters and the cloud without modification of the workflow definition. Since its publication, Snakemake has been widely adopted and has been used to build analysis workflows for numerous high-impact publications. With thousands of homepage visits per month and over 100 new citations in 2019, it has a large and stable user community. This talk will show how Snakemake can be used to easily document, execute, and reproduce data analyses.

Created by Durga Addepalli. Last modified Wed November 4, 2020 11:04 pm by Durga Addepalli.