  • Created 08 Sep 2021

The information packet and video recording are now available.

Agenda - click the Sessions to view presentations

1:00 pm – 1:10 pm : Welcome and Introduction of speakers: NCI and FNL

1:10 pm – 1:45 pm :  Session I: What is Data Science? Mark Jensen, PhD, FNLCR BIDS

1:45 pm – 2:20 pm : Session II: Data Science Methodology - Randall Johnson, PhD, FNLCR ABCS

2:20 pm – 2:45 pm : PANEL Discussion: Q & A for Sessions I and II

2:45 pm – 2:55 pm : BREAK

2:55 pm – 3:30 pm : Session III: Formulating the Question - Martin Skarzynski, PhD, NCI DCEG Fellow

3:30 pm – 4:15 pm : Session IV: Data Gathering - Simina Boca, PhD, Georgetown University Medical Center

4:15 pm – 4:50 pm : PANEL Discussion: Q & A for Sessions III and IV

4:50 pm – 5:00 pm : Wrap-Up: Feedback and Next Workshop

Topics:

What is Data Science? - Instructor: Mark Jensen, PhD, FNLCR BIDS

  • Definition
  • History

Data Science Methodology - Instructor: Randall Johnson, PhD, FNLCR ABCS

  • Methodology for cancer data science that includes knowledge of the problem, understanding the data, data preparation, etc.

Formulating the Question - Instructor: Martin Skarzynski, PhD, NCI DCEG Fellow

  • Understanding the research question
  • Analytic approach: use cases on
    a. Regression
    b. Classification
    c. Clustering
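
As a rough illustration of how the research question points to one of the three analytic approaches listed above, here is a minimal sketch of a common rule of thumb: a numeric outcome suggests regression, a categorical outcome suggests classification, and no labeled outcome suggests clustering. The function name `suggest_approach` is hypothetical and is not taken from the workshop materials.

```python
from numbers import Number
from typing import Optional, Sequence


def suggest_approach(target: Optional[Sequence]) -> str:
    """Suggest an analytic approach from the outcome values (hypothetical helper)."""
    if target is None:
        # No labeled outcome: look for natural groups in the data.
        return "clustering"
    # bool is a Number subclass in Python, so treat it as a category explicitly.
    # Caveat: integer-coded class labels (e.g. 0/1) would look numeric here.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in target):
        return "regression"
    return "classification"


print(suggest_approach([4.2, 5.1, 3.9]))         # numeric outcome
print(suggest_approach(["benign", "malignant"]))  # categorical outcome
print(suggest_approach(None))                     # no outcome column
```

This is only a starting heuristic; in practice the choice also depends on how the outcome is coded and on what the question is meant to answer.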

Data Gathering - Instructor: Simina Boca, PhD, Georgetown University Medical Center

Data requirements

  • Content, format, representation

Data collection

  • Data understanding
  • Descriptive statistics and visualization
  • Additional data collection
  • Data preparation
  • Importing data into a Jupyter notebook using Pandas
  • Cleaning data and handling missing values in Python
  • Exploring data
    a. using Pandas functions (.head, .shape, .tail, etc.)
    b. through visualization with Matplotlib
  • Determining the right algorithms to use for further Machine Learning (ML) analysis based on data characteristics
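
The import, cleaning, and exploration steps listed above might look roughly like the following in a notebook cell. This is a minimal sketch, not the workshop's own notebook: the column names and values are made up for illustration, and the data is read from an in-memory string so the cell runs without an external file (with a real file, a path would be passed to `pd.read_csv` instead).

```python
import io

import pandas as pd

# Hypothetical data standing in for a real CSV file.
csv_text = """patient_id,age,tumor_size_mm,stage
1,54,12.5,I
2,61,,II
3,,30.1,III
4,47,8.0,
"""

# Import: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Explore: size, first and last rows, and missing values per column.
print(df.shape)
print(df.head(2))
print(df.tail(2))
print(df.isna().sum())

# Clean: fill numeric gaps with a summary statistic and drop rows
# that lack the categorical stage label.
clean = df.copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["tumor_size_mm"] = clean["tumor_size_mm"].fillna(clean["tumor_size_mm"].mean())
clean = clean.dropna(subset=["stage"])

# Descriptive statistics for the numeric columns.
print(clean.describe())
```

Median and mean imputation are used here only as simple examples; whether to fill or drop missing values depends on the data and the question. With Matplotlib installed, a quick visual check such as `clean["age"].plot.hist()` could follow.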

Date:               Tuesday, April 21, 2019
Time:               1:00-5:00 p.m.
Location:         NCI Shady Grove, Seminar 406 (Terrace East Building)

Questions? Contact the NCI Data Science Learning Exchange

Created by Clint Malone Last Modified Fri December 3, 2021 12:26 am by Clint Malone