  • Discoverability Visible
  • Join Policy Open/Anyone
  • Created 08 Sep 2021

The presentation and video recording are now available.

Overview: This two-part workshop introduces machine learning concepts and tools for generating molecular descriptors and using them to classify drug function. You will receive hands-on instruction in generating and exploring small-molecule (drug-like) chemical structures, computing chemical descriptors, and building and analyzing machine learning classification models. The workshop uses open-source cheminformatics software and the scikit-learn library to compute key pharma-relevant descriptors and to build and analyze drug classification models.

Part 1: a 30-minute presentation followed by a 20-minute hands-on code/tools review. This includes:

  • Introduction to ML concepts for creating molecular structures and extracting features (chemical descriptors)
  • How to generate and analyze molecular fingerprint descriptors
  • How to use the following two tools for chemical data analysis and feature generation:
    • RDKit, an open-source cheminformatics toolkit with a Python API
    • Mordred and other open-source software for generating molecular features
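As a rough illustration of the Part 1 topics, the sketch below uses RDKit to turn a SMILES string into a Morgan (circular) fingerprint and a few common descriptors, then compares two fingerprints by Tanimoto similarity. It is not taken from the workshop materials: the helper name `featurize`, the example molecules, the fingerprint radius and size, and the choice of descriptors are all assumptions made for this example.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, rdFingerprintGenerator

# Example molecules chosen for illustration only (not from the workshop).
SMILES = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}

# Assumed settings: radius 2 and 2048 bits are common defaults, not a
# recommendation from the workshop.
_morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def featurize(smiles):
    """Hypothetical helper: return (bit-vector fingerprint, descriptor dict)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    fingerprint = _morgan.GetFingerprint(mol)
    descriptors = {
        "MolWt": Descriptors.MolWt(mol),    # molecular weight
        "LogP": Descriptors.MolLogP(mol),   # Crippen logP estimate
        "TPSA": Descriptors.TPSA(mol),      # topological polar surface area
    }
    return fingerprint, descriptors


if __name__ == "__main__":
    features = {name: featurize(smi) for name, smi in SMILES.items()}
    for name, (fp, desc) in features.items():
        print(name, "on bits:", fp.GetNumOnBits(), desc)
    similarity = DataStructs.TanimotoSimilarity(
        features["aspirin"][0], features["caffeine"][0]
    )
    print("Tanimoto similarity:", similarity)
```

Mordred can be used in a similar way when a much larger descriptor set is needed; it is left out here to keep the sketch short.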

Part 2: a 30-minute presentation followed by a 20-minute hands-on tools review. We will extend the concepts demonstrated in Part 1 to build machine learning classification models that predict small-molecule (drug-like) function (e.g., CNS agent, GI agent). Tools include:

  • Scikit-learn for creating Random Forest classification models
  • A modeling workflow that includes data collection/curation, featurization (fingerprints), classification modeling with ensemble-based methods, and analysis, based on the lessons learned from the AMPL publication
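The modeling step of such a workflow can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the workshop's code: the data are randomly generated placeholder bit vectors standing in for real fingerprints, the class labels are assigned by an arbitrary rule on the first three bits, and the helper names `make_placeholder_data` and `train_and_evaluate` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_placeholder_data(n_samples=400, n_bits=32, seed=0):
    """Generate synthetic fingerprint-like bit vectors (NOT real drug data).

    Labels follow an arbitrary rule (at least two of the first three bits
    set) so that the classifier has a signal to learn.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_samples, n_bits))
    y = np.where(X[:, :3].sum(axis=1) >= 2, "CNS agent", "GI agent")
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Fit a Random Forest on a stratified split and report test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # n_estimators=200 is an assumed setting for this sketch.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    X, y = make_placeholder_data()
    model, accuracy = train_and_evaluate(X, y)
    print("Test accuracy:", accuracy)
    # Feature importances hint at which fingerprint bits drive the prediction.
    top_bits = np.argsort(model.feature_importances_)[::-1][:5]
    print("Most important bits:", top_bits)
```

With real data, the placeholder generator would be replaced by curated molecules featurized into fingerprints, and accuracy alone would usually be supplemented by per-class metrics.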

Date: Thursday, July 16, 2020
Time: 1:00 – 3:00 p.m.
Location: WebEx
Supporting Link: GitHub

Instructor: Sarangan Ravichandran, PhD, PMP [C], Data Scientist, Frederick National Laboratory for Cancer Research and Adjunct Professor in Bioinformatics, Hood College

Questions? Contact the NCI Data Science Learning Exchange

Created by Clint Malone. Last modified Tue October 26, 2021 10:04 pm by Clint Malone.