
Presentation 2022: American Association for Cancer Research

Poster Presentation: Tools for collecting pathologist annotations and understanding interobserver variability



  • Katherine N. Elfer
  • Kim Blenman
  • Sarah N. Dudgeon
  • Victor Garcia
  • Anna Ehinger
  • Xiaoxian Li
  • Amy Ly
  • Dieter Peeters
  • Bruce Werness
  • Matthew Hanna
  • Roberto Salgado
  • Brandon D. Gallas


Background: The High Throughput Truthing (HTT) project is assessing pathologist agreement on estimates of the density of stromal tumor-infiltrating lymphocytes (sTILs) in hematoxylin and eosin (H&E)-stained breast cancer biopsy slides. The HTT project will create a validation dataset for artificial intelligence and machine learning (AI/ML) algorithms in digital pathology that is fit for training, proficiency testing, and regulatory purposes.

Methods: The pilot study crowdsourced pathologists to estimate sTIL density in 640 regions of interest (ROIs) across 64 slides using two modalities: an optical microscope (eeDAP) and two digital platforms (caMicroscope and PathPresenter). eeDAP is a hardware-software interface that presents the observer with predefined fields of view on the H&E glass slides that correspond to the ROIs on a whole slide image. The PathPresenter and caMicroscope web applications replicate the eeDAP workflow on the whole slide image without microscope hardware. In this workflow, pathologists first judged whether each ROI was eligible for sTIL assessment, then estimated the densities of tumor-associated stroma and sTILs in the ROI. Inter-pathologist agreement within ROIs was characterized with the root mean squared difference (RMSD). Using 72 of the highest-variability ROIs from the pilot study, seven practicing pathologists participated in a subsequent focus group to improve the clinical training and data-collection workflows.
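
The per-ROI agreement metric and the selection of high-variability ROIs can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the HTT project's analysis code: it assumes the RMSD for one ROI is the square root of the mean squared difference over all unordered pairs of pathologists' sTIL density estimates (in percent), RMSD = sqrt((1/|P|) * sum over pairs (i, j) in P of (x_i - x_j)^2), and that the "highest-variability" ROIs are simply those with the largest RMSD. The function names `pairwise_rmsd` and `most_variable_rois` and the ROI ids are hypothetical; the analysis methods in the HTT GitHub repository may define agreement differently (for example, per pathologist pair across ROIs, or after excluding ROIs judged ineligible).

```python
from itertools import combinations
from math import sqrt


def pairwise_rmsd(estimates):
    """RMSD over all unordered pairs of readers' estimates for one ROI.

    Assumption: this pairwise definition is illustrative; the HTT analysis
    may compute inter-pathologist agreement differently.
    """
    pairs = list(combinations(estimates, 2))
    if not pairs:
        raise ValueError("need at least two estimates to compare")
    # Mean of squared differences across every pair of readers.
    mean_sq_diff = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return sqrt(mean_sq_diff)


def most_variable_rois(roi_estimates, k):
    """Return the ids of the k ROIs with the largest pairwise RMSD.

    roi_estimates maps a (hypothetical) ROI id to a list of sTIL density
    estimates in percent. ROIs with fewer than two estimates are skipped;
    ties are broken by ROI id so the ranking is deterministic.
    """
    scores = {
        roi: pairwise_rmsd(values)
        for roi, values in roi_estimates.items()
        if len(values) >= 2
    }
    return sorted(scores, key=lambda roi: (-scores[roi], roi))[:k]


if __name__ == "__main__":
    # Made-up estimates for illustration only, not HTT data.
    example = {
        "roi_A": [10, 20, 30],
        "roi_B": [5, 5, 5],
        "roi_C": [0, 50],
    }
    for roi, values in example.items():
        print(roi, round(pairwise_rmsd(values), 2))
    print(most_variable_rois(example, k=2))
```

With these made-up values, `roi_C` (RMSD 50) ranks above `roi_A` (sqrt(200), about 14.1 percentage points), and `roi_B`, where all readers agree, has an RMSD of 0. In a setting like the pilot study, `k` would play the role of the 72 ROIs carried into the focus group.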

Results: The pilot study collected 7,373 sTIL density estimates from 35 pathologists between February 2020 and May 2021. The focus group provided an additional 411 evaluations on the 72 ROIs, along with in-depth discussions that identified pitfalls, gaps in training, and workflow improvements. Installing eeDAP for physical data collection guided improvements in documentation and operational capabilities. The updated training materials refine the definition of tumor-associated stroma, provide reference images to differentiate sTILs from other cell types, and give feedback during training. Both the digital and microscope platforms benefited from enforcing registration and training, standardizing workflows, and accelerating eeDAP slide-image registration.

Conclusions: The slides, images, and annotations provided by volunteer collaborators and participants in our pilot study led to improvements in data-collection tools and crowdsourcing workflows to ensure consistency and minimize annotation variability. Our pilot dataset and analysis methods are available in a public HTT GitHub repository, allowing open access to our methodology and inviting feedback from the digital pathology and statistics communities. These data-collection and analysis methods are applicable to validating AI/ML algorithms for other quantitative biomarkers. The lessons learned from this work will be applied to the HTT pivotal study and will inform future methods for collecting high-quality pathologist annotations.
