A local application frontend and a backend server based on U-Net and Detectron2 as a solution to the auto annotation of pathology images (Columbia Data Science Institute Fall 2020 Capstone Project)

alexliyihao/AAPI_code

Auto Annotation of Pathology Images

Columbia Data Science Institute Capstone Project, Fall 2020

Mentor: Dr. Adler Perotte

Instructor: Dr. Adam S. Kelleher

Team members:

Yihao Li, Chao Huang, Yufeng Ma, Xiaoyun Zhu, Shuo Yang

This project aims to create a machine learning-driven user interface for annotating very large pathology images. Each image may measure tens of thousands of pixels on each side. As a result, annotating an entire slide for object recognition or semantic/instance segmentation can be time-consuming when entities are only a few pixels in diameter. The project aims to build a framework that makes the most of expert annotator (clinician) time by interleaving annotation (label generation) with inference, giving the annotator an intuitive sense of model fit and of the minimal amount of labeling required for acceptable model performance.

Project Final Report

The final report for this project is available here: Final Report

Video Demonstration

A video presentation with slides is available on YouTube at https://youtu.be/XTHRxxOoG-k.

Installation

  1. Required packages are listed in the requirements file. We recommend installing them with pip inside a virtual environment.
  2. Although detectron2 is used in this repository, it is NOT listed in the requirements file, because its dependencies are tied to specific versions of PyTorch and CUDA. It is better to build it from source by following the official guide.
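Because detectron2 has to match the installed PyTorch build, it can help to check what is already present in the environment before building it. The following is a minimal sketch of such a check, not part of this repository: the helper name `installed_versions` is hypothetical, and the distribution names `torch` and `torchvision` are assumed to be the PyTorch packages relevant to your setup.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package names; adjust to match your environment.
    report = installed_versions(["torch", "torchvision", "detectron2"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

If `torch` is reported as not installed, install the requirements first; the official detectron2 guide explains which PyTorch and CUDA combinations are supported.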

Repository Structure

  1. Collage Generator: the module for generating synthetic whole slide images (a.k.a. collages) from vignettes. The generation algorithm is fairly involved; it is fully described and explained in the sub-directory called illustration.

  2. Vignettes Data: contains vignettes used for generating synthetic whole slide images.

  3. COCO-Format Converter: the module for generating instance segmentation datasets from collages in a COCO-compatible format.

  4. Core ML Components: the module storing essential functions and tools for training and serving UNet models for segmentation.

    • preprocessing: contains functions for the preprocessing pipeline, namely cropping images into patches, saving patches as HDF5 files, and loading data as PyTorch Datasets with augmentations.
    • modeling: contains the UNet model architecture, wrapped as a PyTorch Lightning model, along with essential postprocessing functions.
    • utils: contains essential utility functions for manipulating slides and annotations.
    • api: high level APIs exposed for the model serving component.
    • config: a configuration file denoting target classes and parameters for the segmentation task.
  5. Scripts: contains useful scripts for tuning (using Optuna) and testing models. They can also serve as a reference for calling low-level functions.

  6. Demo Notebooks: contains several useful demo notebooks showing the usage of core components.
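
To illustrate the patch-cropping step mentioned under preprocessing, here is a minimal sketch of how patch positions over a large slide might be computed. It is an assumption for illustration only and does not reproduce the repository's implementation: the function name `patch_origins` and the parameters `patch_size` and `stride` are hypothetical, and the choice to shift the last patch inward so that the image edge is covered is one of several reasonable options.

```python
def patch_origins(height, width, patch_size, stride):
    """Return (top, left) corners of square patches covering a height x width image.

    Patches are laid out on a regular grid with the given stride. If the grid
    does not reach the bottom or right edge, one extra patch is added flush
    with that edge, so every pixel falls inside at least one patch whenever
    stride <= patch_size.
    """
    def axis_starts(length):
        # An image smaller than one patch gets a single patch at the origin;
        # padding it to full size is left to the caller.
        if length <= patch_size:
            return [0]
        starts = list(range(0, length - patch_size + 1, stride))
        last = length - patch_size
        if starts[-1] != last:
            starts.append(last)
        return starts

    return [(top, left)
            for top in axis_starts(height)
            for left in axis_starts(width)]


if __name__ == "__main__":
    # Hypothetical slide size and patch settings.
    origins = patch_origins(height=10_000, width=12_000, patch_size=512, stride=512)
    print(len(origins), "patches; first:", origins[0], "last:", origins[-1])
```

With an image held as an array, each patch would then be the slice `image[top:top + patch_size, left:left + patch_size]`. Using a stride smaller than the patch size yields overlapping patches.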
