Welcome to the tangle-tracer repository! This repo is associated with the study Ghandian et al. 2024. The repo offers a few key utilities:
- Reproducibility of the above study
- A pipeline to transform point annotations done on AT8-stained whole slide images (WSIs) into segmentation ground truth masks
- UNet model weights which can be fine-tuned to segment neurofibrillary tangles (NFTs) in novel whole slide pathology images
- YOLOv8 model weights which can be fine-tuned to detect NFTs in WSIs
Refer to the docstring at the bottom of each script to see an example of how to run it from the CLI.
If you encounter any issues or have any questions, feel free to open a GitHub issue and we will do our best to address them.
- Run `conda create --name tangle_tracer_env python=3.9.7 pytorch=1.12.0 torchvision=0.13.0 cudatoolkit=11.3` to create the environment with a viable python/pytorch version.
- Run `pip install -r REQUIREMENTS.txt`.
- Run `python setup.py build && python setup.py install` to allow the internal import structure to work as intended.
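After installation, a quick sanity check (a minimal snippet, assuming a CUDA-capable machine) confirms the pinned versions resolved correctly:

```python
# Verify the pinned torch/torchvision versions and CUDA visibility.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 1.12.0 and 0.13.0
print(torch.cuda.is_available())                   # True if cudatoolkit 11.3 matches your driver
```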
- Download WSIs (.czi) and annotation metadata (.cz) to your local file system (ideally an NVMe drive for the fastest read/write) and convert them to Zarr arrays at full resolution (0.11 microns/px). See examples of these files in `tangle-tracer/data/wsis/czi/` and `tangle-tracer/data/annotations/czi/`.
  - Convert WSIs to Zarr files with `annot_conversion/czi_to_zarr.py` (sketched below).
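A minimal sketch of that conversion, assuming the `czifile` and `zarr` packages and an RGB image; `annot_conversion/czi_to_zarr.py` is the canonical implementation and handles axes and chunking more carefully:

```python
# Illustrative CZI -> Zarr conversion; assumes the image is RGB once
# singleton axes (scene, Z, time) are squeezed out.
import czifile
import zarr

def czi_to_zarr(czi_path, zarr_path, chunk=4096):
    img = czifile.imread(czi_path).squeeze()  # numpy array at full resolution
    zarr.save_array(zarr_path, img, chunks=(chunk, chunk, 3))
```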
- (Optional) Create a parser for annotations and metadata. This step requires the most effort on the user's end.
  - Add your parser to `WSIAnnotation.py` and test `WSIAnnotation` creation (e.g., as in the smoke test below).
  - You can display the result of the point-to-mask procedure by calling `WSIAnnotation.plot_bboxes()` on a single `WSIAnnotation()`.
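A hypothetical smoke test for this step; the import path and constructor arguments below are placeholders, so match them to however your parser is wired into `WSIAnnotation.py`:

```python
# Hypothetical smoke test for a newly added parser. The constructor
# arguments are placeholders, not the repo's actual signature.
from WSIAnnotation import WSIAnnotation

annot = WSIAnnotation('my_wsi.czi')  # placeholder args; adapt to your parser
annot.plot_bboxes()                  # visualize the point-to-mask result
```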
- Run `parallel_annots.py` to create all `WSIAnnotation` objects using parallelized dask workers. Alternatively, you can use the `--input_type=rois` flag to generate `ROIAnnotation` objects instead.
  - This may be more useful for applying the model to your own novel dataset. Creating either object also saves each ROI according to the associated WSI metadata in a separate directory of your choosing (i.e., the `save_path` arg in both constructors).
  - Both objects are also saved via pickle for easy access to all relevant information regarding a specific WSI/ROI (e.g., references to ROI zarr images, coordinates and bboxes of annotated NFTs, masks generated from the DAB channel of each NFT, etc.) at `tangle-tracer/data/wsiAnnots/` or `tangle-tracer/data/roiAnnots/`.
  - A `WSIAnnotation` or `ROIAnnotation` can be loaded with `WSIAnnotation.load_annot(wsi_name, load_path='tangle-tracer/data/wsiAnnots/', scale=scale)`, as in the example below.
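For example (a sketch; the import path and `scale` value are illustrative):

```python
# Load a pickled annotation object via the documented helper, then inspect it.
from WSIAnnotation import WSIAnnotation  # import path is an assumption

annot = WSIAnnotation.load_annot('my_wsi',
                                 load_path='tangle-tracer/data/wsiAnnots/',
                                 scale=1.0)  # illustrative scale
annot.plot_bboxes()
```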
- Run `mask_stitching.py` to process all `WSIAnnotation` objects and their respective ROIs. The goal of this script is to stitch together a ground truth ROI mask for each ROI referenced within each `WSIAnnotation`.
  - The ground truth masks are saved in a separate directory (default = `data_dir/annotations/`).
  - These are randomly sampled from via a weighted random sampler during segmentation model training (illustrated below). Positive and negative labels (indicating the presence of an NFT within that tile) are attached to non-overlapping tile coordinates during creation of `ROIDataModule`.
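To illustrate the weighted sampling idea with PyTorch's built-in sampler (a toy sketch, not the repo's `ROIDataModule` code):

```python
# Toy illustration: positive tiles (label 1) are rarer, so they receive
# proportionally larger sampling weights via inverse class frequency.
import torch
from torch.utils.data import WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 1])                # toy tile labels
class_weights = 1.0 / torch.bincount(labels).float()  # inverse-frequency weights
sampler = WeightedRandomSampler(class_weights[labels],
                                num_samples=len(labels), replacement=True)
```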
- (Optional) Run `train.py` with desired parameters to train a new model. The `data_dir` is the same `data_dir` defined for `mask_stitching.py`.
  - Logs are dumped to a location of your choosing. The default location is `data_dir.parent/output/logs/` (resolved as in the example below). Keep in mind that this is where checkpoints will be saved, so if this is altered, pass the correct checkpoint directory to `inference.py`.
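In `pathlib` terms, the default resolves like this (the `data_dir` value is only an example):

```python
# How the default log/checkpoint directory is derived from data_dir.
from pathlib import Path

data_dir = Path('/mnt/nvme/tangle_data')       # example data_dir
log_dir = data_dir.parent / 'output' / 'logs'  # -> /mnt/nvme/output/logs
print(log_dir)
```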
- Run `inference.py` on a novel WSI to generate a WSI-level segmentation output mask.
  - Running it without defining a `table_path` will pass in the default path, `tangle_tracer/data/tables/study_split.pickle`, allowing you to define which subset of the data to run inference on (i.e., train, val, test, batch1, batch2, or all).
  - You can also pass in a custom `table_path` to run on a completely different set of WSIs, and filter this further with the `--wsis` parameter (see the sketch below).
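A hedged sketch of building such a table: the exact columns `inference.py` expects are not documented here, so verify the schema in the script before relying on the guessed names below:

```python
# Hypothetical custom table for inference.py; the column names are guesses.
import pandas as pd

table = pd.DataFrame({'wsi': ['case_01', 'case_02'],
                      'split': ['test', 'test']})
table.to_pickle('my_split.pickle')
# then, e.g.: python inference.py --table_path my_split.pickle --wsis case_01
```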
- Run `object_detection/create_tiles_yolo.py` with the same `data_dir` as created in the earlier preprocessing pipeline. This will create a set of tiles split by train/val/test in `save_dir`.
- Set the `path` parameter in `yolo.yaml` to be the `save_dir` defined in the prior step (a quick verification snippet follows this list). Other paths should not need to be changed, as they are relative to this one.
- (Optional) Check whether you can successfully create a `YOLOModule` and `YOLODataModule` by running the appropriate tests (currently in `yolo_datasets.py` and `modules_yolo.py`).
- (Optional) To get object detection metrics for the trained segmentation model, you can run `object_detection/yolo_datasets.py`. Results will be saved in CSV files within `tangle-tracer/data/tables/segobjdetect_results/`. You might want to add these results to `.gitignore` if you don't want to check them in to your repo.
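A quick way to confirm the `path` edit took (assumes PyYAML is installed):

```python
# Confirm yolo.yaml's `path` now points at your save_dir.
import yaml

with open('yolo.yaml') as f:
    cfg = yaml.safe_load(f)
print(cfg['path'])  # should match the save_dir from create_tiles_yolo.py
```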
- (Optional) Run `object_detection/train_yolo.py` to train a YOLOv8 model from scratch. You can also continue training or fine-tune a model by passing in a `ckpt_path` (e.g., `/tangle-tracer/data/weights/yolov8/best.pt`); see the sketch below.
  - If the `config_path` is not recognized, define the `config_path` to be the location of the `yolo.yaml` file which had its `path` variable modified.
  - You can pass in a `hyperparams_path` which points to a yaml containing a configuration of YOLOv8 augmentation parameters. We have included a `best_hyperparams.yaml` file with the results of our best hyperparameter tuning run on this dataset at `tangle-tracer/data/tables/best_hyperparameters.yaml`.
  - You can also run hyperparameter tuning with the `--tune` parameter.
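For orientation, fine-tuning with the ultralytics API looks roughly like the sketch below; `train_yolo.py` remains the supported entry point and layers the repo's argument handling on top:

```python
# Rough ultralytics equivalent of fine-tuning from a checkpoint; the epoch
# count is illustrative.
from ultralytics import YOLO

model = YOLO('tangle-tracer/data/weights/yolov8/best.pt')  # or a base model for scratch
model.train(data='yolo.yaml', epochs=100)
```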
- Run `object_detection/wsi_inference_yolo.py`. It largely has the same structure as `inference.py`, but runs with the ultralytics trainer, so it does not have the same multiprocessing capabilities that the pytorch-lightning trainer offers with its parallel worker write capabilities. As a result, this is not quite as fast as it could be, but it still runs ~60% faster than the UNet model, at about 20 minutes per WSI per GPU.
- (Optional) Run `object_detection/roi_inference_yolo.py`.
  - Runs inference on the saved dataset of tiles only (i.e., train, val, test).
  - Outputs a set of directories in `save_path` corresponding to each WSI, with each folder containing zarr arrays for each ROI associated with that WSI. Each zarr file is an object detection agreement map, with blue bboxes representing ground truth NFTs and yellow bboxes representing predicted NFTs. These can be inspected as shown below.
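One way to eyeball an agreement map (paths are illustrative; the actual layout follows the `save_path` structure described above):

```python
# Open and display a single agreement-map zarr array.
import zarr
import matplotlib.pyplot as plt

arr = zarr.open('save_path/my_wsi/roi_0.zarr', mode='r')  # illustrative path
plt.imshow(arr[:])  # blue = ground truth NFTs, yellow = predicted NFTs
plt.show()
```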
- Run `scoring/calc_tissue_area.py` to calculate tissue pixel areas using HistomicsTK.
- Run `scoring/count_nfts.py` to count NFTs on a set of WSI-level segmentations. This is essentially a low-resolution blob detection done via cv2 (see the sketch below).
- Run `scoring/count_bboxes.py` to parse the output .txt files from a WSI-level object detection inference procedure (i.e., outputs from running `object_detection/wsi_inference_yolo.py`).
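The gist of that blob counting, as a minimal cv2 sketch (the threshold is illustrative; `scoring/count_nfts.py` is the canonical implementation):

```python
# Count blobs in a low-resolution segmentation mask via connected components.
import cv2

mask = cv2.imread('wsi_segmentation_lowres.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
n_labels, _ = cv2.connectedComponents(binary)
print(f'NFT count: {n_labels - 1}')  # label 0 is the background
```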