scripts

This directory contains the scripts used by the pipeline. Most of them also work on their own, and some may be helpful in day-to-day use.

All python scripts implement the --help argument. For bash, R, and awk scripts, run head <script> to read the usage notes at the top of the file.

2vcf.py

A python script that uses files from the prepare and classify pipelines to create a VCF with the final, predicted variants. It also has a special internal mode for recalibrating the QUAL scores written to the VCF.

allele_conflicts.bash

A bash script for identifying sites at which the variant callers in our ensemble output conflicting alleles.

cgrep.bash

A bash script for extracting columns from TSVs via grep. Every argument besides the first is passed directly to grep.
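
As a rough illustration of the idea, the sketch below selects TSV columns whose header names match a pattern. It is written in python rather than bash, and it assumes the pattern is matched against the header row; the function name `select_columns` and the column names in the example are hypothetical and do not come from cgrep.bash.

```python
import re


def select_columns(lines, pattern):
    """Keep only the columns whose header name matches `pattern`.

    Assumption: the first line is a header row and the pattern is applied
    to each column name, similar in spirit to grepping the header.
    """
    rows = [line.rstrip("\n").split("\t") for line in lines]
    header = rows[0]
    # Indices of the columns whose names match the regular expression
    keep = [i for i, name in enumerate(header) if re.search(pattern, name)]
    return ["\t".join(row[i] for i in keep) for row in rows]


# Hypothetical table with made-up column names
table = [
    "CHROM\tPOS\tcallerA~REF\tcallerA~ALT\tcallerB~REF",
    "chr1\t100\tA\tG\tA",
]
result = select_columns(table, r"~REF$")
```

Here `result` holds the header and data row restricted to the two columns ending in `~REF`.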

classify.awk

A fast awk script for classifying each site in a VCF as DEL, INS, SNP, etc. It accepts a two column table (REF and ALT) from the VCF.
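
The following python sketch shows one common way to classify a site from its REF and ALT alleles by comparing their lengths. The exact categories and rules used by classify.awk are not described here, so the labels `MNP` and `NONE`, the handling of a missing ALT, and the function name `classify_site` are all assumptions made for illustration.

```python
def classify_site(ref, alt):
    """Classify a variant from its REF and ALT alleles by allele length.

    Assumptions: a single ALT allele per site, "." or an empty string
    means no variant was called, and equal-length multi-base changes
    are labeled "MNP".
    """
    if alt in (".", ""):
        return "NONE"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "DEL"  # bases were removed relative to the reference
    if len(ref) < len(alt):
        return "INS"  # bases were added relative to the reference
    return "MNP"


# A two column table of REF and ALT values, as described above
sites = [("A", "G"), ("AT", "A"), ("A", "ATT"), ("A", ".")]
labels = [classify_site(ref, alt) for ref, alt in sites]
```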

classify.bash

A bash script for converting all REF/ALT columns in a TSV to binary positive/negative labels using classify.awk.

fillna.bash

A bash script for replacing NA values in a large TSV.
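
A minimal python sketch of the same operation is shown below. It assumes that a missing value is a field that is exactly `NA` and that the replacement is supplied by the caller; the function name `fill_na` and the default replacement of `0` are hypothetical.

```python
def fill_na(line, replacement="0"):
    """Replace fields that are exactly "NA" in one tab-separated line.

    Only whole fields are replaced, so values that merely contain the
    letters "NA" (such as "NAN") are left untouched.
    """
    fields = line.rstrip("\n").split("\t")
    return "\t".join(replacement if field == "NA" else field for field in fields)


filled = fill_na("chr1\tNA\t3.5\tNAN")
```

Processing one line at a time keeps memory use constant, which matters for a large TSV.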

filter.bash

A bash script for filtering rows from a large TSV by specific columns.

importance_plot.py

A python script for creating plots of the importance of each variable (i.e., feature) output by each variant caller.

metrics.py

A python script for calculating evaluation metrics on a two column TSV of binary labels: truth and predictions.
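
To make the input format concrete, the sketch below computes precision, recall, and F1 from pairs of binary truth and prediction labels. Which metrics metrics.py actually reports is not stated here, so this choice of metrics and the function name `evaluate` are assumptions.

```python
def evaluate(pairs):
    """Compute precision, recall, and F1 from (truth, prediction) pairs.

    Each pair holds two binary labels, 1 for positive and 0 for negative.
    Ratios with a zero denominator are reported as 0.0.
    """
    tp = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    fp = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    fn = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Rows of a two column table: truth, prediction
rows = [(1, 1), (1, 0), (0, 1), (1, 1), (0, 0)]
scores = evaluate(rows)
```

In this example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3.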

metrics_table.py

A python script for summarizing multiple metrics files output by metrics.py in a nicely formatted table.

norm_nums.awk

A fast awk script for ensuring that unusual numerical values in a large TSV can be read by R.

prc.py

A python script for creating precision-recall plots. It takes as input the output of metrics.py and/or statistics.py.

predict_RF.R

An R script for predicting variants using a trained classifier. It takes as input a model generated by train_RF.R.

roc.py

A python script for creating ROC plots. It takes as input the output of statistics.py.

statistics.py

A python script for generating points to use in a precision-recall or ROC curve. It takes as input a two column TSV: true labels and prediction p-values.
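
One way to produce such points is to sweep a threshold over the prediction scores and count true and false positives at each step, as in the sketch below. It assumes that a higher score means a more confident positive call, which may not match the convention statistics.py uses, and the function name `curve_points` is hypothetical.

```python
def curve_points(labels, scores):
    """Return (threshold, recall, fpr, precision) for each distinct score.

    Assumption: a site is predicted positive when its score is greater
    than or equal to the threshold. Recall and FPR give a ROC point;
    recall and precision give a precision-recall point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= thresh and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= thresh and y == 0)
        recall = tp / pos if pos else 0.0
        fpr = fp / neg if neg else 0.0
        # At least one score equals the threshold, so tp + fp is never zero
        precision = tp / (tp + fp)
        points.append((thresh, recall, fpr, precision))
    return points


# Two columns: true labels and prediction scores
truth = [1, 0, 1, 0]
preds = [0.9, 0.8, 0.6, 0.2]
points = curve_points(truth, preds)
```

This version recounts at every threshold for clarity; sorting once and accumulating counts would scale better on large inputs.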

train_RF.R

An R script for creating a trained classifier. We recommend using the Snakefile-classify pipeline to run this script.

tune_plot.R

An R script for visualizing the results of hyperparameter tuning from the train_RF.R script.
