krust

krust is a k-mer counter, a bioinformatics 101 tool for counting the frequency of substrings of length k within strings of DNA data. It's written in Rust and run from the command line. It takes a FASTA file of DNA sequences and outputs every canonical k-mer with its frequency across all records in that file. Because DNA is double-stranded, each k-mer has a reverse complement on the opposite strand, so the two are counted together under a single canonical k-mer.
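To make the idea of canonical k-mers concrete, here is a minimal single-threaded sketch using only the standard library. It is not krust's actual implementation (which uses rust-bio, rayon, and dashmap). The function names are hypothetical, and two conventions are assumed rather than taken from this README: the canonical form is the lexicographically smaller of a k-mer and its reverse complement, and k-mers containing anything other than uppercase A, C, G, or T are skipped.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA k-mer.
/// Returns None if the k-mer contains a byte other than A, C, G, or T (assumed policy).
fn reverse_complement(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter()
        .rev()
        .map(|&base| match base {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Canonical form of a k-mer: assumed here to be the lexicographically
/// smaller of the k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let rc = reverse_complement(kmer)?;
    Some(if rc.as_slice() < kmer { rc } else { kmer.to_vec() })
}

/// Count canonical k-mers in one sequence by sliding a window of length k.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic
    }
    for window in seq.windows(k) {
        if let Some(kmer) = canonical(window) {
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_canonical_kmers(b"ACGTT", 2);
    // HashMap iteration order is unspecified, so sort before printing.
    let mut sorted: Vec<_> = counts.into_iter().collect();
    sorted.sort();
    for (kmer, freq) in sorted {
        println!("{}\t{}", String::from_utf8_lossy(&kmer), freq);
    }
}
```

In this sketch `AC` and `GT` land in the same bucket because each is the reverse complement of the other, which is the merging that canonical counting is meant to achieve.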

Run krust on the test data* in the krust GitHub repo, searching for k-mers of length 5, like this:

    cargo run --release 5 cerevisae.pan.fa > output.tsv

or, searching for k-mers of length 21:

    cargo run --release 21 cerevisae.pan.fa > output.tsv

krust prints to stdout, alternating a frequency line and a k-mer line:

    >{frequency}  
    {canonical k-mer}
    >{frequency}  
    {canonical k-mer}  
    ...
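For downstream processing, the two-line records above can be read back in pairs. The following is a small sketch of a reader for that layout. `parse_krust_output` is a hypothetical helper, not part of krust, and it assumes that each frequency is a non-negative integer and that surrounding whitespace and blank lines can be ignored.

```rust
/// Parse krust-style output (">{frequency}" followed by "{canonical k-mer}")
/// into (k-mer, frequency) pairs. Hypothetical helper, not part of krust.
fn parse_krust_output(text: &str) -> Result<Vec<(String, u64)>, String> {
    let mut records = Vec::new();
    // Trim trailing whitespace and skip blank lines (assumed to be harmless).
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());

    while let Some(header) = lines.next() {
        // The header line must look like ">42".
        let freq = header
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>' header, got {header:?}"))?
            .parse::<u64>()
            .map_err(|e| format!("bad frequency in {header:?}: {e}"))?;
        // The line after the header is the k-mer itself.
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer after {header:?}"))?;
        records.push((kmer.to_string(), freq));
    }
    Ok(records)
}

fn main() {
    // Made-up sample input in the documented layout, not real krust output.
    let sample = ">3\nACGTA\n>1\nAAAAC\n";
    match parse_krust_output(sample) {
        Ok(records) => {
            for (kmer, freq) in records {
                println!("{kmer}\t{freq}");
            }
        }
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```

Printing each pair on one tab-separated line, as `main` does here, is one way to turn the output into a conventional TSV if other tools expect that.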

krust uses the rust-bio, rayon, and dashmap Rust libraries.

*Unusual, yes, to provide this data in the repo, but it's helped me spread word about what I'm doing.