Commit

use types to streamline kmer parsing
suchapalaver committed Dec 4, 2022
1 parent 3573336 commit e78b3f5
Showing 13 changed files with 292 additions and 222,192 deletions.
12 changes: 5 additions & 7 deletions README.md
@@ -2,16 +2,16 @@

`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter--a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. It's written in Rust and run from the command line. It takes a fasta file of DNA sequences and will output all canonical k-mers (the double helix means each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity)) and their frequency across all records in the given fasta file.
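
To make the idea of canonical k-mer counting concrete, here is a minimal sketch in Rust using only the standard library. It is not `krust`'s actual implementation (which, per the README, builds on `rust-bio`, `rayon`, and `dashmap`). The function names `reverse_complement` and `count_canonical_kmers` are hypothetical, and the sketch assumes the common convention that the canonical form is the lexicographically smaller of a k-mer and its reverse complement. Windows containing any byte other than `A`, `C`, `G`, or `T` are skipped, which is also an assumption.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA k-mer.
/// Returns `None` if any byte is not one of A, C, G, T.
fn reverse_complement(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Count canonical k-mers in one sequence (single-threaded sketch).
/// Assumption: "canonical" means the lexicographically smaller of a
/// k-mer and its reverse complement.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    // `windows(0)` panics, so handle k == 0 explicitly.
    if k == 0 {
        return counts;
    }
    for window in seq.windows(k) {
        // Skip windows with non-ACGT bytes (e.g. N).
        if let Some(rc) = reverse_complement(window) {
            let canonical = if rc.as_slice() < window {
                rc
            } else {
                window.to_vec()
            };
            *counts.entry(canonical).or_insert(0) += 1;
        }
    }
    counts
}
```

For example, `count_canonical_kmers(b"ACGT", 2)` counts `AC` twice (once directly and once as the reverse complement of `GT`) and `CG` once, since `CG` is its own reverse complement.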

-Run `krust` on the test data* in the [`krust` Github repo](https://github.com/suchapalaver/krust), searching for kmers of length 5, like this:
+Run `krust` to count *5*-mers like this:

```bash
-cargo run --release 5 your/local/path/to/cerevisae.pan.fa > output.tsv
+cargo run --release 5 your/local/path/to/fasta_data.fa > output.tsv
```

-or, searching for kmers of length 21:
+or, searching for *21*-mers:

```bash
-cargo run --release 21 your/local/path/to/cerevisae.pan.fa > output.tsv
+cargo run --release 21 your/local/path/to/fasta_data.fa > output.tsv
```

`krust` prints to `stdout`, writing, on alternate lines:
@@ -24,6 +24,4 @@ cargo run --release 21 your/local/path/to/cerevisae.pan.fa > output.tsv
...
```

-`krust` uses the [`rust-bio`](https://docs.rs/bio/0.38.0/bio/), [`rayon`](https://docs.rs/rayon/1.5.1/rayon/), and [`dashmap`](https://docs.rs/crate/dashmap/4.0.2) Rust libraries.
-
-*Unusual, yes, to provide this data in the repo, but it's helped me spread word about what I'm doing.
+`krust` uses [`rust-bio`](https://docs.rs/bio/0.38.0/bio/), [`rayon`](https://docs.rs/rayon/1.5.1/rayon/), and [`dashmap`](https://docs.rs/crate/dashmap/4.0.2).
221,848 changes: 0 additions & 221,848 deletions cerevisiae.pan.fa

This file was deleted.

50 changes: 0 additions & 50 deletions src/bitpacked_kmer.rs

This file was deleted.

2 changes: 1 addition & 1 deletion src/configuration.rs → src/config.rs
@@ -1,6 +1,6 @@
use std::{env, error::Error, path::PathBuf};

-/// Parsing command line k-size and filepath arguments.
+/// Parsing command line k-size and filepath arguments
pub struct Config {
pub k: usize,
pub path: PathBuf,
14 changes: 0 additions & 14 deletions src/dashmaps.rs

This file was deleted.
