docs and readme sync
“suchapalaver” committed Oct 14, 2021
1 parent 087e2f3 commit 4975135
Showing 2 changed files with 31 additions and 29 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -1,8 +1,16 @@
Run krust on the test data, searching for kmers of length 5 across all sequences, like this:
`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter written in Rust and run from the command line that will output canonical k-mers and their frequency across the records in a fasta file.

$ cargo run --release 5 cerevisae.pan.fa > output.tsv
`krust` prints to `stdout`, writing, on alternate lines:
```>{frequency}```
```{canonical k-mer}```

or, searching for kmers of length 21:
`krust` uses [`rust-bio`](https://docs.rs/bio/0.38.0/bio/), [`rayon`](https://docs.rs/rayon/1.5.1/rayon/), and [`dashmap`](https://docs.rs/crate/dashmap/4.0.2).

$ cargo run --release 21 cerevisae.pan.fa > output.tsv
Run `krust` on the test data in the [`krust` GitHub repo](https://github.com/suchapalaver/krust), searching for k-mers of length 5, like this:
```$ cargo run --release 5 cerevisae.pan.fa > output.tsv```
or, searching for k-mers of length 21:
```$ cargo run --release 21 cerevisae.pan.fa > output.tsv```

Future:
A function like `fn single_sequence_canonical_kmers(filepath: String, k: usize) {}`
would return k-mer counts for individual sequences in a fasta file.
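
The per-record function listed under "Future" is not implemented in this commit. As a rough sketch of the idea only, assuming uppercase FASTA text that has already been read into memory rather than a `filepath`, per-record counting might look like the following. `parse_fasta` and `per_record_canonical_kmers` are hypothetical names, the parser handles only simple `>`-headed records, and nothing here is taken from `krust` itself.

```rust
use std::collections::HashMap;

/// Lexicographically smaller of a k-mer and its reverse complement.
/// Assumes uppercase A/C/G/T; other bytes pass through unchanged.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc: Vec<u8> = kmer
        .iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            other => other,
        })
        .collect();
    if rc.as_slice() < kmer { rc } else { kmer.to_vec() }
}

/// Hypothetical minimal FASTA reader: a `>` line starts a record,
/// every following line is appended to that record's sequence.
fn parse_fasta(text: &str) -> Vec<(String, Vec<u8>)> {
    let mut records: Vec<(String, Vec<u8>)> = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if let Some(id) = line.strip_prefix('>') {
            records.push((id.to_string(), Vec::new()));
        } else if let Some((_, seq)) = records.last_mut() {
            seq.extend_from_slice(line.as_bytes());
        }
    }
    records
}

/// Hypothetical per-record counter: one map of canonical k-mer counts
/// per FASTA record, skipping any window that contains `N`.
fn per_record_canonical_kmers(text: &str, k: usize) -> Vec<(String, HashMap<Vec<u8>, u64>)> {
    parse_fasta(text)
        .into_iter()
        .map(|(id, seq)| {
            let mut counts = HashMap::new();
            if k > 0 {
                for window in seq.windows(k) {
                    if !window.contains(&b'N') {
                        *counts.entry(canonical(window)).or_insert(0u64) += 1;
                    }
                }
            }
            (id, counts)
        })
        .collect()
}

fn main() {
    let fasta = ">r1\nACGT\n>r2\nTTNTT\n";
    for (id, counts) in per_record_canonical_kmers(fasta, 2) {
        println!("{}: {} distinct canonical k-mers", id, counts.len());
    }
}
```

Returning one map per record keeps the records independent, so a real implementation could still process them in parallel.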
44 changes: 19 additions & 25 deletions src/lib.rs
@@ -1,28 +1,21 @@
//! # Krust
//! # krust
//!
//! Krust is a k-mer counter written in Rust and run from the command line that will output canonical k-mers and their frequency across the records in a fasta file.
//! `krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter written in Rust and run from the command line that will output canonical k-mers and their frequency across the records in a fasta file.
//!
//! Krust prints to `stdout`, writing, on alternate lines, for example, to a .tsv file:
//! `krust` prints to `stdout`, writing, on alternate lines:
//! ```>{frequency}```
//! ```{canonical k-mer}```
//!
//! `>{frequency across fasta file for both canonical k-mer and its reverse complement}`
//! `krust` uses [`rust-bio`](https://docs.rs/bio/0.38.0/bio/), [`rayon`](https://docs.rs/rayon/1.5.1/rayon/), and [`dashmap`](https://docs.rs/crate/dashmap/4.0.2).
//!
//! `{canonical k-mer}`
//! Run `krust` on the test data in the [`krust` GitHub repo](https://github.com/suchapalaver/krust), searching for k-mers of length 5, like this:
//! ```$ cargo run --release 5 cerevisae.pan.fa > output.tsv```
//! or, searching for k-mers of length 21:
//! ```$ cargo run --release 21 cerevisae.pan.fa > output.tsv```
//!
//! `krust` uses the [`rust-bio`](https://docs.rs/bio/0.38.0/bio/), [`rayon`](https://docs.rs/rayon/1.5.1/rayon/), and [`dashmap`](https://docs.rs/dashmap/4.0.2/dashmap/struct.DashMap.html) crates.
//!
//! Run krust on the test data in the `krust` [Github repo](https://github.com/suchapalaver/krust), searching for kmers of length 5, like this:
//!
//! ```$ cargo run --release 5 cerevisae.pan.fa > output.tsv```
//!
//! or, searching for kmers of length 21:
//!
//! ```$ cargo run --release 21 cerevisae.pan.fa > output.tsv```
//!
//! Future:
//!
//! fn single_sequence_canonical_kmers(filepath: String, k: usize) {}
//!
//! Returns k-mer counts for individual sequences in a fasta file
//! Future:
//! A function like `fn single_sequence_canonical_kmers(filepath: String, k: usize) {}`
//! would return k-mer counts for individual sequences in a fasta file.
use bio::{alphabets::dna::revcomp, io::fasta};
use dashmap::DashMap;
@@ -48,11 +41,12 @@ impl Config {
}
}

/// Reads sequences from fasta records in parallel using `rayon` (crate).
/// Ignores substrings containing `N`.
/// Canonicalizes by lexicographically smaller of k-mer/reverse-complement
/// Returns a `DashMap` of canonical k-mers (keys) and their frequency in the data (values).
pub fn canonicalize_kmers(
/// Reads sequences from fasta records in parallel using [`rayon`](https://docs.rs/rayon/1.5.1/rayon/).
/// Using [`DashMap`](https://docs.rs/dashmap/4.0.2/dashmap/struct.DashMap.html) allows updating a single hashmap in parallel.
/// Ignores substrings containing `N`.
/// Canonicalizes by lexicographically smaller of k-mer/reverse-complement.
/// Returns a hashmap of canonical k-mers (keys) and their frequency in the data (values).
pub fn canonicalize_kmer(
filepath: String,
k: usize,
) -> Result<DashMap<Box<[u8]>, u64>, &'static str> {
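
The doc comment above describes the counting scheme: skip any k-mer containing `N`, canonicalize each remaining k-mer as the lexicographically smaller of itself and its reverse complement, and tally frequencies. A minimal single-threaded sketch of that scheme, assuming uppercase DNA input, follows. It deliberately swaps the commit's `rayon` and `DashMap` for a plain `std::collections::HashMap` so that it runs without external crates. `reverse_complement`, `canonical`, and `count_canonical_kmers` are illustrative names, not functions from `krust`.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA sequence (A<->T, C<->G).
/// Assumes uppercase input; any other byte passes through unchanged.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            other => other,
        })
        .collect()
}

/// The lexicographically smaller of a k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer { rc } else { kmer.to_vec() }
}

/// Counts canonical k-mers across all sequences,
/// ignoring every window that contains `N`.
fn count_canonical_kmers(seqs: &[&[u8]], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic, so bail out early
    }
    for seq in seqs {
        for window in seq.windows(k) {
            if window.contains(&b'N') {
                continue;
            }
            *counts.entry(canonical(window)).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let seqs: [&[u8]; 2] = [b"ACGTN", b"TTTT"];
    let counts = count_canonical_kmers(&seqs, 2);

    // Sort for stable output, then print in the layout the README describes:
    // `>{frequency}` on one line, `{canonical k-mer}` on the next.
    let mut rows: Vec<(Vec<u8>, u64)> = counts.into_iter().collect();
    rows.sort();
    for (kmer, frequency) in rows {
        println!(">{}\n{}", frequency, String::from_utf8_lossy(&kmer));
    }
}
```

In this sketch `GT` is counted under `AC`, because `AC` is the smaller of the pair, which is why a k-mer and its reverse complement share one frequency in the output.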