Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent d4520a1, commit 5950c2a. Showing 7 changed files with 222 additions and 41 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,27 +1,42 @@ | ||
# `krust` | ||
|
||
`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter--a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. It's written in Rust and run from the command line. It takes a fasta file of DNA sequences and will output all canonical k-mers (the double helix means each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity)) and their frequency across all records in the given fasta file. | ||
## Counts k-mers, written in rust | ||
|
||
`krust` supports either `rust-bio`, by default, or `needletail`, with **any** additional command line argument, for FASTA reading. | ||
```bash | ||
Usage: krust <k> <path> [reader] | ||
|
||
Arguments: | ||
<k> provides k length, e.g. 5 | ||
<path> path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa | ||
[reader] select *rust-bio* or *needletail* as FASTA reader [default: rust-bio] | ||
|
||
Options: | ||
-h, --help Print help information | ||
-V, --version Print version information | ||
``` | ||
`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter - a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. `krust` is written in Rust and run from the command line. It takes a fasta file of DNA sequences and will output all canonical k-mers (the double helix means each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity)) and their frequency across all records in the given data. `krust` is tested for accuracy against [jellyfish](https://github.com/gmarcais/Jellyfish). | ||
`krust` supports either `rust-bio` or `needletail` to read fasta records. | ||
Run `krust` with `rust-bio`'s FASTA reader to count *5*-mers like this: | ||
Run `krust` with `rust-bio`'s fasta reader to count *5*-mers like this: | ||
```bash | ||
cargo run --release 5 your/local/path/to/fasta_data.fa > output.tsv | ||
cargo run --release 5 your/local/path/to/fasta_data.fa | ||
``` | ||
or, searching for *21*-mers with `needletail` as the FASTA reader like this: | ||
or, searching for *21*-mers with `needletail` as the fasta reader like this: | ||
```bash | ||
cargo run --release 21 your/local/path/to/fasta_data.fa . > output.tsv | ||
cargo run --release 21 your/local/path/to/fasta_data.fa needletail | ||
``` | ||
`krust` prints to `stdout`, writing, on alternate lines: | ||
```bash | ||
>{frequency} | ||
{canonical k-mer} | ||
>{frequency} | ||
{canonical k-mer} | ||
>114928 | ||
ATGCC | ||
>289495 | ||
AATCA | ||
... | ||
``` |
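The README hunk above describes counting canonical k-mers, where a k-mer and its reverse complement are tallied under one key. A minimal sketch of that idea using only the standard library follows. It is not `krust`'s implementation: the function names `reverse_complement` and `count_canonical_kmers` are hypothetical, and choosing the lexicographically smaller strand as the canonical form and skipping windows with non-ACGT bytes are assumptions.

```rust
use std::collections::HashMap;

/// Returns the reverse complement of an upper-case DNA sequence, or `None`
/// if it contains a byte other than A, C, G, or T.
fn reverse_complement(seq: &[u8]) -> Option<Vec<u8>> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Counts canonical k-mers in one sequence. Assumption: the canonical form
/// is the lexicographically smaller of a k-mer and its reverse complement.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    // `windows(0)` panics, and no k-mer fits if k exceeds the sequence length.
    if k == 0 || k > seq.len() {
        return counts;
    }
    for kmer in seq.windows(k) {
        // Assumption: windows containing ambiguous bases (e.g. N) are skipped.
        if let Some(rc) = reverse_complement(kmer) {
            let canonical = if rc.as_slice() < kmer { rc } else { kmer.to_vec() };
            *counts.entry(canonical).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_canonical_kmers(b"ATGCAT", 3);
    // Print in the alternating `>{frequency}` / `{canonical k-mer}` layout
    // shown in the README; HashMap iteration order is unspecified.
    for (kmer, n) in &counts {
        println!(">{}\n{}", n, String::from_utf8_lossy(kmer));
    }
}
```

For `ATGCAT` with `k = 3`, the windows `ATG` and `CAT` are reverse complements of each other, as are `TGC` and `GCA`, so the map holds two entries with a count of 2 each.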
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,49 @@ | ||
use std::{env, process}; | ||
use std::process; | ||
|
||
use krust::{config::Config, startup}; | ||
|
||
use clap::{Arg, Command}; | ||
|
||
fn main() { | ||
let config = Config::new(env::args()).unwrap_or_else(|err| { | ||
eprintln!("Problem parsing arguments: {}", err); | ||
let matches = Command::new("krust") | ||
.version("1.0") | ||
.author("Joseph L. <[email protected]>") | ||
.about("krust: counts k-mers, written in rust") | ||
.arg( | ||
Arg::new("k") | ||
.help("provides k length, e.g. 5") | ||
.required(true), | ||
) | ||
.arg( | ||
Arg::new("path") | ||
.help("path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa") | ||
.required(true), | ||
) | ||
.arg( | ||
Arg::new("reader") | ||
.help("select *rust-bio* or *needletail* as FASTA reader") | ||
.required(false) | ||
.default_value("rust-bio"), | ||
) | ||
.get_matches(); | ||
|
||
let k = matches.get_one::<String>("k").expect("required"); | ||
let path = matches.get_one::<String>("path").expect("required"); | ||
let reader = matches.get_one::<String>("reader").unwrap(); | ||
|
||
println!(); | ||
|
||
let config = Config::new(k, path, reader).unwrap_or_else(|e| { | ||
eprintln!("Problem parsing arguments: {}", e); | ||
eprintln!("\nFor help menu:\n\n cargo run -- --help\nor:\n krust --help\n"); | ||
process::exit(1); | ||
}); | ||
|
||
println!("counting {}-mers", k); | ||
println!("in {}", path); | ||
println!("using {} reader", reader); | ||
println!(); | ||
|
||
if let Err(e) = startup::run(config.path, config.k, config.reader) { | ||
eprintln!("Application error: {}", e); | ||
drop(e); | ||
|
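The hunk above passes the three string arguments to `Config::new(k, path, reader)`, whose body is not visible in this view. A minimal sketch of what such validation could look like is given here, assuming `k` must parse as a positive integer and `reader` must be one of the two names listed in the help text. The `Reader` enum, the field types, and the error messages are all hypothetical, not taken from the repository.

```rust
use std::path::PathBuf;

/// Hypothetical enum for the two FASTA readers named in the help text.
#[derive(Debug, PartialEq)]
enum Reader {
    RustBio,
    Needletail,
}

/// Hypothetical stand-in for `krust::config::Config`; the real struct's
/// fields and types are not visible in this diff.
#[derive(Debug)]
struct Config {
    k: usize,
    path: PathBuf,
    reader: Reader,
}

impl Config {
    /// Validates the raw string arguments as they come from the command line.
    fn new(k: &str, path: &str, reader: &str) -> Result<Config, String> {
        let k: usize = k
            .parse()
            .map_err(|_| format!("k must be a positive integer, got '{}'", k))?;
        if k == 0 {
            return Err("k must be greater than 0".to_string());
        }
        let reader = match reader {
            "rust-bio" => Reader::RustBio,
            "needletail" => Reader::Needletail,
            other => return Err(format!("unknown reader '{}'", other)),
        };
        Ok(Config {
            k,
            path: PathBuf::from(path),
            reader,
        })
    }
}

fn main() {
    match Config::new("5", "fasta_data.fa", "rust-bio") {
        Ok(config) => println!("{:?}", config),
        Err(e) => eprintln!("Problem parsing arguments: {}", e),
    }
}
```

Returning `Result<Config, String>` keeps the call site compatible with the `unwrap_or_else` pattern in the diff, where the error is printed and the process exits with status 1.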