feat: add cli
suchapalaver committed Dec 29, 2022
1 parent d4520a1 commit 5950c2a
Showing 7 changed files with 222 additions and 41 deletions.
138 changes: 138 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -8,6 +8,7 @@ edition = "2021"
[dependencies]
bio = "*"
bytes = "1.3.0"
clap = "4.0.32"
custom_error = "1.9.2"
dashmap = "5.4.0"
fxhash = "0.2.1"
35 changes: 25 additions & 10 deletions README.md
@@ -1,27 +1,42 @@
# `krust`

`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter--a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. It's written in Rust and run from the command line. It takes a fasta file of DNA sequences and will output all canonical k-mers (the double helix means each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity)) and their frequency across all records in the given fasta file.
## Counts k-mers, written in rust

`krust` supports either `rust-bio`, by default, or `needletail`, with **any** additional command line argument, for FASTA reading.
```bash
Usage: krust <k> <path> [reader]

Arguments:
<k> provides k length, e.g. 5
<path> path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa
[reader] select *rust-bio* or *needletail* as FASTA reader [default: rust-bio]

Options:
-h, --help Print help information
-V, --version Print version information
```
`krust` is a [k-mer](https://en.wikipedia.org/wiki/K-mer) counter - a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. `krust` is written in Rust and run from the command line. It takes a fasta file of DNA sequences and will output all canonical k-mers (the double helix means each k-mer has a [reverse complement](https://en.wikipedia.org/wiki/Complementarity_(molecular_biology)#DNA_and_RNA_base_pair_complementarity)) and their frequency across all records in the given data. `krust` is tested for accuracy against [jellyfish](https://github.com/gmarcais/Jellyfish).
`krust` supports either `rust-bio` or `needletail` to read fasta records.
Run `krust` with `rust-bio`'s FASTA reader to count *5*-mers like this:
Run `krust` with `rust-bio`'s fasta reader to count *5*-mers like this:
```bash
cargo run --release 5 your/local/path/to/fasta_data.fa > output.tsv
cargo run --release 5 your/local/path/to/fasta_data.fa
```
or, searching for *21*-mers with `needletail` as the FASTA reader like this:
or, searching for *21*-mers with `needletail` as the fasta reader like this:
```bash
cargo run --release 21 your/local/path/to/fasta_data.fa . > output.tsv
cargo run --release 21 your/local/path/to/fasta_data.fa needletail
```
`krust` prints to `stdout`, writing, on alternate lines:
```bash
>{frequency}
{canonical k-mer}
>{frequency}
{canonical k-mer}
>114928
ATGCC
>289495
AATCA
...
```
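
The README text in this diff describes counting canonical k-mers and printing them as alternating `>{frequency}` and `{canonical k-mer}` lines. The following is a minimal, std-only sketch of that idea, not `krust`'s actual implementation. It assumes that "canonical" means the lexicographically smaller of a k-mer and its reverse complement (the convention Jellyfish uses), that windows containing bytes other than `A`, `C`, `G`, `T` are skipped, and that output is sorted by k-mer; all function names are hypothetical.

```rust
use std::collections::HashMap;

/// Complement of a single DNA base; `None` for anything other than A, C, G, T.
fn complement(base: u8) -> Option<u8> {
    match base {
        b'A' => Some(b'T'),
        b'C' => Some(b'G'),
        b'G' => Some(b'C'),
        b'T' => Some(b'A'),
        _ => None,
    }
}

/// Reverse complement of a k-mer, or `None` if it contains a non-ACGT byte.
fn reverse_complement(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter().rev().map(|&b| complement(b)).collect()
}

/// Canonical form (assumed): the lexicographically smaller of the k-mer
/// and its reverse complement.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let rc = reverse_complement(kmer)?;
    Some(if rc.as_slice() < kmer { rc } else { kmer.to_vec() })
}

/// Count canonical k-mers in one sequence, skipping invalid windows.
fn count_canonical(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u32> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic
    }
    for window in seq.windows(k) {
        if let Some(kmer) = canonical(window) {
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}

/// Render counts as alternating `>{frequency}` and `{canonical k-mer}` lines,
/// sorted by k-mer so the output is deterministic (an assumption of this sketch).
fn render(counts: &HashMap<Vec<u8>, u32>) -> String {
    let mut entries: Vec<_> = counts.iter().collect();
    entries.sort();
    let mut out = String::new();
    for (kmer, count) in entries {
        out.push_str(&format!(">{}\n{}\n", count, String::from_utf8_lossy(kmer)));
    }
    out
}
```

For example, `render(&count_canonical(b"ACGT", 2))` groups `AC` and `GT` together, because each is the reverse complement of the other, and reports `CG` once.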
28 changes: 14 additions & 14 deletions src/config.rs
@@ -1,29 +1,29 @@
use std::{env, error::Error, path::PathBuf};
use std::{error::Error, path::PathBuf, fs};

/// Parsing command line k-size and filepath arguments
pub struct Config {
pub k: usize,
pub path: PathBuf,
pub reader: bool,
}

impl Config {
pub fn new(mut args: env::Args) -> Result<Config, Box<dyn Error>> {
let k: usize = match args.nth(1) {
Some(arg) => match arg.parse() {
Ok(k) if k > 0 && k < 33 => k,
Ok(_) => return Err("k-mer length needs to be larger than zero and, for `krust` in its current working form, no more than 32".into()),
Err(_) => return Err(format!("issue with k-mer length argument: {}", arg).into()),
},
None => return Err("k-mer length input required".into()),
pub fn new(k: &str, path: &str, reader: &str) -> Result<Config, Box<dyn Error>> {
let k: usize = match k.parse::<usize>() {
Ok(k) if k > 0 && k < 33 => k,
Ok(_) => return Err("k-mer length needs to be larger than zero and, for krust currently, no more than 32".into()),
Err(_) => return Err(format!("Issue with k-mer length argument \"{}\"", k).into()),
};

let path = match args.next() {
Some(arg) => arg.into(),
None => return Err("filepath argument needed".into()),
let path = match fs::metadata(path) {
Ok(_) => path.into(),
Err(e) => return Err(format!("Issue with file path: {}", e).into()),
};

let reader = args.next().is_some();
// `true` selects needletail; `false` selects the default rust-bio reader.
let reader = match reader {
"needletail" => true,
"rust-bio" => false,
_ => return Err(format!("Invalid reader argument: \"{}\"", reader).into()),
};

Ok(Config { k, path, reader })
}
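
`Config::new` above validates two string arguments: `k` must parse to an integer with `0 < k < 33`, and `reader` must be one of two known names. As a hedged alternative sketch (not part of this commit), the reader choice could be carried as an enum rather than a `bool`, so the two backends cannot be confused at the call site. The `Reader` type and the `parse_k` helper are hypothetical names introduced only for illustration.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the `reader: bool` field in `Config`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reader {
    RustBio,
    Needletail,
}

impl FromStr for Reader {
    type Err = String;

    /// Accept exactly the two reader names listed in the CLI help text.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "rust-bio" => Ok(Reader::RustBio),
            "needletail" => Ok(Reader::Needletail),
            other => Err(format!("Invalid reader argument: \"{}\"", other)),
        }
    }
}

/// Parse and range-check `k`, mirroring the `0 < k < 33` rule in `Config::new`.
fn parse_k(k: &str) -> Result<usize, String> {
    match k.parse::<usize>() {
        Ok(n) if n > 0 && n < 33 => Ok(n),
        Ok(_) => Err("k-mer length needs to be larger than zero and no more than 32".to_string()),
        Err(_) => Err(format!("Issue with k-mer length argument \"{}\"", k)),
    }
}
```

With this shape, `"needletail".parse::<Reader>()` yields `Ok(Reader::Needletail)`, and any other spelling is rejected with the same error message the commit uses.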
7 changes: 1 addition & 6 deletions src/kmer.rs
@@ -2,16 +2,11 @@ use std::cmp::Ordering;

use bytes::Bytes;

custom_error::custom_error! { pub ValidityError
InvalidByte = "not a valid byte",
}

/// Compressing k-mers of length `0 < k < 33`, bit-packing them into unsigned integers
#[derive(Debug, Default, Eq, PartialEq, Hash)]
pub(crate) struct Kmer {
pub(crate) bytes: Bytes,
pub(crate) reverse_complement: bool,
pub(crate) packed_bits: u64,
pub(crate) reverse_complement: bool,
pub(crate) count: i32,
}
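
The `packed_bits: u64` field and the `0 < k < 33` bound fit together: at two bits per base, 32 bases fill exactly 64 bits. The sketch below illustrates that packing under stated assumptions; the encoding `A=0, C=1, G=2, T=3` and the function names `pack` and `unpack` are hypothetical and may differ from what `krust` does internally.

```rust
/// Pack a k-mer with `0 < k < 33` into a `u64`, two bits per base.
/// The encoding A=0, C=1, G=2, T=3 is an assumption of this sketch.
/// Returns `None` for an empty or over-long k-mer, or for a non-ACGT byte.
fn pack(kmer: &[u8]) -> Option<u64> {
    if kmer.is_empty() || kmer.len() > 32 {
        return None;
    }
    let mut packed = 0u64;
    for &base in kmer {
        let bits = match base {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        // Shift earlier bases left and append the new base in the low two bits.
        packed = (packed << 2) | bits;
    }
    Some(packed)
}

/// Inverse of `pack`: recover the `k` bases stored in the low `2 * k` bits.
fn unpack(packed: u64, k: usize) -> Vec<u8> {
    assert!(k <= 32, "a u64 holds at most 32 two-bit bases");
    (0..k)
        .rev()
        .map(|i| match (packed >> (2 * i)) & 0b11 {
            0 => b'A',
            1 => b'C',
            2 => b'G',
            _ => b'T',
        })
        .collect()
}
```

Because the packed value does not record `k`, `unpack` needs the length passed back in; `AAAC` and `C` both pack to `1`.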

42 changes: 39 additions & 3 deletions src/main.rs
@@ -1,13 +1,49 @@
use std::{env, process};
use std::process;

use krust::{config::Config, startup};

use clap::{Arg, Command};

fn main() {
let config = Config::new(env::args()).unwrap_or_else(|err| {
eprintln!("Problem parsing arguments: {}", err);
let matches = Command::new("krust")
.version("1.0")
.author("Joseph L. <[email protected]>")
.about("krust: counts k-mers, written in rust")
.arg(
Arg::new("k")
.help("provides k length, e.g. 5")
.required(true),
)
.arg(
Arg::new("path")
.help("path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa")
.required(true),
)
.arg(
Arg::new("reader")
.help("select *rust-bio* or *needletail* as FASTA reader")
.required(false)
.default_value("rust-bio"),
)
.get_matches();

let k = matches.get_one::<String>("k").expect("required");
let path = matches.get_one::<String>("path").expect("required");
let reader = matches.get_one::<String>("reader").unwrap();

println!();

let config = Config::new(k, path, reader).unwrap_or_else(|e| {
eprintln!("Problem parsing arguments: {}", e);
eprintln!("\nFor help menu:\n\n cargo run -- --help\nor:\n krust --help\n");
process::exit(1);
});

println!("counting {}-mers", k);
println!("in {}", path);
println!("using {} reader", reader);
println!();

if let Err(e) = startup::run(config.path, config.k, config.reader) {
eprintln!("Application error: {}", e);
drop(e);