Commit

revise README
cjain7 committed Aug 30, 2022
1 parent 52b61b6 commit b3b0259
Showing 2 changed files with 4 additions and 6 deletions.
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -13,7 +13,7 @@ without a copyright notice. Restrictions cannot be placed on its present or
future use.

--
-For minigraph:
+For code used from minigraph:

URL: https://lh3.github.io/minigraph

8 changes: 3 additions & 5 deletions README.md
@@ -20,7 +20,7 @@ cd minichain && make

## <a name="intro"></a>Introduction

-Minichain is designed to align long reads to pangenome graphs represented as DAGs. It can scale to pangenomes built from several human genome assemblies. We have incorporated a provably-good gap-sensitive co-linear chaining algorithm for filtering anchors (see [paper](#pub) for details). This algorithm enables accurate and fast long read alignments. Minichain uses seeding and base-to-base alignment code from [minigraph][minigraph].
+Minichain is designed to align long reads to pangenome graphs represented as DAGs. It can scale to pangenomes built from several human genome assemblies. We have designed and implemented a new provably-good gap-sensitive co-linear chaining algorithm for filtering anchors (see [paper](#pub) for details). This algorithm enables accurate and fast long read alignments. Minichain uses seeding and base-to-base alignment code from [minigraph][minigraph].

## <a name="uguide"></a>User's Guide

@@ -32,7 +32,7 @@ Minichain is designed to align long reads to pangenome graphs represented as DAG


### <a name="map"></a>Sequence mapping
-Minichain can be used for both sequence-to-sequence alignment as well as sequence-to-graph alignment. A graph should be provided in either [GFA][gfa1] or [rGFA][rgfa]format. Users can run quick tests on [sample data](data/) using the following commands. The alignment output is provided in [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md) and [GAF](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md#the-graph-alignment-format-gaf) format respectively.
+Minichain can be used for both sequence-to-sequence alignment as well as sequence-to-graph alignment. A graph should be provided in either [GFA][gfa1] or [rGFA][rgfa]format. Users can run quick tests on [sample data](data/) using the following commands. The alignment output is provided in [PAF](https://github.com/lh3/miniasm/blob/master/PAF.md) or [GAF](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md#the-graph-alignment-format-gaf) format respectively.
```sh
# Map sequence to sequence
./minichain -cx lr test/MT-human.fa test/MT-orangA.fa > out.paf
@@ -41,7 +41,7 @@ Minichain can be used for both sequence-to-sequence alignment as well as sequenc
```

## <a name="bench"></a>Benchmark
-We have compared Minichain (v1.0) with existing sequence to graph aligners to demonstrate scalability and accuracy gains. Our experiments used human pangenome graphs built by using subsets of [94 high quality haplotype assemblies](https://github.com/human-pangenomics/HPP_Year1_Assemblies) provided by the Human Pangenome Reference Consortium, and [CHM13 human genome assembly](https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.4) provided by the Telomere-to-Telomere consortium. Using a simulated long read dataset with 0.5x coverage, and graphs of three different sizes, Minichain shows superior read mapping precision. For the largest DAG from 95 haplotypes, Minichain used 24 minutes and 25 GB RAM with 32 threads.
+We have compared Minichain (v1.0) with existing sequence to graph aligners to demonstrate scalability and accuracy gains. Our experiments used human pangenome graphs built by using subsets of [94 high quality haplotype assemblies](https://github.com/human-pangenomics/HPP_Year1_Assemblies) provided by the Human Pangenome Reference Consortium, and [CHM13 human genome assembly](https://www.ncbi.nlm.nih.gov/assembly/GCA_009914755.4) provided by the Telomere-to-Telomere consortium. Using a simulated long read dataset with 0.5x coverage, and graphs of three different sizes, we see superior read mapping precision. For the largest DAG from 95 haplotypes, Minichain used 24 minutes and 25 GB RAM with 32 threads.

<img src="./data/plot.png" width="700">

@@ -56,8 +56,6 @@ We plan to continue adding features in future releases.

- **Ghanshyam Chandra and Chirag Jain**. "[Sequence to graph alignment using gap-sensitive co-linear chaining](https://doi.org)". *BioRxiv*, 2022.

-[gwfa]: https://arxiv.org/abs/2206.13574
-[paper]: https://doi.org/10.1186/s13059-020-02168-z
[minigraph]: https://github.com/lh3/minigraph
[zlib]: http://zlib.net/
[gcc9]: https://gcc.gnu.org/