Releases · pangenome/pggb

09 Jul 08:25

AndreaGuarracino

v0.4.0

99b77cc

pggb 0.4.0 - Pasticcino

This introduces:

temporary directory management (#197);
improvements of the alignments (#195, #198);
samtools, fastix, igraph, pycairo, and pafplot in the docker/singularity image (#204);
better management of the -r/--resume flag (#205);
a major improvement in graph normalization from replacing abPOA with SPOA, run in local mode (#207); this resolves the SNV arrays introduced during smoothing;
abPOA/SPOA selection (#209);
variant decomposition with vcfbub and vcfwave (#211).

20 Apr 13:58

AndreaGuarracino

v0.3.1

b521a84

pggb 0.3.1 - Pasticcione

This release:

updates how tools are compiled and built, to ensure greater compatibility across systems;
handles -n differences between pggb and wfmash;
outputs only one final graph, with the final suffix.

28 Mar 14:57

AndreaGuarracino

v0.3.0

0897ff9

pggb 0.3.0 - Esplorazione

This introduces major changes in pggb's interface and alignment step to support high-divergence genomes and compressed graph representations:

new wfmash version for aligning highly divergent sequences;
new default values;
simpler interface, with a few changes:
- drop -U/--normalize, -I/--block-id-min, and -M/--no-merge-segments
- merge -v and -L into -v/--skip-viz;
- replace -F with -M for requesting the MAF output;
make 1D and 2D visualizations by default (use -v/--skip-viz to disable them);
mandatory normalization with GFAffix;
do not keep intermediate files by default (use -A to keep them);
updated seqwish, smoothxg, odgi, GFAffix, vg;
new (still WIP) documentation at https://pggb.readthedocs.io/, with tutorials for PanSN-spec naming and sequence clustering;
added bcftools statistics to the MultiQC report;
added PAF file format input (-a/--input-paf) to skip the alignment step;
also emit 2D graph layouts in TSV format, for visualization with gfaestus;
timer for GFAffix execution;
use bgzip for compressing VCF files.
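
The PAF input option lets pggb reuse existing alignments instead of running wfmash. As a minimal sketch of what such a file holds, the Python below reads one PAF line into its 12 standard columns (query name/length/start/end, strand, target name/length/start/end, residue matches, alignment block length, mapping quality) plus any optional SAM-style tags. `PafRecord` and `parse_paf_line` are hypothetical helpers for illustration, not part of pggb, and the sequence names in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PafRecord:
    """One PAF alignment line: 12 mandatory columns plus optional tags."""
    query_name: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_len: int
    target_start: int
    target_end: int
    matches: int      # number of residue matches
    block_len: int    # alignment block length
    mapq: int         # mapping quality (255 = missing)
    tags: List[str] = field(default_factory=list)


def parse_paf_line(line: str) -> PafRecord:
    """Split a tab-separated PAF line; raise if a mandatory column is missing."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected >= 12 PAF columns, got {len(cols)}")
    return PafRecord(
        query_name=cols[0],
        query_len=int(cols[1]),
        query_start=int(cols[2]),
        query_end=int(cols[3]),
        strand=cols[4],
        target_name=cols[5],
        target_len=int(cols[6]),
        target_start=int(cols[7]),
        target_end=int(cols[8]),
        matches=int(cols[9]),
        block_len=int(cols[10]),
        mapq=int(cols[11]),
        tags=cols[12:],
    )


line = "A#1#chr1\t1000\t0\t1000\t+\tB#1#chr1\t1000\t0\t1000\t990\t1000\t60\tcg:Z:1000M"
rec = parse_paf_line(line)
print(rec.query_name, rec.target_name, rec.matches / rec.block_len)  # A#1#chr1 B#1#chr1 0.99
```

The ratio of residue matches to block length printed at the end is a quick per-alignment identity estimate.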

03 Nov 20:46

AndreaGuarracino

v0.2.0

531f85f

pggb 0.2.0 - Pushing

This introduces a series of bug fixes and changes in how several steps are performed:

alignment (wfmash):
- memory consumption during alignment has been drastically reduced
- fixed memory leaks
- ignore uncalled bases (Ns) during the approximate mapping
- improved alignment of sequences shorter than 50 kbp
- improved the resolution of the alignment boundaries
normalization (smoothxg):
- Use unseeded abPOA mode. This requires more memory than the seeded mode, but avoids alignment failures in repetitive and low-complexity regions
- The POA blocks are padded more, to better resolve their boundaries, but padding is limited in very deep blocks (depth > 100 times the number of input haplotypes) to avoid the computational burden of large, highly repetitive regions (e.g., human chr16)
Improved documentation
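
The depth cap for padding in smoothxg amounts to a threshold test. The Python sketch below flags a POA block as too deep when its depth exceeds 100 times the number of input haplotypes, as stated above. How much padding such blocks actually receive is not specified in these notes, so `block_padding` and its `deep_block_padding` argument are a hypothetical policy, not smoothxg's implementation.

```python
def is_too_deep(block_depth: int, n_haplotypes: int, factor: int = 100) -> bool:
    """A block counts as 'too deep' when its depth exceeds factor x haplotypes."""
    return block_depth > factor * n_haplotypes


def block_padding(default_padding: int, block_depth: int, n_haplotypes: int,
                  deep_block_padding: int) -> int:
    """Return the padding to apply to one POA block.

    Hypothetical policy: normal blocks get the default padding; too-deep blocks
    get at most `deep_block_padding`, a smaller value chosen by the caller.
    """
    if is_too_deep(block_depth, n_haplotypes):
        return min(default_padding, deep_block_padding)
    return default_padding


# With 4 input haplotypes the threshold is a depth of 400.
print(block_padding(1000, 350, 4, deep_block_padding=100))  # 1000
print(block_padding(1000, 401, 4, deep_block_padding=100))  # 100
```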

13 Aug 07:32

ekg

v0.1.3

911a28c

super fast, super good

Use seeded abPOA mode (in smoothxg). This requires much less memory than the unseeded mode, and allows us to explore much larger POA target lengths (-G).
Set a lower POA overlap by default (-O 0.001).
Automatically compute smoothxg -w as -G * -n. If the number of haplotypes differs from -n, it can be given via -H, in which case the window size is set to -G * -H.
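
The window-size rule is plain arithmetic; the Python sketch below restates it. The function and argument names are hypothetical, and the -G, -n, and -H values in the example are illustrative, not pggb defaults.

```python
from typing import Optional


def smoothxg_window(poa_target_len: int, n_haplotypes: int,
                    h_override: Optional[int] = None) -> int:
    """Window size -w = -G * -n, or -G * -H when -H is given."""
    haplotypes = n_haplotypes if h_override is None else h_override
    if poa_target_len <= 0 or haplotypes <= 0:
        raise ValueError("-G and the haplotype count must be positive")
    return poa_target_len * haplotypes


print(smoothxg_window(10000, 4))                # 40000  (-G 10000, -n 4)
print(smoothxg_window(10000, 4, h_override=2))  # 20000  (-H 2 overrides -n)
```

Scaling the window with the haplotype count keeps roughly -G bases of each haplotype inside one POA problem.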

Suggested use: set pggb -k larger than SINE elements or other short repeats in the genome. For human, we use -k 311. A long segment length for mapping is also recommended (for human we use -s 100000). These aren't in the defaults yet, but subsequent releases will provide preset settings for different genome lengths and divergences.
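
To make the suggestion concrete, the Python sketch below assembles a pggb command line with -k 311 and -s 100000. The input path and haplotype count are hypothetical placeholders, and any other options your pggb version may require (for example thread count or mapping identity) are deliberately left out; check `pggb --help` for the flags your release accepts.

```python
import shlex
from typing import List


def pggb_human_args(input_fasta: str, n_haplotypes: int,
                    segment_length: int = 100000,
                    min_match_len: int = 311) -> List[str]:
    """Build a pggb argument list using the suggested human settings."""
    return [
        "pggb",
        "-i", input_fasta,            # input FASTA (placeholder path)
        "-s", str(segment_length),    # long mapping segment length
        "-k", str(min_match_len),     # longer than SINEs / short repeats
        "-n", str(n_haplotypes),      # number of haplotypes in the input
    ]


args = pggb_human_args("human.fa.gz", n_haplotypes=10)
print(shlex.join(args))  # pggb -i human.fa.gz -s 100000 -k 311 -n 10
```

Once the placeholders are filled in, the list form can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting issues.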

02 Aug 16:45

ekg

v0.1.2

5fdb102

integrate gfaffix and pad the POA problems

Correct VCF output.
Clean up the output graph with GFAffix.
Pad the POA problems to localize them slightly. The padding is configurable; by default it is 1% of the longest problem length.
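
The default padding is 1% of the longest POA problem. A minimal Python sketch of that arithmetic follows; rounding up to a whole number of bases is our assumption, and `poa_padding` is a hypothetical helper rather than a smoothxg function.

```python
import math
from typing import Sequence


def poa_padding(problem_lengths: Sequence[int], fraction: float = 0.01) -> int:
    """Padding = `fraction` of the longest POA problem (1% by default), rounded up."""
    if not problem_lengths:
        return 0
    return math.ceil(fraction * max(problem_lengths))


print(poa_padding([1200, 50000, 800]))  # 500
print(poa_padding([12345]))             # 124
```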

16 Jul 17:25

ekg

v0.1.1

7251010

improving output quality

This checkpoints ongoing work to improve the variant description accuracy and parsimony of the output graphs.

03 May 15:40

ekg

v0.1.0

4eb6ba9

a reliable pangenome graph builder

pggb's development has taken place over much of the last year. It's seen a number of twists and changes to the components of the process and the best parameters for typical problems. Now, as its components are reaching a high level of refinement, it's time to mark a release.

There is nothing special about this particular checkpoint. In the future, we're sure to do better. But, at this point, for the problems we're giving pggb, it's giving us clean, reasonable solutions that reflect simple models of the underlying variation in the sequences we're aligning.

In particular, the quality of the process has been driven by major improvements to the initial mapping phase. These improvements were the fruit of a shakedown and reformulation of core features of the wfmash ultralong sequence aligner. Thanks to @AndreaGuarracino and @urbanslug for their work on this.

Now, wfmash's precise, global alignments of whole chromosomes support the generation of clean, comprehensive pangenome models based on the direct relation of all sequences in our input.
