Releases: pangenome/pggb

pggb 0.4.0 - Pasticcino

09 Jul 08:25
99b77cc

This introduces:

  • temporary directory management (#197);
  • improvements to the alignments (#195, #198);
  • samtools, fastix, igraph, pycairo, and pafplot in the docker/singularity image (#204);
  • better management of the -r/--resume flag (#205);
  • a major improvement in graph normalization by replacing abPOA with SPOA, run in local mode (#207); this resolves the SNV arrays introduced during smoothing;
  • abPOA/SPOA selection (#209);
  • variant decomposition with vcfbub and vcfwave (#211).

pggb 0.3.1 - Pasticcione

20 Apr 13:58
b521a84

This release:

  • updates how tools are compiled/built to ensure greater inter-system compatibility;
  • handles -n differences between pggb and wfmash;
  • outputs only one final graph with final suffix.

pggb 0.3.0 - Esplorazione

28 Mar 14:57
0897ff9

This introduces major changes in pggb's interface and alignment step to support high-divergence genomes and compressed graph representations:

  • new wfmash version for aligning highly divergent sequences;
  • new default values;
  • simpler interface, with a few changes:
    • drop -U/--normalize, -I/--block-id-min, and -M/--no-merge-segments;
    • merge -v and -L into -v/--skip-viz;
    • replace -F with -M for requesting the MAF output;
  • make 1D and 2D visualizations by default (-v/--skip-viz for disabling this);
  • mandatory normalization with GFAffix;
  • do not keep intermediate files by default (-A to keep them);
  • updated seqwish, smoothxg, odgi, gfaffix, vg;
  • new (still WIP) documentation at https://pggb.readthedocs.io/, with tutorials for PanSN-spec naming and sequence clustering;
  • added bcftools statistics in the MultiQC report;
  • added PAF file format input (-a/--input-paf) to skip the alignment step;
  • emit 2D graph layouts in TSV too for visualizing them with gfaestus;
  • timer for gfaffix execution;
  • use bgzip for compressing VCF files.

pggb 0.2.0 - Pushing

03 Nov 20:46

This introduces a series of bug fixes and changes in how several steps are performed:

  • alignment (wfmash):

    • memory consumption during the alignment has been drastically reduced
    • fixed memory leaks
    • ignore uncalled bases (Ns) during the approximate mapping
    • improved alignment of sequences shorter than 50 kbp
    • improved the resolution of the alignment boundaries
  • normalization (smoothxg):

    • Use unseeded abPOA mode. This requires more memory than the seeded mode, but avoids alignment failures in repetitive and low-complexity regions
    • The POA blocks are padded more to better resolve their boundaries. Padding is limited in very deep blocks (depth > 100 times the number of input haplotypes) to avoid a heavy computational burden in large, highly repetitive regions (e.g., human chr16)
  • Improved documentation
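
The depth threshold described for the POA blocks above can be sketched as a simple predicate. This is only an illustration of the stated rule (depth > 100 times the number of input haplotypes); the function name, the parameters, and the way depth is measured are assumptions, not smoothxg's actual code.

```python
def is_too_deep(block_depth, n_haplotypes, factor=100):
    """Return True when a POA block exceeds the depth limit.

    Hypothetical helper: 'block_depth' stands for whatever depth
    measure the smoother uses; only the factor of 100 comes from
    the release notes.
    """
    return block_depth > factor * n_haplotypes


# With 10 input haplotypes the limit is 1000.
print(is_too_deep(1000, 10))
print(is_too_deep(1001, 10))
```

Blocks for which this predicate holds would receive limited padding, while all other blocks get the larger padding.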

super fast, super good

13 Aug 07:32
@ekg
911a28c
  • Use seeded abPOA mode (in smoothxg). This requires much less memory than the unseeded mode, and allows us to explore much larger POA target lengths (-G).
  • Set a lower POA overlap by default (-O 0.001)
  • Automatically compute smoothxg -w as -G * -n. A specific number of haplotypes can be given via -H, should this differ from -n, in which case the window size is set as -G * -H.
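
The window-size rule in the last item can be written out as a short sketch. The function and argument names are hypothetical, and the numeric values are made up for illustration; only the relation (-w = -G * -n, or -G * -H when -H is given) comes from the notes above.

```python
def smoothxg_window(poa_target_length, n_value, n_haplotypes=None):
    """Compute the smoothxg window size (-w).

    poa_target_length: the POA target length (-G)
    n_value:           the value given with -n
    n_haplotypes:      optional explicit haplotype count (-H)
    """
    multiplier = n_haplotypes if n_haplotypes is not None else n_value
    return poa_target_length * multiplier


# Illustrative values only: -G 4000 with -n 9, then with -H 12.
print(smoothxg_window(4000, 9))
print(smoothxg_window(4000, 9, n_haplotypes=12))
```

With these example values, the first call multiplies by -n and the second by -H.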

Suggested use: set pggb -k larger than SINE elements or other short repeats in the genome. For human, we use -k 311. A long segment length for mapping is also recommended (in human we use -s 100000). These aren't in the defaults yet, but subsequent releases will have preset settings for different genome lengths and divergences.

integrate gfaffix and pad the POA problems

02 Aug 16:45
@ekg
5fdb102
  • Correct VCF output.
  • Clean up the output graph with GFAffix.
  • Pad the POA problems to localize them slightly. The padding is configurable and is set to 1% of the longest problem length by default.
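
As a rough illustration of the padding rule in the last item, the sketch below derives a padding length from a set of POA problem lengths. The function name, the integer rounding, and the example lengths are assumptions; only the 1% of the longest problem length comes from the notes.

```python
def poa_padding(problem_lengths, padding_percent=1):
    """Return the padding as a percentage of the longest POA problem.

    Hypothetical helper: integer (floor) arithmetic is an assumption,
    chosen here to avoid floating-point rounding.
    """
    if not problem_lengths:
        return 0
    return max(problem_lengths) * padding_percent // 100


# Illustrative problem lengths in base pairs.
print(poa_padding([1200, 5000, 800]))
```

Here the longest problem is 5000 bp, so each problem would be padded by 50 bp under this reading of the rule.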

improving output quality

16 Jul 17:25
@ekg
7251010

This checkpoints ongoing work to improve the variant description accuracy and parsimony of the output graphs.

a reliable pangenome graph builder

03 May 15:40
@ekg
4eb6ba9

pggb's development has taken place over much of the last year. It's seen a number of twists and changes to the components of the process and the best parameters for typical problems. Now, as its components are reaching a high level of refinement, it's time to mark a release.

There is nothing special about this particular checkpoint. In the future, we're sure to do better. But, at this point, for the problems we're giving pggb, it's giving us clean, reasonable solutions that reflect simple models of the underlying variation in the sequences we're aligning.

In particular, the quality of the process has been driven by major improvements to the initial mapping phase. These improvements were the fruit of a shakedown and reformulation of core features of the wfmash ultralong sequence aligner. Thanks to @AndreaGuarracino and @urbanslug for their work on this.

Now, wfmash's precise, global alignments of whole chromosomes support the generation of clean, comprehensive pangenome models based on the direct relation of all sequences in our input.