The python 3 Version of DING. This is the version where new features will still be added and refined. As of now, this version is still to be regarded as experimental as it has not been tested thoroughly. Feel free to report any bugs or ideas for new features!
An overview of the scripts dealing directly with the DING ILP:
script | purpose | input | output | dependencies |
---|---|---|---|---|
dingII.py | Generate the DING ILP or create a custom model file in order to fine tune how many genes per family are to be matched | UniMoG file containing a single genome pair, (custom model file) | gurobi lp file or custom model file | python3, networkx, dingII_util, ilp_util_adj |
dingII_parsesol.py | Calculate the distance and matching | UniMoG file with the original, unmatched genomes, gurobi solution file containing a single solution with objective value | UniMoG file with relabeled genes according to matching | python3, networkx, dingII_util, ilp_util_adj |
dingII_util.py | Provide utility functions for MRD generation and solution parsing | - | - | python3,networkx, ilp_util_adj |
ilp_util_adj.py | Provide further utility functions | - | - | python3 |
A Typical workflow using the maximal matching model would look like this:
- Generate the ILP:
./dingII.py {unimog-file} -mm --writeilp {ilp-file}
- Use a solver to obtain a gurobi solution
{gurobi-sol}
. - Get the matching and distance (and number of indels as well as a summary of runs ("indel-blocks")):
./dingII_parsesol.py {unimog-file} --solgur {gurobi-sol} --matching {unimog-matching}
(--numindels --runs {run-file}
)
where {unimog-file}
is the original unmatched genome pair in UniMoG-Format and {gurobi-sol}
the gurobi solution file with objective value (e.g. # Objective value = 12
).
Run file format
Run output file format ({run-file}
):
- Each line encompasses all runs within the same cycle of the decomposition
- Runs within a cycle are separated by TAB-characters
- If there is no A-run the line begins with a TAB
- Runs are the concatenated string of the oriented markers to be deleted
More formally:
{cycle-1}
{cycle-2}
...
with {cycle-n}
= {A-run}\tab{B-run}...
or cycle-n
= \tab {B-run}
and {X-run}
=(+/-)indel1(+/-)indel2...
The work of this repository is described in the following paper:
- Bohnenkämper L., Braga M.D.V., Doerr D., Stoye J.: Computing the Rearrangement Distance of Natural Genomes. In: Schwartz R. (eds) Proc. of RECOMB 2020. LNCS 12074, 3-18. Springer Verlag, 2020. DOI
For a more detailed description see the following arxiv preprint:
- Bohnenkämper, L., Braga, M.D.V., Doerr, D., Stoye, J.: Computing the rearrangement distance of natural genomes. arXiv:2001.02139 (2020)