indifferent analyzes two strings, computes the difference between them, and reports the results. It is indifferent to formatting and separators, focusing on the actual content of the strings.
Results are available in a variety of forms, from raw unprocessed data to formatted HTML.
Differences are calculated without getting too clever. indifferent splits the "base" and "revision" strings into words and separators, and then walks through the "base" looking for matches in the "revision". Once a match is found, it backfills the preceding unmatched "base" and "revision" words and separators, and then keeps looking for the next match.
The ordering of words matters, so these strings would match on A, tabby, and cat:
base = "A tabby cat"
revision = "A big orange tabby cat"
These would only match on A and cat:
base = "A tabby cat"
revision = "A big cat that is an orange tabby"
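The effect of word order can be reproduced with a small stand-in. The sketch below is not indifferent's own implementation: it assumes a simple regex tokenizer and uses the standard library's difflib.SequenceMatcher as a substitute for the matching step. The helper names tokenize and matched_words are made up for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into alternating word and separator tokens (assumed rule)."""
    return re.findall(r"\w+|\W+", text)


def matched_words(base, revision):
    """Return the words an order-preserving token diff finds in both strings."""
    base_tokens = tokenize(base)
    revision_tokens = tokenize(revision)
    # Stand-in matcher from the standard library, not indifferent's algorithm.
    matcher = SequenceMatcher(None, base_tokens, revision_tokens, autojunk=False)
    words = []
    for block in matcher.get_matching_blocks():
        for token in base_tokens[block.a : block.a + block.size]:
            if re.fullmatch(r"\w+", token):  # keep words, skip separators
                words.append(token)
    return words


print(matched_words("A tabby cat", "A big orange tabby cat"))
# ['A', 'tabby', 'cat']
print(matched_words("A tabby cat", "A big cat that is an orange tabby"))
# ['A', 'cat']
```

For these two pairs the stand-in agrees with the matches described above, but it may differ from indifferent on other inputs.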
Install indifferent from PyPI:
python -m pip install indifferent
Use the compare function to generate differences:
from indifferent import compare
result = compare(
    base="A tabby cat",
    revision="A big orange tabby cat",
    base_name="A name for the base text, displayed in output",  # optional
    revision_name="A name for the revision text, displayed in output",  # optional
    results="stats",  # optional, see below for alternate output formats
)
indifferent can provide results in a few different formats, depending upon what you want to do with them.
By default (or with the argument results="stats"), indifferent returns a dict with stats about the base and revision, and the results of the comparison. This is useful, for example, if you need to compare a base text against a number of reference texts and find the one that is most similar.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
)
returns:
{
"inputs": {
"base": {"length": {"content": 5, "total": 9}},
"revision": {"length": {"content": 4, "total": 7}},
},
"results": {
"added": {"length": {"content": 1, "total": 1}},
"matched": {
"base_preserved": {"content": 0.6, "total": 0.6666666666666666},
"length": {"content": 3, "total": 6},
"revision_matched": {"content": 0.75, "total": 0.8571428571428571},
},
"removed": {"length": {"content": 2, "total": 3}},
},
"score": {"content": 0.5, "total": 0.6},
}
In the default results, content refers to words and total refers to words plus separators (whitespace, punctuation, etc.).
- inputs contains stats about the length of the two strings.
- results contains stats about the comparison.
  - added means words and separators that exist in the revision but not the base.
  - matched means words and separators in common.
    - base_preserved means the fraction of words and separators from the base that match.
    - length means the number of matching words and separators found.
    - revision_matched means the fraction of words and separators from the revision that match.
  - removed means words and separators that exist in the base but not the revision.
- score contains stats about the match.
To compare the meaning of two strings, inspect ["score"]["content"]. To determine whether two strings match, inspect ["score"]["total"].
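The ratios in the example output can be recomputed from the lengths alone. The formulas below are inferred from the example numbers rather than taken from indifferent's source, so treat them as an assumption; the helper names ratio and derive are made up for illustration.

```python
# Lengths copied from the example "stats" output above.
base = {"content": 5, "total": 9}
revision = {"content": 4, "total": 7}
lengths = {
    "matched": {"content": 3, "total": 6},
    "added": {"content": 1, "total": 1},
    "removed": {"content": 2, "total": 3},
}


def ratio(part, whole):
    """Divide safely, returning 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


def derive(kind):
    """Recompute the ratios for kind = "content" or "total" (inferred formulas)."""
    matched = lengths["matched"][kind]
    everything = matched + lengths["added"][kind] + lengths["removed"][kind]
    return {
        "base_preserved": ratio(matched, base[kind]),
        "revision_matched": ratio(matched, revision[kind]),
        "score": ratio(matched, everything),
    }


print(derive("content"))
# {'base_preserved': 0.6, 'revision_matched': 0.75, 'score': 0.5}
print(derive("total"))
```

Under these assumed formulas, the score is the matched count divided by the sum of the matched, added, and removed counts, which gives 3 / 6 for content and 6 / 10 for total.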
You can also get completely unanalyzed results. This would be useful if you want to handle the analysis on your own.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="raw",
)
returns:
[{'base': 0, 'content': True, 'revision': 0, 'value': 'A'},
{'base': 1, 'content': False, 'revision': 1, 'value': ' '},
{'base': 2, 'content': True, 'revision': None, 'value': 'small'},
{'base': None, 'content': True, 'revision': 2, 'value': 'big'},
{'base': 3, 'content': False, 'revision': 3, 'value': ' '},
{'base': 4, 'content': True, 'revision': 4, 'value': 'orange'},
{'base': 5, 'content': False, 'revision': 5, 'value': ' '},
{'base': 6, 'content': True, 'revision': None, 'value': 'tabby'},
{'base': 7, 'content': False, 'revision': None, 'value': ' '},
{'base': 8, 'content': True, 'revision': 6, 'value': 'cat'}]
The result is a list of every element from both the base and the revision. This is the list that indifferent analyzes to calculate the stats.
- base is the index of value in the base text (None if the item is not in the base)
- content is True if the item is a word, False if the item is a separator
- revision is the index of value in the revision text (None if the item is not in the revision)
- value is the actual value of the item
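If you handle the analysis yourself, a first step might be to tally the raw entries. This sketch assumes an entry with both indexes is a match, one with no revision index was removed, and one with no base index was added; that reading is inferred from the example output. The function name tally is made up for illustration.

```python
# Raw entries copied from the example output above.
raw = [
    {"base": 0, "content": True, "revision": 0, "value": "A"},
    {"base": 1, "content": False, "revision": 1, "value": " "},
    {"base": 2, "content": True, "revision": None, "value": "small"},
    {"base": None, "content": True, "revision": 2, "value": "big"},
    {"base": 3, "content": False, "revision": 3, "value": " "},
    {"base": 4, "content": True, "revision": 4, "value": "orange"},
    {"base": 5, "content": False, "revision": 5, "value": " "},
    {"base": 6, "content": True, "revision": None, "value": "tabby"},
    {"base": 7, "content": False, "revision": None, "value": " "},
    {"base": 8, "content": True, "revision": 6, "value": "cat"},
]


def tally(entries):
    """Count matched, removed, and added items, split into words and totals."""
    counts = {
        name: {"content": 0, "total": 0}
        for name in ("matched", "removed", "added")
    }
    for entry in entries:
        if entry["base"] is not None and entry["revision"] is not None:
            bucket = "matched"
        elif entry["revision"] is None:
            bucket = "removed"
        else:
            bucket = "added"
        counts[bucket]["total"] += 1
        if entry["content"]:  # words only
            counts[bucket]["content"] += 1
    return counts


print(tally(raw))
```

The counts agree with the length values in the default stats output for the same pair of strings.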
indifferent can also return stats in human-readable format as label:value pairs. This is a good option if you want to build your own reports.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="formatted_stats",
)
returns:
{
"base": [
{"label": "Base length", "value": "5 words and 4 separators (9 total)"},
{"label": "Words also in the revision", "value": "3 of 5 (60%)"},
{"label": "Similarity", "value": "67% identical to the revision"},
],
"matched": [
{
"label": "Identical in base and revision",
"value": "3 words and 3 separators (6 total)",
},
{
"label": "Removed from the base",
"value": "2 words and 1 separators (3 total)",
},
{
"label": "Added by the revision",
"value": "1 words and 0 separators (1 total)",
},
],
"revision": [
{"label": "Revision length", "value": "4 words and 3 separators (7 total)"},
{"label": "Words also in the base", "value": "3 of 4 (75%)"},
{"label": "Similarity", "value": "86% identical to the base"},
],
"summary": "50% match",
}
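As one way to build your own report, the label:value pairs can be flattened into plain text. The sketch below works on a copy of the example output, trimmed to two sections for brevity; render_report is a hypothetical helper, not part of indifferent.

```python
# A trimmed copy of the "formatted_stats" example output above.
formatted = {
    "base": [
        {"label": "Base length", "value": "5 words and 4 separators (9 total)"},
        {"label": "Words also in the revision", "value": "3 of 5 (60%)"},
        {"label": "Similarity", "value": "67% identical to the revision"},
    ],
    "revision": [
        {"label": "Revision length", "value": "4 words and 3 separators (7 total)"},
        {"label": "Words also in the base", "value": "3 of 4 (75%)"},
        {"label": "Similarity", "value": "86% identical to the base"},
    ],
    "summary": "50% match",
}


def render_report(stats, sections=("base", "revision", "matched")):
    """Turn label:value pairs into an indented plain-text report."""
    lines = [stats["summary"], ""]
    for section in sections:
        if section not in stats:  # tolerate trimmed input
            continue
        lines.append(section.upper())
        for item in stats[section]:
            lines.append(f"  {item['label']}: {item['value']}")
        lines.append("")
    return "\n".join(lines).rstrip()


print(render_report(formatted))
```

The same loop could emit Markdown, CSV, or any other layout, since each item carries only a label and a value.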
indifferent can produce a summary in BBCode format. This is a useful, parseable intermediate state if you need to produce a more polished document.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="bbcode",
)
returns:
{
"analysis": {
"base": "[b]Base length:[/b]\n"
"5 words and 4 separators (9 total)\n"
"\n"
"[b]Words also in the revision:[/b]\n"
"3 of 5 (60%)\n"
"\n"
"[b]Similarity:[/b]\n"
"67% identical to the revision",
"matched": "[b]Identical in base and revision:[/b]\n"
"3 words and 3 separators (6 total)\n"
"\n"
"[b]Removed from the base:[/b]\n"
"2 words and 1 separators (3 total)\n"
"\n"
"[b]Added by the revision:[/b]\n"
"1 words and 0 separators (1 total)",
"revision": "[b]Revision length:[/b]\n"
"4 words and 3 separators (7 total)\n"
"\n"
"[b]Words also in the base:[/b]\n"
"3 of 4 (75%)\n"
"\n"
"[b]Similarity:[/b]\n"
"86% identical to the base",
},
"matched": "A [s red]small[/s red][u green]big[/u green] orange [s "
"red]tabby[/s red][s red] [/s red]cat",
"summary": "50% match",
}
The result includes a formatted version of the base, matched, and revision stats from results="formatted_stats" in analysis, and the summary from results="formatted_stats".
It also produces a formatted string called matched that marks removed items in red strikethrough and added items in green underline.
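Because the tags are regular, the matched string is easy to post-process. The sketch below rewrites it into a plain-text notation in the style of wdiff, with removals as [-...-] and additions as {+...+}. It assumes the only tags present are the two shown in the example output; bbcode_to_wdiff is a hypothetical helper.

```python
import re

# The "matched" string copied from the BBCode example output above.
bbcode = (
    "A [s red]small[/s red][u green]big[/u green] orange "
    "[s red]tabby[/s red][s red] [/s red]cat"
)


def bbcode_to_wdiff(text):
    """Rewrite removed and added spans into wdiff-style plain-text markers."""
    # Removed items: [s red]...[/s red] becomes [-...-]
    text = re.sub(r"\[s red\](.*?)\[/s red\]", r"[-\1-]", text, flags=re.DOTALL)
    # Added items: [u green]...[/u green] becomes {+...+}
    text = re.sub(r"\[u green\](.*?)\[/u green\]", r"{+\1+}", text, flags=re.DOTALL)
    return text


print(bbcode_to_wdiff(bbcode))
# A [-small-]{+big+} orange [-tabby-][- -]cat
```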
If you are working in the terminal, indifferent can produce nicely formatted tables using Rich.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="table",
)
returns:
You can also produce an unrendered Rich table, which allows you to do further post-processing on it.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="raw_table",
)
returns a rich.table.Table object.
indifferent can produce HTML in a variety of ways. The default HTML response is a dict containing an unstyled snippet of HTML to which you can apply your own styles, and the corresponding CSS, which you can use or ignore.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="html",
)
returns:
<!-- Comparison generated by Indifferent: https://github.com/brianwarner/indifferent -->
<div class="indifferent">
<h2 class="title">Base<br /><span class="vs">vs.</span><br />Revision</h2>
<h3 class="subtitle">50% match</h3>
<div class="nav-links">
<table>
<tr>
<td><a href="#indifferent.base">Base</a></td>
<td><a href="#indifferent.revision">Revision</a></td>
<td class="last"><a href="#indifferent.match">Comparison</a></td>
</tr>
</table>
</div>
<div class="section base">
<a id="indifferent.base"></a>
<h3>Base</h3>
<div class="summary">
<ul>
<li><strong>Base length:</strong> 5 words and 4 separators (9 total)</li>
<li><strong>Words also in the revision:</strong> 3 of 5 (60%)</li>
<li><strong>Similarity:</strong> 67% identical to the revision</li>
</ul>
</div>
<div class="detail">
A small orange tabby cat
</div>
</div>
<div class="section revision">
<a id="indifferent.revision"></a>
<h3>Revision</h3>
<div class="summary">
<ul>
<li><strong>Revision length:</strong> 4 words and 3 separators (7 total)</li>
<li><strong>Words also in the base:</strong> 3 of 4 (75%)</li>
<li><strong>Similarity:</strong> 86% identical to the base</li>
</ul>
</div>
<div class="detail">
A big orange cat
</div>
</div>
<div class="section match">
<a id="indifferent.match"></a>
<h3>Comparison: 50% match</h3>
<div class="summary">
<ul>
<li><strong>Identical in base and revision:</strong> 3 words and 3 separators (6 total)</li>
<li><strong>Removed from the base:</strong> 2 words and 1 separators (3 total)</li>
<li><strong>Added by the revision:</strong> 1 words and 0 separators (1 total)</li>
</div>
<div class="detail">
A <span class="deleted">small</span><span class="added">big</span> orange <span class="deleted">tabby</span><span class="deleted"> </span>cat
</div>
</div>
</div>
You can also produce the same snippet with inline CSS. It returns a dict with the HTML and the corresponding CSS.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="html_inline",
)
returns:
<!-- Comparison generated by Indifferent: https://github.com/brianwarner/indifferent -->
<div class="indifferent" style="max-width: 900px; min-width: 800px; margin: 0 auto; background-color: #FFF; padding: 30px 20px;">
<h2 style="text-align: center;" class="title">Base<br /><span class="vs" style="font-size: 70%; color: #333;">vs.</span><br />Revision</h2>
<h3 style="border-bottom: none; text-align: center; color: #555;" class="subtitle">50% match</h3>
<div class="nav-links" style="">
<table style="margin: 40px auto 0px;">
<tr>
<td style="border-right: 1px #888 solid;"><a href="#indifferent.base" style="color: #333; padding: 5px 10px; text-decoration: none;">Base</a></td>
<td style="border-right: 1px #888 solid;"><a href="#indifferent.revision" style="color: #333; padding: 5px 10px; text-decoration: none;">Revision</a></td>
<td style="border-right: 0px;" class="last"><a href="#indifferent.match" style="color: #333; padding: 5px 10px; text-decoration: none;">Comparison</a></td>
</tr>
</table>
</div>
<div class="section base" style="padding: 20px 0px;">
<a id="indifferent.base"></a>
<h3 style="padding-bottom: 10px; margin: 20px 0px 0px; border-bottom: 1px solid grey;">Base</h3>
<div class="summary" style="">
<ul>
<li><strong>Base length:</strong> 5 words and 4 separators (9 total)</li>
<li><strong>Words also in the revision:</strong> 3 of 5 (60%)</li>
<li><strong>Similarity:</strong> 67% identical to the revision</li>
</ul>
</div>
<div class="detail" style="margin: 10px; padding: 15px; border: 1px solid #DDD; font-family: monospace;">
A small orange tabby cat
</div>
</div>
<div class="section revision" style="padding: 20px 0px;">
<a id="indifferent.revision"></a>
<h3 style="padding-bottom: 10px; margin: 20px 0px 0px; border-bottom: 1px solid grey;">Revision</h3>
<div class="summary" style="">
<ul>
<li><strong>Revision length:</strong> 4 words and 3 separators (7 total)</li>
<li><strong>Words also in the base:</strong> 3 of 4 (75%)</li>
<li><strong>Similarity:</strong> 86% identical to the base</li>
</ul>
</div>
<div class="detail" style="margin: 10px; padding: 15px; border: 1px solid #DDD; font-family: monospace;">
A big orange cat
</div>
</div>
<div class="section match" style="padding: 20px 0px;">
<a id="indifferent.match"></a>
<h3 style="padding-bottom: 10px; margin: 20px 0px 0px; border-bottom: 1px solid grey;">Comparison: 50% match</h3>
<div class="summary" style="">
<ul>
<li><strong>Identical in base and revision:</strong> 3 words and 3 separators (6 total)</li>
<li><strong>Removed from the base:</strong> 2 words and 1 separators (3 total)</li>
<li><strong>Added by the revision:</strong> 1 words and 0 separators (1 total)</li>
</div>
<div class="detail" style="margin: 10px; padding: 15px; border: 1px solid #DDD; font-family: monospace;">
A <span class="deleted" style="color: red; text-decoration: line-through;">small</span><span class="added" style="color: green; text-decoration: underline; font-weight: bold;">big</span> orange <span class="deleted" style="color: red; text-decoration: line-through;">tabby</span><span class="deleted" style="color: red; text-decoration: line-through;"> </span>cat
</div>
</div>
</div>
indifferent can also produce complete HTML pages. By default, it returns a dict containing the HTML and CSS, with a link to a stylesheet named indifferent.css. It is up to you to get them into the same directory.
from indifferent import compare
compare(
    base="A small orange tabby cat",
    revision="A big orange cat",
    results="html_page",
)
returns:
<!doctype html>
<html lang="en-US">
<head>
<meta charset="utf-8" />
<title>Comparison of Base and Revision</title>
<link rel="stylesheet" href="indifferent.css">
</head>
<body class="page">
<!-- Comparison generated by Indifferent: https://github.com/brianwarner/indifferent -->
<div class="indifferent">
<h1 class="title">Base<br /><span class="vs">vs.</span><br />Revision</h1>
<h2 class="subtitle">50% match</h2>
<div class="nav-links">
<table>
<tr>
<td><a href="#indifferent.base">Base</a></td>
<td><a href="#indifferent.revision">Revision</a></td>
<td class="last"><a href="#indifferent.match">Comparison</a></td>
</tr>
</table>
</div>
<div class="section base">
<a id="indifferent.base"></a>
<h2>Base</h2>
<div class="summary">
<ul>
<li><strong>Base length:</strong> 5 words and 4 separators (9 total)</li>
<li><strong>Words also in the revision:</strong> 3 of 5 (60%)</li>
<li><strong>Similarity:</strong> 67% identical to the revision</li>
</ul>
</div>
<div class="detail">
A small orange tabby cat
</div>
</div>
<div class="section revision">
<a id="indifferent.revision"></a>
<h2>Revision</h2>
<div class="summary">
<ul>
<li><strong>Revision length:</strong> 4 words and 3 separators (7 total)</li>
<li><strong>Words also in the base:</strong> 3 of 4 (75%)</li>
<li><strong>Similarity:</strong> 86% identical to the base</li>
</ul>
</div>
<div class="detail">
A big orange cat
</div>
</div>
<div class="section match">
<a id="indifferent.match"></a>
<h2>Comparison: 50% match</h2>
<div class="summary">
<ul>
<li><strong>Identical in base and revision:</strong> 3 words and 3 separators (6 total)</li>
<li><strong>Removed from the base:</strong> 2 words and 1 separators (3 total)</li>
<li><strong>Added by the revision:</strong> 1 words and 0 separators (1 total)</li>
</div>
<div class="detail">
A <span class="deleted">small</span><span class="added">big</span> orange <span class="deleted">tabby</span><span class="deleted"> </span>cat
</div>
</div>
</div>
</body>
</html>
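Getting the page and its stylesheet into the same directory could look like the sketch below. It assumes the returned dict stores the page under "html" and the stylesheet under "css"; the "css" key name is a guess, so check the actual result before relying on it. The write_page helper and the placeholder dict are made up for illustration.

```python
import tempfile
from pathlib import Path


def write_page(result, directory, html_name="comparison.html"):
    """Write the page and indifferent.css side by side in one directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    html_path = directory / html_name
    # The page links to a stylesheet named indifferent.css.
    css_path = directory / "indifferent.css"
    html_path.write_text(result["html"], encoding="utf-8")
    css_path.write_text(result["css"], encoding="utf-8")  # assumed key name
    return html_path, css_path


# Placeholder standing in for compare(..., results="html_page").
placeholder = {
    "html": '<!doctype html><link rel="stylesheet" href="indifferent.css">',
    "css": ".indifferent { margin: 0 auto; }",
}

with tempfile.TemporaryDirectory() as tmp:
    html_path, css_path = write_page(placeholder, tmp)
    print(html_path.name, css_path.name)
    # comparison.html indifferent.css
```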
indifferent can also produce styled HTML pages. CSS can either be internal (in the head) or inline (embedded directly in the HTML).
from indifferent import compare
with open("page.html", "w", encoding="utf-8") as htmlfile:
    htmlfile.write(
        compare(
            base="A small orange tabby cat",
            revision="A big orange cat",
            results="html_page_internal",
        )["html"]
    )
or
from indifferent import compare
with open("page.html", "w", encoding="utf-8") as htmlfile:
    htmlfile.write(
        compare(
            base="A small orange tabby cat",
            revision="A big orange cat",
            results="html_page_inline",
        )["html"]
    )
Either example writes a file called page.html containing the styled comparison.
Contributions are welcome!
- Clone the source code
- Install the project locally:
  python3 -m pip install -r requirements-dev.txt -e ".[dev]"
- Make your changes
- Create or update the manually written tests (tests/test_*.py)
- Regenerate the permutation tests by running tests/create_permutation_tests.py
- Test with pytest
Please create or update tests whenever you make changes.
indifferent is released under the MIT license.