Fix spellcheck bug and track unique spelling errors
merges manubot/rootstock#337
refs manubot/rootstock#336

* Track unique spelling errors
Fixes a bug in how unique words were calculated.
Also stores the list of unique misspelled words in addition
to the locations of the misspelled words.

* Create separate spellcheck install script
* Add spellcheck to GitHub Actions workflow
* Search for expanded punctuation in misspelled words
agitter authored May 5, 2020
1 parent 1b69406 commit 92f0285
Showing 7 changed files with 50 additions and 10 deletions.
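
The commit message does not say what the unique-word bug was. One plausible reading, offered here as an assumption, is that the old build.sh pipeline piped the filter output through `uniq` without sorting, while the new one uses `sort -fu`. The sketch below uses a made-up word list to show how the two differ.

```shell
#!/usr/bin/env bash
set -o errexit -o pipefail

# Hypothetical filter output: one misspelled word per line, with repeats
# that are not adjacent and that differ in case.
words=$'teh\nrecieve\nteh\nTeh\nrecieve'

# uniq only collapses adjacent identical lines, so every repeat survives here.
adjacent_only=$(printf '%s\n' "$words" | uniq | wc -l | tr -d ' ')

# sort -fu folds case while sorting and keeps one line per distinct word.
unique_count=$(printf '%s\n' "$words" | sort -fu | wc -l | tr -d ' ')

echo "uniq alone: $adjacent_only lines; sort -fu: $unique_count lines"
```

Because `sort -fu` folds case, `teh` and `Teh` count as one word, which is in line with the `ignore-case true` setting passed to aspell in build.sh.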
9 changes: 6 additions & 3 deletions .appveyor.yml
@@ -44,8 +44,11 @@ test_script:
 - |
 if [ "${SPELLCHECK:-}" = "true" ]; then
 SPELLING_ERRORS_FILENAME=spelling-errors-$APPVEYOR_BUILD_VERSION-${TRIGGERING_COMMIT:0:7}.txt
-cp output/spelling-errors.txt $SPELLING_ERRORS_FILENAME;
+cp output/spelling-errors.txt $SPELLING_ERRORS_FILENAME
 appveyor PushArtifact $SPELLING_ERRORS_FILENAME
+SPELLING_ERROR_LOCATIONS_FILENAME=spelling-error-locations-$APPVEYOR_BUILD_VERSION-${TRIGGERING_COMMIT:0:7}.txt
+cp output/spelling-error-locations.txt $SPELLING_ERROR_LOCATIONS_FILENAME
+appveyor PushArtifact $SPELLING_ERROR_LOCATIONS_FILENAME
 fi
 build: off
@@ -59,8 +62,8 @@ on_success:
 - appveyor AddMessage "$JOB_MESSAGE is now complete."
 - |
 if [ "${SPELLCHECK:-}" = "true" ]; then
-SPELLING_ERROR_COUNT=($(wc -l $SPELLING_ERRORS_FILENAME))
-appveyor AddMessage " <details><summary>Found $SPELLING_ERROR_COUNT potential spelling error(s). Preview:</summary>$(cat $SPELLING_ERRORS_FILENAME)"
+SPELLING_ERROR_COUNT=($(wc -l $SPELLING_ERROR_LOCATIONS_FILENAME))
+appveyor AddMessage " <details><summary>Found $SPELLING_ERROR_COUNT potential spelling error(s). Preview:</summary>$(head -n 100 $SPELLING_ERROR_LOCATIONS_FILENAME)"
 appveyor AddMessage "... </details>"
 fi
7 changes: 7 additions & 0 deletions .github/workflows/manubot.yaml
@@ -12,6 +12,7 @@ jobs:
 runs-on: ubuntu-latest
 env:
 GITHUB_PULL_REQUEST_SHA: ${{ github.event.pull_request.head.sha }}
+SPELLCHECK: true
 steps:
 - name: Set Environment Variables
 run: |
@@ -38,6 +39,12 @@ jobs:
 environment-file: build/environment.yml
 auto-activate-base: false
 miniconda-version: 'latest'
+- name: Install Spellcheck
+shell: bash --login {0}
+run: |
+if [ "${SPELLCHECK:-}" = "true" ]; then
+bash ci/install-spellcheck.sh
+fi
 - name: Build Manuscript
 shell: bash --login {0}
 run: bash build/build.sh
3 changes: 3 additions & 0 deletions .gitignore
@@ -11,6 +11,9 @@ webpage/v
 # Manubot cache directory
 ci/cache

+# Pandoc filters downloaded during continuous integration setup
+build/assets/spellcheck.lua
+
 # Python
 __pycache__/
 *.pyc
2 changes: 1 addition & 1 deletion USAGE.md
@@ -293,7 +293,7 @@ metadata:
 When the `SPELLCHECK` environment variable is `true`, the pandoc [spellcheck filter](https://github.com/pandoc/lua-filters/tree/master/spellcheck) is run.
 Potential spelling errors will be printed in the continuous integration log along with the files and line numbers in which they appeared.
 Words in `build/assets/custom-dictionary.txt` are ignored during spellchecking.
-Spellchecking is currently only supported for English language manuscripts and with Travis CI and AppVeyor continuous integration services.
+Spellchecking is currently only supported for English language manuscripts.

 ## Manubot feedback

20 changes: 18 additions & 2 deletions build/build.sh
@@ -85,12 +85,28 @@ fi
 # Spellcheck
 if [ "${SPELLCHECK:-}" = "true" ]; then
 export ASPELL_CONF="add-extra-dicts $(pwd)/build/assets/custom-dictionary.txt; ignore-case true"
+
+# Identify and store spelling errors
+pandoc --lua-filter build/assets/spellcheck.lua output/manuscript.md | sort -fu > output/spelling-errors.txt
+echo >&2 "Potential spelling errors:"
+cat output/spelling-errors.txt
+
+# Add additional forms of punctuation that Pandoc converts so that the
+# locations can be detected
+# Create a new expanded spelling errors file so that the saved artifact
+# contains only the original misspelled words
+cp output/spelling-errors.txt output/expanded-spelling-errors.txt
+grep "’" output/spelling-errors.txt | sed "s/’/'/g" >> output/expanded-spelling-errors.txt || true
+
 # Find locations of spelling errors
 # Use "|| true" after grep because otherwise this step of the pipeline will
 # return exit code 1 if any of the markdown files do not contain a
 # misspelled word
-pandoc --lua-filter spellcheck.lua output/manuscript.md | uniq | while read word; do grep -ion "\<$word\>" content/*.md; done | sort -h -t ":" -k 1b,1 -k2,2 > output/spelling-errors.txt || true
+cat output/expanded-spelling-errors.txt | while read word; do grep -ion "\<$word\>" content/*.md; done | sort -h -t ":" -k 1b,1 -k2,2 > output/spelling-error-locations.txt || true
 echo >&2 "Filenames and line numbers with potential spelling errors:"
-cat output/spelling-errors.txt
+cat output/spelling-error-locations.txt
+
+rm output/expanded-spelling-errors.txt
 fi

 echo >&2 "Build complete"
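
The following is a minimal sketch of the two new build.sh steps: expanding curly apostrophes, then locating each word in the content files. It runs in a scratch directory; the file names and sample words are hypothetical stand-ins for `content/*.md` and `output/spelling-errors.txt`. It assumes GNU grep and GNU sort, as on the Ubuntu CI images.

```shell
#!/usr/bin/env bash
set -o errexit -o pipefail

# Hypothetical stand-ins for content/*.md and output/spelling-errors.txt.
workdir=$(mktemp -d)
cd "$workdir"
mkdir content
printf "We recieve data.\nSee Smiht's proof.\n" > content/01.intro.md
printf 'Nothing misspelled here.\n' > content/02.methods.md
printf 'recieve\nSmiht’s\n' > spelling-errors.txt

# Append a straight-apostrophe variant of each word that contains a curly
# apostrophe, leaving spelling-errors.txt itself untouched.
cp spelling-errors.txt expanded-spelling-errors.txt
grep "’" spelling-errors.txt | sed "s/’/'/g" >> expanded-spelling-errors.txt || true

# grep exits 1 when a word matches nothing (here, the curly-apostrophe form).
# With pipefail that status can fail the whole pipeline, so "|| true" keeps
# errexit from aborting the script.
while read -r word; do
  grep -ion "\<$word\>" content/*.md
done < expanded-spelling-errors.txt \
  | sort -h -t ":" -k 1b,1 -k2,2 > spelling-error-locations.txt || true

cat spelling-error-locations.txt
```

Only the straight-apostrophe form matches the Markdown source, which is why the expanded list is kept in a temporary file and the saved artifact retains the words exactly as the filter reported them.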
12 changes: 12 additions & 0 deletions ci/install-spellcheck.sh
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+
+## install-spellcheck.sh: run during a CI build to install Pandoc spellcheck dependencies.
+
+# Set options for extra caution & debugging
+set -o errexit \
+-o pipefail
+
+sudo apt-get update -y
+sudo apt-get install -y aspell aspell-en
+wget https://raw.githubusercontent.com/pandoc/lua-filters/13c3fa7e97206413609a48a82575cb43137e037f/spellcheck/spellcheck.lua
+mv spellcheck.lua build/assets/spellcheck.lua
7 changes: 3 additions & 4 deletions ci/install.sh
@@ -1,6 +1,7 @@
 #!/usr/bin/env bash

-## install.sh: run during a Travis CI or AppVeyor build to install the conda environment.
+## install.sh: run during a Travis CI or AppVeyor build to install the conda environment
+## and the optional Pandoc spellcheck dependencies.

 # Set options for extra caution & debugging
 set -o errexit \
@@ -20,7 +21,5 @@ conda activate manubot

 # Install Spellcheck filter for Pandoc
 if [ "${SPELLCHECK:-}" = "true" ]; then
-sudo apt-get update -y
-sudo apt-get install -y aspell aspell-en
-wget https://raw.githubusercontent.com/pandoc/lua-filters/1c553017ecc58914c22bf2372902dca4a456929b/spellcheck/spellcheck.lua
+bash ci/install-spellcheck.sh
 fi
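
Every spellcheck step in this commit is gated on the same test, `[ "${SPELLCHECK:-}" = "true" ]`. The sketch below shows how that guard behaves for a few values. The `spellcheck_enabled` helper is hypothetical (the scripts inline the test), and enabling `nounset` is an assumption made only to show why the `:-` default matters; the hunks above show only `errexit` and `pipefail`.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper; the scripts in this commit inline the same test.
spellcheck_enabled() {
  # ${SPELLCHECK:-} expands to an empty string when the variable is unset,
  # so the comparison is safe even under "set -o nounset".
  [ "${SPELLCHECK:-}" = "true" ]
}

unset SPELLCHECK
if spellcheck_enabled; then unset_result=on; else unset_result=off; fi

SPELLCHECK=true
if spellcheck_enabled; then true_result=on; else true_result=off; fi

# The comparison is an exact string match, so other spellings do not enable it.
SPELLCHECK=True
if spellcheck_enabled; then mixed_case_result=on; else mixed_case_result=off; fi

echo "unset=$unset_result true=$true_result True=$mixed_case_result"
```

Spellchecking therefore stays off unless the variable is set to exactly `true`.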
