Merge pull request #26 from qld-gov-au/develop
Develop to master
ThrawnCA authored Feb 3, 2021
2 parents 8afdcad + 2bcc3eb commit ad1ab2a
Showing 39 changed files with 2,224 additions and 1,534 deletions.
77 changes: 77 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,77 @@
---
#based on https://raw.githubusercontent.com/ckan/ckanext-scheming/master/.github/workflows/test.yml
# alternative https://github.com/ckan/ckan/blob/master/contrib/cookiecutter/ckan_extension/%7B%7Bcookiecutter.project%7D%7D/.github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-18.04
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.6'
      - name: Install requirements
        run: pip install flake8 pycodestyle
      - name: Check syntax
        run: flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --exclude ckan

  test:
    needs: lint
    strategy:
      matrix:
        ckan-version: [2.9, 2.9-py2, 2.8, 2.7]
      fail-fast: false

    name: CKAN ${{ matrix.ckan-version }}
    runs-on: ubuntu-18.04
    container:
      image: openknowledge/ckan-dev:${{ matrix.ckan-version }}
    services:
      solr:
        image: ckan/ckan-solr-dev:${{ matrix.ckan-version }}
      postgres:
        image: ckan/ckan-postgres-dev:${{ matrix.ckan-version }}
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: postgres
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
      redis:
        image: redis:3
    env:
      CKAN_SQLALCHEMY_URL: postgresql://ckan_default:pass@postgres/ckan_test
      CKAN_DATASTORE_WRITE_URL: postgresql://datastore_write:pass@postgres/datastore_test
      CKAN_DATASTORE_READ_URL: postgresql://datastore_read:pass@postgres/datastore_test
      CKAN_SOLR_URL: http://solr:8983/solr/ckan
      CKAN_REDIS_URL: redis://redis:6379/1

    steps:
      - uses: actions/checkout@v2
      - name: Install requirements
        run: |
          pip install -r dev-requirements.txt
          pip install -r pip-requirements.txt
          pip install -e .
          # Replace default path to CKAN core config file with the one on the container
          sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
          #sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test_subclass.ini
      - name: Install Full Text Search
        run: |
          echo "Create full text function..."
          export PGPASSWORD=pass
          psql "postgresql://datastore_write:pass@postgres/datastore_test" -f full_text_function.sql
      - name: Setup extension (CKAN >= 2.9)
        if: ${{ matrix.ckan-version != '2.7' && matrix.ckan-version != '2.8' }}
        run: |
          ckan -c test.ini db init
          # ckan -c test.ini datastore set-permissions | sudo -u postgres psql -p $PG_PORT
      - name: Setup extension (CKAN < 2.9)
        if: ${{ matrix.ckan-version == '2.7' || matrix.ckan-version == '2.8' }}
        run: |
          paster --plugin=ckan db init -c test.ini
          # paster datastore set-permissions -c test-core.ini | sudo -u postgres psql -p $PG_PORT
      - name: Run all tests
        run: pytest --ckan-ini=test.ini --cov=ckanext.xloader ckanext/xloader/tests
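
Reviewer note: the `sed` call in the "Install requirements" step rewrites the `use = config:` line of `test.ini` so that it points at the core test config shipped in the `ckan-dev` container. The following is a minimal Python sketch of the same substitution, useful for checking the effect locally. The function name `point_at_container_config` and the sample `test.ini` content are assumptions made for illustration; they are not part of this commit.

```python
import re

# Path to CKAN's core test config inside the ckan-dev container (taken from the workflow).
CONTAINER_CORE_INI = "/srv/app/src/ckan/test-core.ini"


def point_at_container_config(ini_text: str) -> str:
    """Rewrite every 'use = config:...' line to reference the container's test-core.ini.

    Hypothetical helper that mirrors the workflow's sed expression.
    """
    # '.' does not match a newline by default, so '.*' consumes only the rest
    # of the current line, like sed's line-by-line substitution.
    return re.sub(r"use = config:.*", "use = config:" + CONTAINER_CORE_INI, ini_text)


# Hypothetical test.ini content, for illustration only.
sample = "[app:main]\nuse = config:../ckan/test-core.ini\nckan.plugins = xloader\n"
rewritten = point_at_container_config(sample)
print(rewritten)
```

Only the `use = config:` line changes; all other settings in the file are left as they are.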

106 changes: 59 additions & 47 deletions .travis.yml
@@ -1,65 +1,77 @@
os: linux
dist: focal
language: python
dist: bionic
python:
- "2.7"
env:
- CKAN_GIT_REPO=ckan CKAN_BRANCH=master
- CKAN_GIT_REPO=ckan CKANVERSION=2.8
- CKAN_GIT_REPO=qld-gov-au CKAN_BRANCH=qgov-master

services:
- docker
- redis
- postgresql
before_install:
- apt-cache policy libmagic1
install:
- pip install -U pip wheel
- bash bin/travis-build.bash
- pip install coveralls
script: sh bin/travis-run.sh
after_success:
- coveralls

install: bash bin/travis-build.bash
script: bash bin/travis-run.bash

stages:
- Flake8
- test
- Tests


jobs:
include:
- stage: Flake8
python: 3.6
env: FLAKE8=True
install:
- pip install -U pip wheel
- sh bin/travis-flake.sh
- pip install flake8==3.5.0
- pip install pycodestyle==2.3.0
script:
- sh bin/travis-flake-run.sh
- stage: test #master build on python 3 alpha
env: CKAN_GIT_REPO=ckan CKAN_BRANCH=master
- flake8 --version
# stop the build if there are Python syntax errors or undefined names
- flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics --exclude ckan
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
- flake8 . --count --max-line-length=127 --statistics --exclude ckan --exit-zero


- stage: Tests
python: "3.6"
- stage: test #2.7 build on trusty
# ensure https://travis-ci.org/github/$yourRepo/ckanext-xloader/settings "Enable build config validation" is off if builds are not getting to testing
# the new trusty images of Travis cause build errors with psycopg2 in 2.7 and below, see https://github.com/travis-ci/travis-ci/issues/8897
# due to psycopg2 2.4 does not support newer db's, https://github.com/psycopg/psycopg2/issues/594 and no backports being given to 2.7 or older
# releases, we need to hard lock for them. (needs to be on libpq-dev=9.3.* )
dist: trusty
group: deprecated-2017Q4 #must be upper case but travis "build config validation" if enabled will lowercase it and break the build
env: CKAN_GIT_REPO=ckan CKANVERSION=2.7
env: CKAN_GIT_REPO=ckan CKANVERSION=2.9
addons:
postgresql: '12'
apt:
sources:
- sourceline: 'deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main'
packages:
- postgresql-12


- python: "2.7"
env: CKAN_GIT_REPO=ckan CKANVERSION=2.9
addons:
postgresql: '12'
apt:
sources:
- sourceline: 'deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main'
packages:
- postgresql-12

- python: "2.7"
env: CKAN_GIT_REPO=qld-gov-au CKANVERSION=2.8
addons:
postgresql: "9.6"
- stage: test #2.8 build on trusty
# ensure https://travis-ci.org/github/$yourRepo/ckanext-xloader/settings "Enable build config validation" is off if builds are not getting to testing
# the new trusty images of Travis cause build errors with psycopg2 in 2.7 and below, see https://github.com/travis-ci/travis-ci/issues/8897
# due to psycopg2 2.4 does not support newer db's, https://github.com/psycopg/psycopg2/issues/594 and no backports being given to 2.7 or older
# releases, we need to hard lock for them. (needs to be on libpq-dev=9.3.* )
dist: trusty
group: deprecated-2017Q4 #must be upper case but travis "build config validation" if enabled will lowercase it and break the build
env: CKAN_GIT_REPO=ckan CKANVERSION=2.8
postgresql: '11'
apt:
sources:
- sourceline: 'deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main'
packages:
- postgresql-11

- python: "2.7"
env: CKANVERSION=2.7
addons:
postgresql: "9.6"
allow_failures:
- env: CKAN_GIT_REPO=ckan CKAN_BRANCH=master
python: "3.6"
- env: CKAN_GIT_REPO=ckan CKAN_BRANCH=master
python: "2.7" #master build on python 2 alpha (is it backwards compatible)
postgresql: '9.6'
apt:
sources:
- sourceline: 'deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main'
packages:
- postgresql-9.6

cache: pip
cache:
directories:
- $HOME/.cache/pip
27 changes: 27 additions & 0 deletions CHANGELOG
@@ -1,3 +1,30 @@
0.7.0 2020-11-23
================

Features:
* Python 3 support #113
* CKAN 2.9 support #113

Fixes:
* Update resource hash after load to datastore #116


0.6.1 2020-05-03
================

Features:
* Add 'just_load_with_messytables' option #96

Fixes:
* When getting the resource from CKAN, it now copes with the edge case that CKAN hasn't quite added the resource yet - now it successfully retries #94


0.6.0 2020-04-27
================

Release withdrawn


0.5.0 2019-12-04
================

58 changes: 40 additions & 18 deletions README.rst
@@ -63,13 +63,13 @@ convert columns to the types they want (using the Data Dictionary feature). In
future it could do automatic detection and conversion.

Simpler queueing tech
----------------------
---------------------

DataPusher - job queue is done by ckan-service-provider which is bespoke,
complicated and stores jobs in its own database (sqlite by default).

XLoader - job queue is done by RQ, which is simpler, is backed by Redis, allows
access to the CKAN model and is CKAN's default queue technology (sinc CKAN
access to the CKAN model and is CKAN's default queue technology (since CKAN
2.7). You can also debug jobs easily using pdb. Job results are stored in
Sqlite by default, and for production simply specify CKAN's database in the
config and it's held there - easy.
@@ -115,6 +115,19 @@ Works with CKAN 2.7.x and later.

Works with CKAN 2.3.x - 2.6.x if you install ckanext-rq.

Compatibility with core CKAN versions:

=============== =============
CKAN version Compatibility
=============== =============
2.3 yes, but no longer tested and you must install ckanext-rq
2.4 yes, but no longer tested and you must install ckanext-rq
2.5 yes, but no longer tested and you must install ckanext-rq
2.6 yes, but no longer tested and you must install ckanext-rq
2.7 yes
2.8 yes
2.9 yes (both Python2 and Python3)
=============== =============

------------
Installation
@@ -220,10 +233,12 @@ Configuration:
# The maximum size of files to load into DataStore. In bytes. Default is 1 GB.
ckanext.xloader.max_content_length = 1000000000

# Always use messytables instead of attempting a direct PostgreSQL COPY.
# This more closely matches the DataPusher's behavior, both in results
# (automatically guessing column types) and in speed.
ckanext.xloader.compatibility_mode = True
# To always use messytables to load data, instead of attempting a direct
# PostgreSQL COPY, set this to True. This more closely matches the
# DataPusher's behavior. It has the advantage that the column types
# are guessed. However it is more error prone, far slower and you can't run
# the CPU-intensive queue on a separate machine.
ckanext.xloader.just_load_with_messytables = False

# The maximum time for the loading of a resource before it is aborted.
# Give an amount in seconds. Default is 60 minutes
@@ -269,14 +284,14 @@ To upgrade from DataPusher to XLoader:

2. (Optional) For existing datasets that have been datapushed to datastore, freeze the column types (in the data dictionaries), so that XLoader doesn't change them back to string on next xload::

paster --plugin=ckanext-xloader migrate_types -c /etc/ckan/default/ckan.ini
ckan -c /etc/ckan/default/ckan.ini migrate_types

3. If you've not already, change the enabled plugin in your config - on the
``ckan.plugins`` line replace ``datapusher`` with ``xloader``.

4. (Optional) Enable 'compatibility mode' for slower loading but automatic
guessing of column types.
Add ``ckanext.xloader.compatibility_mode = True`` to your config.
4. (Optional) If you wish, you can disable the direct loading and continue to
just use messytables - for more about this see the docs on config option:
``ckanext.xloader.just_load_with_messytables``

5. Stop the datapusher worker::

@@ -296,29 +311,35 @@ command-line interface.

e.g. ::

paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name>
[pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -c /etc/ckan/default/ckan.ini

For debugging you can try xloading it synchronously (which does the load
directly, rather than asking the worker to do it) with the ``-s`` option::

paster --plugin=ckanext-xloader xloader submit <dataset-name> -s -c /etc/ckan/default/ckan.ini
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit <dataset-name> -s
[pre-2.9] paster --plugin=ckanext-xloader xloader submit <dataset-name> -s -c /etc/ckan/default/ckan.ini

See the status of jobs::

paster --plugin=ckanext-xloader xloader status -c /etc/ckan/default/development.ini
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader status
[pre-2.9] paster --plugin=ckanext-xloader xloader status -c /etc/ckan/default/development.ini

Submit all datasets' resources to the DataStore::

paster --plugin=ckanext-xloader xloader submit all -c /etc/ckan/default/ckan.ini
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all
[pre-2.9] paster --plugin=ckanext-xloader xloader submit all -c /etc/ckan/default/ckan.ini

Re-submit all the resources already in the DataStore (Ignores any resources
that have not been stored in DataStore e.g. because they are not tabular)::

paster --plugin=ckanext-xloader xloader submit all-existing -c /etc/ckan/default/ckan.ini
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader submit all-existing
[pre-2.9] paster --plugin=ckanext-xloader xloader submit all-existing -c /etc/ckan/default/ckan.ini

**Full list of XLoader CLI commands**::

paster --plugin=ckanext-xloader xloader --help
[2.9] ckan -c /etc/ckan/default/ckan.ini xloader --help
[pre-2.9] paster --plugin=ckanext-xloader xloader --help
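
Reviewer note: every command above comes in two forms. On CKAN 2.9 the `ckan` command takes `-c <config>` before the `xloader` subcommand, while on earlier versions `paster` takes `-c <config>` at the end. The following is a minimal sketch of a wrapper that picks the right form; `xloader_command` is a hypothetical helper written for this note, not something ckanext-xloader provides, and it assumes a plain `major.minor` version string.

```python
def xloader_command(ckan_version, args, config="/etc/ckan/default/ckan.ini"):
    """Return the argv list for an xloader CLI call on the given CKAN version.

    Hypothetical helper: ckan_version is a 'major.minor' string such as '2.8'.
    """
    major, minor = (int(part) for part in ckan_version.split(".")[:2])
    if (major, minor) >= (2, 9):
        # CKAN 2.9: the `ckan` command expects the config option first.
        return ["ckan", "-c", config, "xloader", *args]
    # CKAN < 2.9: paster expects the plugin name first and the config option last.
    return ["paster", "--plugin=ckanext-xloader", "xloader", *args, "-c", config]


# Print both forms of the "re-submit everything already in the DataStore" command.
print(" ".join(xloader_command("2.9", ["submit", "all-existing"])))
print(" ".join(xloader_command("2.8", ["submit", "all-existing"])))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.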

Jobs and workers
----------------
@@ -331,7 +352,8 @@ Useful commands:

Clear (delete) all outstanding jobs::

paster --plugin=ckan jobs clear [QUEUES] -c /etc/ckan/default/development.ini
CKAN 2.9, Python 3 ckan -c /etc/ckan/default/ckan.ini jobs clear [QUEUES]
CKAN <2.9, Python 2 paster --plugin=ckan jobs clear [QUEUES] -c /etc/ckan/default/development.ini

If having trouble with the worker process, restarting it can help::

@@ -394,7 +416,7 @@ To publish a new version to PyPI follow these steps:

pip install --upgrade setuptools wheel twine

4. Create a source and binary distributions of the new version::
4. Create source and binary distributions of the new version::

python setup.py sdist bdist_wheel && twine check dist/*
