Proof of concept for new config option: ckanext.xloader.site_url #234
base: master
Changes from all commits
43203b8
da5c031
93c9e5f
deab5cb
729d5b3
03a4e67
4b64d4e
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -2,6 +2,14 @@ version: 1 | |||||||
groups: | ||||||||
- annotation: ckanext-xloader settings | ||||||||
options: | ||||||||
- key: ckanext.xloader.site_url | ||||||||
example: http://ckan-dev:5000 | ||||||||
default: | ||||||||
description: | | ||||||||
Provide an alternate site URL for the xloader_submit action. | ||||||||
This is useful, for example, when the site is running within a docker network. | ||||||||
Suggested change
|
||||||||
validators: configured_default("ckan.site_url",None) | ||||||||
required: false | ||||||||
- key: ckanext.xloader.jobs_db.uri | ||||||||
default: sqlite:////tmp/xloader_jobs.db | ||||||||
description: | | ||||||||
|
@@ -152,5 +160,3 @@ groups: | |||||||
they will also display "complete", "active", "inactive", and "unknown". | ||||||||
type: bool | ||||||||
required: false | ||||||||
|
||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,13 +61,11 @@ def configure(self, config_): | |
else: | ||
self.ignore_hash = False | ||
|
||
for config_option in ("ckan.site_url",): | ||
if not config_.get(config_option): | ||
raise Exception( | ||
"Config option `{0}` must be set to use ckanext-xloader.".format( | ||
config_option | ||
) | ||
) | ||
site_url_configs = ("ckan.site_url", "ckanext.xloader.site_url") | ||
if not any(site_url_configs): | ||
raise Exception( | ||
f"One of config options {site_url_configs} must be set to use ckanext-xloader." | ||
) | ||
Comment on lines +64 to +68

CKAN will refuse to start if a site_url isn't provided, so this code would never get executed.

@wardi here, I'm just extending the existing check for that. I prefer not to remove existing behavior, but just to provide the minimal new behavior required for the PR: https://github.com/ckan/ckanext-xloader/blob/master/ckanext/xloader/plugin.py#L64

If you had not cleaned up the error message, this would not have been commented upon ;) I'm fine leaving this in as it's a belt-and-braces approach; better to fail early (if it does occur, which is very, very remote).
||
|
||
# IDomainObjectModification | ||
|
||
|
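One detail of the new check in `configure` is worth spelling out: `any(site_url_configs)` tests the option names themselves, which are non-empty strings, so the condition is always true and the exception can never be raised. A minimal sketch of a check that looks up the configured values instead is shown here. It uses a plain dict as a stand-in for CKAN's `config_` object, and the function name `require_site_url` is hypothetical, not part of the PR.

```python
SITE_URL_CONFIGS = ("ckan.site_url", "ckanext.xloader.site_url")


def require_site_url(config_):
    """Raise if none of the site URL options has a non-empty value.

    `config_` is assumed to be any mapping with a .get() method.
    """
    # Look up each option's value; any() over the names alone is always True.
    if not any(config_.get(option) for option in SITE_URL_CONFIGS):
        raise Exception(
            f"One of config options {SITE_URL_CONFIGS} must be set to use ckanext-xloader."
        )


# The expression from the diff is truthy no matter what is configured.
assert any(SITE_URL_CONFIGS)

# Passes silently because one of the two options has a value.
require_site_url({"ckan.site_url": "http://ckan-dev:5000"})
```

As the reviewers note, CKAN itself refuses to start without `ckan.site_url`, so this only matters if the check is meant to be a real safety net.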
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,17 +59,18 @@ def data(create_with_upload, apikey): | |
"api.action", ver=3, logic_function="xloader_hook", qualified=True | ||
) | ||
return { | ||
'api_key': apikey, | ||
'job_type': 'xloader_to_datastore', | ||
'result_url': callback_url, | ||
'metadata': { | ||
'ignore_hash': True, | ||
'ckan_url': toolkit.config.get('ckan.site_url'), | ||
'resource_id': resource["id"], | ||
'set_url_type': False, | ||
'task_created': datetime.utcnow().isoformat(), | ||
'original_url': resource["url"], | ||
} | ||
"api_key": apikey, | ||
kowh-ai marked this conversation as resolved.
|
||
"job_type": "xloader_to_datastore", | ||
"result_url": callback_url, | ||
"metadata": { | ||
"ignore_hash": True, | ||
"ckan_url": toolkit.config.get("ckanext.xloader.site_url") | ||
Going to need more tests created to set these two values for proper validation.
||
or toolkit.config.get("ckan.site_url"), | ||
"resource_id": resource["id"], | ||
"set_url_type": False, | ||
"task_created": datetime.utcnow().isoformat(), | ||
"original_url": resource["url"], | ||
}, | ||
} | ||
|
||
|
||
|
@@ -89,6 +90,66 @@ def test_xloader_data_into_datastore(self, cli, data): | |
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_download_resource_data_with_ckanext_xloader_site_url(self, cli, data): | ||
# Set the ckanext.xloader.site_url in the config | ||
with mock.patch.dict(toolkit.config, {'ckanext.xloader.site_url': 'http://xloader-site-url'}): | ||
data['metadata']['original_url'] = 'http://xloader-site-url/resource.csv' | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output | ||
assert "Express Load completed" in stdout | ||
|
||
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_download_resource_data_with_ckan_site_url(self, cli, data): | ||
# Set the ckan.site_url in the config | ||
with mock.patch.dict(toolkit.config, {'ckan.site_url': 'http://ckan-site-url'}): | ||
data['metadata']['original_url'] = 'http://ckan-site-url/resource.csv' | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output | ||
assert "Express Load completed" in stdout | ||
|
||
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_download_resource_data_with_different_original_url(self, cli, data): | ||
# Set the ckan.site_url in the config | ||
with mock.patch.dict(toolkit.config, {'ckan.site_url': 'http://ckan-site-url'}): | ||
data['metadata']['original_url'] = 'http://external-site-url/resource.csv' | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output | ||
assert "Express Load completed" in stdout | ||
|
||
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_callback_xloader_hook_with_ckanext_xloader_site_url(self, cli, data): | ||
# Set the ckanext.xloader.site_url in the config | ||
with mock.patch.dict(toolkit.config, {'ckanext.xloader.site_url': 'http://xloader-site-url'}): | ||
data['result_url'] = 'http://xloader-site-url/api/3/action/xloader_hook' | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output | ||
assert "Express Load completed" in stdout | ||
|
||
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_callback_xloader_hook_with_ckan_site_url(self, cli, data): | ||
# Set the ckan.site_url in the config | ||
with mock.patch.dict(toolkit.config, {'ckan.site_url': 'http://ckan-site-url'}): | ||
data['result_url'] = 'http://ckan-site-url/api/3/action/xloader_hook' | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
stdout = cli.invoke(ckan, ["jobs", "worker", "--burst"]).output | ||
assert "Express Load completed" in stdout | ||
|
||
resource = helpers.call_action("resource_show", id=data["metadata"]["resource_id"]) | ||
assert resource["datastore_contains_all_records_of_source_file"] | ||
|
||
def test_xloader_ignore_hash(self, cli, data): | ||
self.enqueue(jobs.xloader_data_into_datastore, [data]) | ||
with mock.patch("ckanext.xloader.jobs.get_response", get_response): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
|
||
import json | ||
import datetime | ||
import re | ||
|
||
from six import text_type as str, binary_type | ||
|
||
|
@@ -13,6 +14,8 @@ | |
import ckan.plugins as p | ||
from ckan.plugins.toolkit import config | ||
|
||
from urllib.parse import urljoin, urlunsplit, urlparse | ||
|
||
# resource.formats accepted by ckanext-xloader. Must be lowercase here. | ||
DEFAULT_FORMATS = [ | ||
"csv", | ||
|
@@ -107,6 +110,60 @@ def get_xloader_user_apitoken(): | |
return site_user["apikey"] | ||
|
||
|
||
def modify_ckan_url(result_url: str, ckan_url: str) -> str: | ||
""" Modifies a URL based on CKAN site URL comparison. | ||
|
||
This function compares the base URL of a given result URL against a CKAN site URL. | ||
If they differ, the result URL is modified to use the CKAN site URL as its base | ||
while preserving the original path. | ||
|
||
Args: | ||
result_url (str): The original URL to potentially modify | ||
ckan_url (str): The base CKAN site URL to compare against | ||
Returns: | ||
str: The modified URL if base URLs differ, otherwise returns original URL unchanged | ||
""" | ||
parsed_url = urlparse(result_url) | ||
base_url = f"{parsed_url.scheme}://{parsed_url.netloc}" | ||
if base_url != ckan_url: | ||
path_url = parsed_url.path | ||
result_url = urljoin(ckan_url, path_url) | ||
|
||
return result_url | ||
|
||
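To illustrate how `modify_ckan_url` behaves, here is a self-contained copy of the function as it appears in the diff, followed by a few sample calls. The host names are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def modify_ckan_url(result_url: str, ckan_url: str) -> str:
    """Copy of the function from the diff, for experimentation only."""
    parsed_url = urlparse(result_url)
    base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"
    if base_url != ckan_url:
        # Only the path is carried over to the new base.
        result_url = urljoin(ckan_url, parsed_url.path)
    return result_url


# Different base: the callback is re-pointed at the CKAN URL.
rewritten = modify_ckan_url(
    "http://localhost:5000/api/3/action/xloader_hook", "http://ckan-dev:5000"
)
assert rewritten == "http://ckan-dev:5000/api/3/action/xloader_hook"

# Same base: the URL is returned unchanged.
same = modify_ckan_url(
    "http://ckan-dev:5000/api/3/action/xloader_hook", "http://ckan-dev:5000"
)
assert same == "http://ckan-dev:5000/api/3/action/xloader_hook"

# Query strings are not preserved when the base is swapped.
no_query = modify_ckan_url("http://localhost:5000/a?x=1", "http://ckan-dev:5000")
assert no_query == "http://ckan-dev:5000/a"
```

Because `urljoin` is given an absolute path, any path prefix on `ckan_url` (for example `http://host/ckan`) would also be dropped from the result; whether that matters depends on how the site is mounted.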
|
||
def modify_resource_url(orig_ckan_url: str) -> str: | ||
"""Returns a potentially modified CKAN URL. | ||
|
||
This function takes a CKAN URL and potentially modifies its base URL while preserving the path, | ||
query parameters, and fragments. The modification occurs only if two conditions are met: | ||
1. The base URL of the input matches the configured CKAN site URL | ||
2. An xloader_site_url is configured in the settings | ||
|
||
Args: | ||
orig_ckan_url (str): The original CKAN URL to potentially modify | ||
Returns: | ||
str: Either the modified URL with new base URL from xloader_site_url, | ||
or the original URL if conditions aren't met | ||
""" | ||
xloader_site_url = config.get('ckanext.xloader.site_url') | ||
ckan_site_url = config.get('ckan.site_url') | ||
|
||
parsed_url = urlparse(orig_ckan_url) | ||
This isn't needed until we're inside the conditional.

Thanks @ThrawnCA - Whereabouts? AFAICT I'm seeing

Ah, I missed the usage inside the f-string.
||
base_url = f"{parsed_url.scheme}://{parsed_url.netloc}" | ||
|
||
# If the base URL matches the CKAN site URL and xloader_site_url is set, modify the URL | ||
if base_url == ckan_site_url and xloader_site_url: | ||
modified_ckan_url = urljoin(xloader_site_url, parsed_url.path) | ||
if parsed_url.query: | ||
modified_ckan_url += f"?{parsed_url.query}" | ||
if parsed_url.fragment: | ||
modified_ckan_url += f"#{parsed_url.fragment}" | ||
return modified_ckan_url | ||
|
||
return orig_ckan_url | ||
|
||
|
||
def set_resource_metadata(update_dict): | ||
''' | ||
Set appropriate datastore_active flag on CKAN resource. | ||
|
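The rewriting rule in `modify_resource_url` can be tried out in isolation with the sketch below. It mirrors the logic from the diff but takes the two site URLs as parameters instead of reading CKAN's `config`, so it runs without a CKAN install. The name `rewrite_resource_url` and the sample host names are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def rewrite_resource_url(
    orig_url: str, ckan_site_url: str, xloader_site_url: Optional[str]
) -> str:
    """Swap the base of orig_url for xloader_site_url when it matches ckan_site_url."""
    parsed = urlparse(orig_url)
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    # Both conditions must hold: same base as the site, and an override is set.
    if base_url == ckan_site_url and xloader_site_url:
        modified = urljoin(xloader_site_url, parsed.path)
        if parsed.query:
            modified += f"?{parsed.query}"
        if parsed.fragment:
            modified += f"#{parsed.fragment}"
        return modified
    return orig_url


public = "http://localhost:5000"
internal = "http://ckan-dev:5000"

# A resource hosted on the site is re-pointed at the internal URL.
url = rewrite_resource_url(f"{public}/dataset/d/download/data.csv?v=2#top", public, internal)
assert url == "http://ckan-dev:5000/dataset/d/download/data.csv?v=2#top"

# External resources and unset overrides leave the URL untouched.
assert rewrite_resource_url("http://external-site-url/resource.csv", public, internal) == (
    "http://external-site-url/resource.csv"
)
assert rewrite_resource_url(f"{public}/data.csv", public, None) == f"{public}/data.csv"
```

Note that the base comparison is an exact string match, so in this sketch a site URL written with a trailing slash (`http://localhost:5000/`) would not match and the URL would be returned unchanged.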
I think we don't need an empty string default, it's normal for optional settings to be not present if not provided

instead of the `toolkit.config.get("ckanext.xloader.site_url") or toolkit.config.get("ckan.site_url")` logic below we should be able to use just `toolkit.config.get("ckanext.xloader.site_url")` along with:

@wardi ok I see. so that default is just set in the config yaml, and then, any code can assume that fallback default for the `ckanext.xloader.site_url` setting?

yeah, this validator will set `ckanext.xloader.site_url` to the same as `ckan.site_url` when it's not given. Another one of @smotornyuk's clever ideas.

Do we need to bother with that when `jobs.py` already has the fallback `config.get('ckanext.xloader.site_url') or config.get('ckan.site_url')`?

IMO we should keep the fallback in `jobs.py` in case someone manually sets `ckanext.xloader.site_url` to a blank value (which could happen if e.g. the config is being automatically generated/populated from somewhere, perhaps using `ckanext-ssm-config`), so the validator approach is redundant.
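The `or` fallback mentioned in the last comment treats a missing key and a blank value the same way, which is the case the commenter is concerned about. A minimal sketch, using plain dicts as stand-ins for CKAN's config object (the helper name `resolve_site_url` is hypothetical):

```python
def resolve_site_url(config):
    """Prefer ckanext.xloader.site_url; fall back to ckan.site_url if unset or blank."""
    # `or` skips None and "" alike, so both cases fall through to ckan.site_url.
    return config.get("ckanext.xloader.site_url") or config.get("ckan.site_url")


base = {"ckan.site_url": "http://localhost:5000"}

# Option absent: falls back to ckan.site_url.
assert resolve_site_url(base) == "http://localhost:5000"

# Option present but blank (e.g. generated config): same fallback.
blank = {**base, "ckanext.xloader.site_url": ""}
assert resolve_site_url(blank) == "http://localhost:5000"

# Option set: the xloader-specific URL wins.
override = {**base, "ckanext.xloader.site_url": "http://ckan-dev:5000"}
assert resolve_site_url(override) == "http://ckan-dev:5000"
```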