Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xloader still always pending #240

Open
riccardoricciagid opened this issue Dec 31, 2024 · 15 comments
Open

xloader still always pending #240

riccardoricciagid opened this issue Dec 31, 2024 · 15 comments

Comments

@riccardoricciagid
Copy link

CKAN version

Describe the bug

After loading CSV file, tables built and populated correctly, in the ckan log appear 403 /api/3/action/xloader_hook and resources appear in "datastore pending" state.

Steps to reproduce

Try to upload a CSV file

immagine

@wardi
Copy link
Contributor

wardi commented Jan 7, 2025

Hello @riccardoricciagid can you provide some information about how you're running ckan? Is there any additional information in your logs?

If you have a reverse proxy or other more complicated networking environment you might need something like the ckanext.xloader.site_url configuration option from #234

@wardi wardi transferred this issue from ckan/ckan Jan 7, 2025
@wardi wardi removed their assignment Jan 7, 2025
@duttonw
Copy link
Collaborator

duttonw commented Jan 7, 2025

Hi @riccardoricciagid

What version of CKAN and what other plugins are installed.

Is the background jobs enabled?
Had all jobs been processed
Ckan jobs list

@fpichardom
Copy link

I have the same or a very similar issue, I think. I have a fresh install of CKAN 2.11.1 from source with the activity, scheming_datasets, xloader plugins and my custom plugin that adds a couple of dataset types with scheming. I checked that the datastore is working properly using the API to read and write. My instance is running with NGINX as a proxy from http://127.0.0.1:8080/ to my VPS IP address(for now). The service is running using wsgi and supervisor according to the default instructions. I have background workers running simply as can -c $CKAN_INI jobs worker, and it can successfully complete a test job using `can -c $CKAN_INI jobs test.

So my issue is, as originally described in this thread when I upload a new CSV file or click on Upload to Datastore via the UI, it just says pending and it is stuck there. Using debug mode in my uwsgi.ERR file I can see this when I click on the Upload to Datastore button:

2025-01-08 00:00:33,675 INFO  [ckanext.xloader.action] A pending task was found '748d2665-5137-40be-b398-64f1a1c1e0d8', but its not found in the queue [] and is 0:05:26.018433 hours old
2025-01-08 00:00:33,682 DEBUG [ckanext.xloader.action] Timeout for XLoading resource f54f63bc-540a-41d5-9c3f-c3ad4158923b is 3600
2025-01-08 00:00:33,683 INFO  [ckan.lib.jobs] Added background job c64e6bec-1abd-423d-807c-9aef45e6cccf ("xloader_submit: package: 39fd2cf6-366e-4cb5-8be6-be42e1ee026e resource: f54f63bc-540a-41d5-9c3f-c3ad4158923b") to queue "default"
2025-01-08 00:00:33,683 DEBUG [ckanext.xloader.action] Enqueued xloader job=c64e6bec-1abd-423d-807c-9aef45e6cccf res_id=f54f63bc-540a-41d5-9c3f-c3ad4158923b
...

When I try to force the process using ckan -c $CKAN_INI xloader submit --sync <PACKAGE_ID> I originally would get: 2025-01-08 00:10:19,095 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments and nothing would happen. After I added in the ckan.ini the ckanext.xloader.site_url it still shows the error but it updates the datastore, but still it shows as pending in the UI.

When I added what @wardi suggested (ckanext.xloader.site_url with my VPS IP address) I would get something like:

2025-01-08 00:22:42,454 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments
      ckanext.xloader.cli INFO  Express Load starting: /dataset/test-new-uploads/resource/8279ab2c-1c80-4059-b479-9afb61f56079
      ckanext.xloader.cli INFO  Fetching from: http://127.0.0.1:8080/dataset/39fd2cf6-366e-4cb5-8be6-be42e1ee026e/resource/8279ab2c-1c80-4059-b479-9afb61f56079/download/task_clusters.csv
      ckanext.xloader.cli INFO  Downloaded ok - 300.0 bytes
...

Still throws the Cannot decode JWT token error but it uploads the data to the datastore. In the UI it still appears as pending, and it still doesn't work naturally when uploading a file or clicking the Upload to Datastore button.

@duttonw
Copy link
Collaborator

duttonw commented Jan 8, 2025

Hmm,

I seems 2.11 now needs extra 'secrets' can you give this a go on both the ckan primary node and ensure that worker node has the same ckan.ini config (if on different instances)

echo "Setting beaker.session.secret in ini file"
          ckan config-tool $CKAN_INI "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          ckan config-tool $CKAN_INI "SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          ckan config-tool $CKAN_INI "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())')
          ckan config-tool $CKAN_INI "api_token.jwt.encode.secret=${JWT_SECRET}"
          ckan config-tool $CKAN_INI "api_token.jwt.decode.secret=${JWT_SECRET}"

https://github.com/duttonw/ckan-docker-base/blob/main/ckan-2.11/setup/start_ckan.sh#L16

@fpichardom
Copy link

Hmm,

I seems 2.11 now needs extra 'secrets' can you give this a go on both the ckan primary node and ensure that worker node has the same ckan.ini config (if on different instances)

echo "Setting beaker.session.secret in ini file"
          ckan config-tool $CKAN_INI "beaker.session.secret=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          ckan config-tool $CKAN_INI "SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          ckan config-tool $CKAN_INI "WTF_CSRF_SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_urlsafe())')"
          JWT_SECRET=$(python3 -c 'import secrets; print("string:" + secrets.token_urlsafe())')
          ckan config-tool $CKAN_INI "api_token.jwt.encode.secret=${JWT_SECRET}"
          ckan config-tool $CKAN_INI "api_token.jwt.decode.secret=${JWT_SECRET}"

https://github.com/duttonw/ckan-docker-base/blob/main/ckan-2.11/setup/start_ckan.sh#L16

I tried updating the ckan.ini with the above mentioned code and I'm still getting the same error 2025-01-08 02:42:22,455 ERROR [ckan.lib.api_token] Cannot decode JWT token: Not enough segments and still showing as pending in the UI. It seems that in the generated ckan.ini the other 'secrets' take the value of the the main SECRET_KEY by default. I generated the main SECRET_KEY with the code you shared, I believe.

WTF_CSRF_SECRET_KEY = string:%(SECRET_KEY)s
api_token.jwt.encode.secret = string:%(SECRET_KEY)s
api_token.jwt.decode.secret = string:%(SECRET_KEY)s

I'm also running the worker in the same instance using the same ckan.ini file

@riccardoricciagid
Copy link
Author

riccardoricciagid commented Jan 8, 2025

Hello @riccardoricciagid can you provide some information about how you're running ckan? Is there any additional information in your logs?

If you have a reverse proxy or other more complicated networking environment you might need something like the ckanext.xloader.site_url configuration option from #234

My CKAN run in a custom docker container.
immagine

Actually the URL is here:
ckan.site_url = http://192.168.2.20:5000
it will be changed when I expose the CKAN.

All wprkers are running...
immagine

Datasets not in CSV are uploaded via a CURL script (every dataset has >100 datasets). The datasets in CSV format are uploaded manually from the UI. These datasets was converted and written to DB by xloader correctly. No errors in logs only INFO about Chunks and ANALYZE table at end.

immagine

Strange warnng about the webassets.yml on ckanext-xloader... but this seems not be an important error.

CKAN Version: 2.11.1
Xloader Version: the last on https://github.com/ckan/ckanext-xloader.git (7 days ago).

@kowh-ai
Copy link
Collaborator

kowh-ai commented Jan 10, 2025

I can also confirm this problem (DataStore Pending and Cannot decode JWT token: Not enough segments. I have also found the token data passed to the api_token.decodemethod in CKAN looks to be in UUID format rather than a token format ie: a format of 32 hexadecimal digits displayed in 5 groups separated by hyphens (8-4-4-4-12 characters)...this would explain the Cannot decode JWT token: Not enough segments error

@riccardoricciagid
Copy link
Author

riccardoricciagid commented Jan 10, 2025

Here is the worker log for a submit:
2025-01-10 11:13:49,223 INFO [ckan.lib.jobs] Worker rq:worker:ddfa06f8ef2b44bcb71b6c2289541896 starts job 1d46372f-b904-4acd-80a6-11c173a57ddb (xloader_submit: package: c8b178cb-c6d0-4391-9c5f-7f9378dac6a5 resource: 44af7805-d95e-4f49-a2d8-1f57279b6526) from queue "default"
2025-01-10 11:13:49,237 INFO [ckan.config.environment] Loading static files from public
2025-01-10 11:13:49,349 INFO [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2025-01-10 11:13:49,511 WARNI [ckan.lib.webassets_tools] Cannot create library ckanext-xloader at /usr/lib/ckan/default/lib/python3.10/site-packages/ckanext/xloader/webassets because webassets.yaml is missing
2025-01-10 11:13:49,588 INFO [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
2025-01-10 11:13:49,753 ERROR [ckan.lib.api_token] Cannot decode JWT token: Signature verification failed
Express Load starting: /dataset/report-system-resources/resource/44af7805-d95e-4f49-a2d8-1f57279b6526
2025-01-10 11:13:49,778 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Express Load starting: /dataset/report-system-resources/resource/44af7805-d95e-4f49-a2d8-1f57279b6526
Fetching from: http://192.168.2.20:5000/dataset/c8b178cb-c6d0-4391-9c5f-7f9378dac6a5/resource/44af7805-d95e-4f49-a2d8-1f57279b6526/download/report_20250103.csv
2025-01-10 11:13:49,794 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Fetching from: http://192.168.2.20:5000/dataset/c8b178cb-c6d0-4391-9c5f-7f9378dac6a5/resource/44af7805-d95e-4f49-a2d8-1f57279b6526/download/report_20250103.csv
Downloaded ok - 17.9 MB
2025-01-10 11:13:49,880 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Downloaded ok - 17.9 MB
File hash: 95f1a88531a7e7f7c720477a4687c247
2025-01-10 11:13:49,894 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] File hash: 95f1a88531a7e7f7c720477a4687c247
Loading CSV
2025-01-10 11:13:49,911 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Loading CSV
'use_type_guessing' mode is: False
2025-01-10 11:13:49,950 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] 'use_type_guessing' mode is: False
load_csv: Decoded encoding: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
2025-01-10 11:13:50,520 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] load_csv: Decoded encoding: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
Ensuring character coding is UTF8
2025-01-10 11:13:50,549 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Ensuring character coding is UTF8
Clearing records for "44af7805-d95e-4f49-a2d8-1f57279b6526" from DataStore.
2025-01-10 11:13:50,608 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Clearing records for "44af7805-d95e-4f49-a2d8-1f57279b6526" from DataStore.
Fields: [{'id': 'TMST', 'type': 'numeric', 'info': {'type_override': 'numeric'}, 'strip_extra_white': True}, {'id': 'RESID', 'type': 'text'}, {'id': 'PRES', 'type': 'text'}]
2025-01-10 11:13:50,739 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Fields: [{'id': 'TMST', 'type': 'numeric', 'info': {'type_override': 'numeric'}, 'strip_extra_white': True}, {'id': 'RESID', 'type': 'text'}, {'id': 'PRES', 'type': 'text'}]
Copying to database...
2025-01-10 11:13:51,829 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Copying to database...
...copying done
2025-01-10 11:13:54,069 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] ...copying done
Creating search index...
2025-01-10 11:13:54,085 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Creating search index...
...search index created
2025-01-10 11:14:08,428 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] ...search index created
Calculating record count (running ANALYZE on the table)
2025-01-10 11:14:08,457 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Calculating record count (running ANALYZE on the table)
Setting resource.datastore_active = True
2025-01-10 11:14:08,598 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Setting resource.datastore_active = True
Setting resource.datastore_contains_all_records_of_source_file = True
2025-01-10 11:14:10,478 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Setting resource.datastore_contains_all_records_of_source_file = True
Data now available to users: /dataset/report-system-resources/resource/44af7805-d95e-4f49-a2d8-1f57279b6526
2025-01-10 11:14:13,598 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Data now available to users: /dataset/report-system-resources/resource/44af7805-d95e-4f49-a2d8-1f57279b6526
Creating column indexes (a speed optimization for queries)...
2025-01-10 11:14:13,617 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Creating column indexes (a speed optimization for queries)...
...column indexes created.
2025-01-10 11:14:31,715 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] ...column indexes created.
2025-01-10 11:14:32,132 DEBUG [ckanext.activity.logic.action] Created 'changed package' activity
File Hash updated for resource: 95f1a88531a7e7f7c720477a4687c247
2025-01-10 11:14:32,174 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] File Hash updated for resource: 95f1a88531a7e7f7c720477a4687c247
Express Load completed
2025-01-10 11:14:32,324 INFO [1d46372f-b904-4acd-80a6-11c173a57ddb] Express Load completed
2025-01-10 11:14:32,667 INFO [ckan.lib.jobs] Worker rq:worker:ddfa06f8ef2b44bcb71b6c2289541896 has finished job 1d46372f-b904-4acd-80a6-11c173a57ddb (xloader_submit: package: c8b178cb-c6d0-4391-9c5f-7f9378dac6a5 resource: 44af7805-d95e-4f49-a2d8-1f57279b6526) from queue "default"

@kowh-ai
Copy link
Collaborator

kowh-ai commented Jan 10, 2025

@riccardoricciagid - interesting, different Cannot decode JWT token error and I don't get the webassets.yaml is missing warning...

My problem is the api_key has been set to a value like 86a3c04e-7c17-4527-b0be-17d407073f28 which causes CKAN to complain about a Cannot decode JWT token: not enough segments error

Further investigation on this has this method returning a UUID format string get_xloader_user_apitoken()

@ThrawnCA
Copy link
Collaborator

ThrawnCA commented Jan 12, 2025

Have you generated an API token for a sysadmin account and then set it in ckanext.xloader.api_token? @kowh-ai That value looks like a UUID, which suggests it's an (obsolete) API key rather than token.

@riccardoricciagid
Copy link
Author

api_token rebuilt correctly. All run well but... while running xloader (for a CSV file), no jobs is listed:
(default) ckanuser@a60b22b6cf25:/usr/lib/ckan/default$ ckan -c /etc/ckan/default/ckan.ini xloader status
2025-01-13 10:30:53,282 INFO [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2025-01-13 10:30:53,282 INFO [ckan.config.environment] Loading static files from public
2025-01-13 10:30:53,637 WARNI [ckan.common] Option lang is not declared
2025-01-13 10:30:53,677 WARNI [ckan.lib.webassets_tools] Cannot create library ckanext-xloader at /usr/lib/ckan/default/lib/python3.10/site-packages/ckanext/xloader/webassets because webassets.yaml is missing
2025-01-13 10:30:53,751 INFO [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
No jobs currently queued

(default) ckanuser@a60b22b6cf25:/usr/lib/ckan/default$ ckan -c /etc/ckan/default/ckan.ini jobs list
2025-01-13 10:31:12,149 INFO [ckan.cli] Using configuration file /etc/ckan/default/ckan.ini
2025-01-13 10:31:12,149 INFO [ckan.config.environment] Loading static files from public
2025-01-13 10:31:12,518 WARNI [ckan.common] Option lang is not declared
2025-01-13 10:31:12,557 WARNI [ckan.lib.webassets_tools] Cannot create library ckanext-xloader at /usr/lib/ckan/default/lib/python3.10/site-packages/ckanext/xloader/webassets because webassets.yaml is missing
2025-01-13 10:31:12,633 INFO [ckan.config.environment] Loading templates from /usr/lib/ckan/default/src/ckan/ckan/templates
There are no pending jobs.

And the worker ends corrctly:

2025-01-13 09:32:38,898 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] ...copying done
Successfully pushed 343422 entries to "4f32bcae-77fd-40e2-a074-dad7d54f5581".
2025-01-13 09:32:38,930 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Successfully pushed 343422 entries to "4f32bcae-77fd-40e2-a074-dad7d54f5581".
Calculating record count (running ANALYZE on the table)
2025-01-13 09:32:38,947 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Calculating record count (running ANALYZE on the table)
Setting resource.datastore_active = True
2025-01-13 09:32:39,249 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Setting resource.datastore_active = True
Setting resource.datastore_contains_all_records_of_source_file = True
2025-01-13 09:32:39,314 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Setting resource.datastore_contains_all_records_of_source_file = True
Finished loading with tabulator
2025-01-13 09:32:40,459 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Finished loading with tabulator
2025-01-13 09:32:41,615 DEBUG [ckanext.activity.logic.action] Created 'changed package' activity
File Hash updated for resource: 2f5528035723f906ec46d294ef614ca0
2025-01-13 09:32:41,654 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] File Hash updated for resource: 2f5528035723f906ec46d294ef614ca0
Express Load completed
2025-01-13 09:32:41,705 INFO [13798388-1a8f-41a8-b6f6-acfaba7a5118] Express Load completed
2025-01-13 09:32:41,815 INFO [ckan.lib.jobs] Worker rq:worker:2abe784494bf419db5fb0235b599d95f has finished job 13798388-1a8f-41a8-b6f6-acfaba7a5118 from queue "default"

@kowh-ai
Copy link
Collaborator

kowh-ai commented Jan 14, 2025

@ThrawnCA - thanks, yes I had a typo in my ckanext.xloader.api_token token name 🤭

@riccardoricciagid
Copy link
Author

What about the still pending??? Now I have no errors on JWT and token...

@kowh-ai
Copy link
Collaborator

kowh-ai commented Jan 14, 2025

@riccardoricciagid the still pending went away after fixing the typo for ckanext.xloader.api_token

@riccardoricciagid
Copy link
Author

@kowh-ai Yes. After corrections pending remain.
immagine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants