Exceptions during setup.sh execution #69

Open
ivanbishop opened this issue Dec 16, 2024 · 14 comments

Comments

@ivanbishop

Using this image:
web:
image: ghcr.io/front-matter/invenio-rdm-starter:v12.0.10.1
pull_policy: if_not_present

Exceptions running
docker exec -it invenio-rdm-starter-worker-1 setup.sh

dec16th-setup.txt

@ivanbishop
Author

Celery logging errors "real time"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/flask_celeryext/app.py", line 71, in __call__
return Task.__call__(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 760, in protected_call
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/fixtures/tasks.py", line 61, in create_vocabulary_record
service.create(system_identity, data)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 377, in inner
uow.commit()
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 330, in commit
op.on_commit(self)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 182, in on_commit
self._indexer.index(self._record, arguments=arguments)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_indexer/api.py", line 179, in index
return self.client.index(
^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/utils.py", line 176, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/__init__.py", line 475, in index
return self.transport.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 455, in perform_request
raise e
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 416, in perform_request
status, headers_response, data = connection.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/http_urllib3.py", line 308, in perform_request
self._raise_error(
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/base.py", line 315, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')
[2024-12-16 22:11:50,827: WARNING/ForkPoolWorker-5] PUT http://search:9200/invenio-rdm-vocabularies-vocabulary-v1.0.0/_doc/8fff9df4-d7ba-4f49-bf6b-54e68c0793cf?version=1&version_type=external_gte [status:429 request:0.002s]
[2024-12-16 22:11:50,835: WARNING/ForkPoolWorker-2] PUT http://search:9200/invenio-rdm-vocabularies-vocabulary-v1.0.0/_doc/4239ab4a-503f-44d2-ab90-337532b131f3?version=1&version_type=external_gte [status:429 request:0.002s]
[2024-12-16 22:11:50,836: ERROR/ForkPoolWorker-5] Task invenio_rdm_records.fixtures.tasks.create_vocabulary_record[8652219e-cd97-4048-b33c-c91604a3e010] raised unexpected: TransportError(429, 'cluster_block_exception', {'error': {'root_cause': [{'type': 'cluster_block_exception', 'reason': 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}], 'type': 'cluster_block_exception', 'reason': 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}, 'status': 429})
Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/models.py", line 212, in get
return cls.query.filter_by(**args).one()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2870, in one
return self._iter().one()
^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1522, in one
return self._only_one_row(
^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 562, in _only_one_row
raise exc.NoResultFound(
sqlalchemy.exc.NoResultFound: No row was found when one was required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/fixtures/tasks.py", line 57, in create_vocabulary_record
record = Vocabulary.pid.resolve(pid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_vocabularies/records/systemfields/pid.py", line 93, in resolve
pid, record = resolver.resolve(pid_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/resolver.py", line 52, in resolve
pid = PersistentIdentifier.get(self.pid_type, pid_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/models.py", line 214, in get
raise PIDDoesNotExistError(pid_type, pid_value)
invenio_pidstore.errors.PIDDoesNotExistError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/flask_celeryext/app.py", line 71, in __call__
return Task.__call__(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 760, in protected_call
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/fixtures/tasks.py", line 61, in create_vocabulary_record
service.create(system_identity, data)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 377, in inner
uow.commit()
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 330, in commit
op.on_commit(self)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 182, in on_commit
self._indexer.index(self._record, arguments=arguments)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_indexer/api.py", line 179, in index
return self.client.index(
^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/utils.py", line 176, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/__init__.py", line 475, in index
return self.transport.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 455, in perform_request
raise e
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 416, in perform_request
status, headers_response, data = connection.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/http_urllib3.py", line 308, in perform_request
self._raise_error(
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/base.py", line 315, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')
[2024-12-16 22:11:50,848: ERROR/ForkPoolWorker-2] Task invenio_rdm_records.fixtures.tasks.create_vocabulary_record[a7b7a297-2522-431f-b87b-42bc10307750] raised unexpected: TransportError(429, 'cluster_block_exception', {'error': {'root_cause': [{'type': 'cluster_block_exception', 'reason': 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}], 'type': 'cluster_block_exception', 'reason': 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];'}, 'status': 429})
Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/models.py", line 212, in get
return cls.query.filter_by(**args).one()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/orm/query.py", line 2870, in one
return self._iter().one()
^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 1522, in one
return self._only_one_row(
^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/sqlalchemy/engine/result.py", line 562, in _only_one_row
raise exc.NoResultFound(
sqlalchemy.exc.NoResultFound: No row was found when one was required

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/fixtures/tasks.py", line 57, in create_vocabulary_record
record = Vocabulary.pid.resolve(pid)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_vocabularies/records/systemfields/pid.py", line 93, in resolve
pid, record = resolver.resolve(pid_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/resolver.py", line 52, in resolve
pid = PersistentIdentifier.get(self.pid_type, pid_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_pidstore/models.py", line 214, in get
raise PIDDoesNotExistError(pid_type, pid_value)
invenio_pidstore.errors.PIDDoesNotExistError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 477, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/flask_celeryext/app.py", line 71, in __call__
return Task.__call__(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 760, in protected_call
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/fixtures/tasks.py", line 61, in create_vocabulary_record
service.create(system_identity, data)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 377, in inner
uow.commit()
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 330, in commit
op.on_commit(self)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_records_resources/services/uow.py", line 182, in on_commit
self._indexer.index(self._record, arguments=arguments)
File "/opt/invenio/.venv/lib/python3.12/site-packages/invenio_indexer/api.py", line 179, in index
return self.client.index(
^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/utils.py", line 176, in _wrapped
return func(*args, params=params, headers=headers, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/client/__init__.py", line 475, in index
return self.transport.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 455, in perform_request
raise e
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/transport.py", line 416, in perform_request
status, headers_response, data = connection.perform_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/http_urllib3.py", line 308, in perform_request
self._raise_error(
File "/opt/invenio/.venv/lib/python3.12/site-packages/opensearchpy/connection/base.py", line 315, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
opensearchpy.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [invenio-rdm-vocabularies-vocabulary-v1.0.0] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')
[2024-12-16 22:11:51,216: WARNING/ForkPoolWorker-4] /opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/services/tasks.py:62: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
last_run = (datetime.utcnow() - timedelta(days=7)).isoformat()

[2024-12-16 22:11:51,216: WARNING/ForkPoolWorker-4] /opt/invenio/.venv/lib/python3.12/site-packages/invenio_rdm_records/services/tasks.py:63: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
reindex_start_time = datetime.utcnow().isoformat()

@mfenner
Contributor

mfenner commented Jan 10, 2025

Ivan, I built image ghcr.io/front-matter/invenio-rdm-starter:v12.0.10.3 this week with some very minor updates of dependencies. The :latest tag is using the same image.

My .env file only contains settings for S3: INVENIO_S3_ENDPOINT_URL, INVENIO_S3_SECRET_ACCESS_KEY,
INVENIO_S3_ACCESS_KEY_ID, INVENIO_S3_BUCKET_NAME, using https://fly.storage.tigris.dev as INVENIO_S3_ENDPOINT_URL.

When I run the following with the default compose file in separate terminal windows:

docker compose up
docker exec -it invenio-rdm-starter-worker-1 setup.sh

I don't see any exceptions in setup.sh execution. This is with a clean install and no pre-existing db, web or worker containers. I need to create an admin account and can then upload files to s3.

The last few lines of the setup.sh output (where they start to differ from yours) are:

Creating indexes...
invenio-rdm-vocabularies  [####################################]  100%
Creating custom fields...
Created all custom fields!
No custom fields were configured. Exiting...
Creating required fixtures...
Created required fixtures!
Queues dict_keys(['stats-file-download', 'stats-record-view']) have been declared.
-- Setup completed --

@mfenner
Contributor

mfenner commented Jan 10, 2025

Looking at your output, it seems that the errors are triggered when trying to create OpenSearch indexes. Is this on a fresh install? You can remove the OpenSearch indexes with docker exec -it invenio-rdm-starter-web-1 invenio index destroy --force --yes-i-know, but that sometimes doesn't work for me. As OpenSearch stores no data that have to be kept (with the exception of usage stats), it is always safe to delete the OpenSearch container.
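The log above reports a 429 cluster_block_exception with "disk usage exceeded flood-stage watermark", which means the index has a read-only-allow-delete block. If deleting the indexes or the container is not an option, a minimal sketch of resetting that block is shown here. It assumes the search service is reachable at http://search:9200 (the host in the worker log lines) and that disk space has been freed first, since the block may otherwise come back. The helper names are hypothetical; the settings key is taken from the error message.

```python
import json
import urllib.request

# Assumption: the search host/port that appears in the worker log lines.
SEARCH_URL = "http://search:9200"


def is_flood_stage_block(message: str) -> bool:
    """Return True if an error message matches the block reported in the log."""
    return "flood-stage watermark" in message and "read-only-allow-delete" in message


def build_unblock_request(index: str = "_all", base_url: str = SEARCH_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT that resets the read-only-allow-delete block."""
    # Setting the key to null resets it to its default (no block).
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def clear_block(index: str = "_all") -> int:
    """Send the request to the search service and return the HTTP status code."""
    with urllib.request.urlopen(build_unblock_request(index)) as response:
        return response.status
```

Calling clear_block() from a container on the same Docker network would send the request; nothing is sent until then.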

@ivanbishop
Author

It was bad indices Martin, thanks.

@ivanbishop
Author

Still seeing validation errors using "latest" pull on clean install.

S3 issue still?

In my docker-compose I have
grep S3 docker-compose.yml

# Invenio-S3
   - INVENIO_S3_ENDPOINT_URL=${INVENIO_S3_ENDPOINT_URL:-}
   - INVENIO_S3_ACCESS_KEY_ID=${INVENIO_S3_ACCESS_KEY_ID:-}
   - INVENIO_S3_SECRET_ACCESS_KEY=${INVENIO_S3_SECRET_ACCESS_KEY:-}
   - INVENIO_S3_BUCKET_NAME=${INVENIO_S3_BUCKET_NAME:-}

This appears twice, once under environment: and then again in the worker: section.

In my invenio.cfg I have (unchanged from git pull)

APP_DEFAULT_SECURE_HEADERS = {
    "content_security_policy": {
        "default-src": [
            "'self'",
            "data:", # for fonts
            "'unsafe-inline'", # for inline scripts and styles
            "blob:", # for pdf preview
            "fly.storage.tigris.dev", # for S3 object storage
            "s3.us-east-1.amazonaws.com", # for S3 object storage
            "s3.eu-central-1.amazonaws.com", # for S3 object storage
            # Add your own policies here (e.g. analytics)
        ],
        "img-src": [
            "*",
        ]

AND

#S3_ENDPOINT_URL='http://localhost:9000/'
S3_ENDPOINT_URL='s3.eu-central-1.amazonaws.com'
S3_ACCESS_KEY_ID='AXXXXXXXXXXXXXXXX4S'
S3_SECRET_ACCESS_KEY='Cxxxxxyyyyyyyyyyyyyyyyyyyyyy'
S3_REGION_NAME='eu-central-1'
S3_BUCKET_NAME='rdm12'

My bucket rdm12 exists in eu-central-1 and has the CORS headers set.

What works?
I have an admin user who

  1. creates a community OK, and sets it to "no review needed"
  2. I can drag a file onto the upload area OK (it goes to 100%)
  3. I generate a DOI for it

What fails?
As soon as I say publish I get a "validation error"

Questions

  1. Are my S3 settings in the invenio.cfg (like .env?) looking OK? Especially the endpoint.
  2. When I run "docker exec -it invenio-rdm-starter-worker-1 invenio files location list" I see "s3-default s3:// as default True". Should it not say rdm12, my bucket?
  3. Can Tom M. Comment on these observations too?
  4. How did you test uploading files to S3 as "admin" ?
  5. I have full access to the AWS console and CLI for the hosting server.

@mfenner
Contributor

mfenner commented Jan 20, 2025

@ivanbishop The S3 settings in the Docker Compose file are duplicated because web and worker are different Docker containers. The original Docker Compose setup uses multiple compose files where this is handled differently. You also need a .env file to provide these ENV vars, and I assume that is what you are doing with the section below. That is what INVENIO_S3_ENDPOINT_URL=${INVENIO_S3_ENDPOINT_URL:-} provides (nothing after the hyphen means no default value). What is important is that variables in .env files and other ways of feeding env variables are prefixed with INVENIO_, in contrast to what is in invenio.cfg.
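To illustrate the prefix convention: an environment variable named INVENIO_S3_BUCKET_NAME ends up as the config key S3_BUCKET_NAME, while an unprefixed S3_BUCKET_NAME in the environment is ignored. The following is a simplified sketch of that mapping, not the actual loader; the function name is hypothetical and the literal-parsing step is an assumption about how quoted values are handled.

```python
import ast


def config_from_env(environ: dict, prefix: str = "INVENIO_") -> dict:
    """Map prefixed environment variables to config keys (illustrative only)."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue  # unprefixed variables never reach the config
        key = name[len(prefix):]
        try:
            # Assumption: quoted or literal values are parsed as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # plain strings such as URLs are kept as-is
        config[key] = value
    return config


example_env = {
    "INVENIO_S3_BUCKET_NAME": "rdm12",
    "INVENIO_S3_REGION_NAME": "'eu-central-1'",
    "S3_ENDPOINT_URL": "s3.eu-central-1.amazonaws.com",  # no prefix: ignored
}
print(config_from_env(example_env))
```

In invenio.cfg, by contrast, the same settings are written without the prefix (S3_BUCKET_NAME and so on).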

@mfenner
Contributor

mfenner commented Jan 20, 2025

invenio files location create --default s3-default "s3://${INVENIO_S3_BUCKET_NAME}" stores the bucket name in postgres. Maybe this is related to the previous comment, i.e. your env variable is not prefixed with INVENIO_. You can check the uri field in the files_location postgres table (only 1 row).
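A small sketch of why "invenio files location list" can show a bare s3:// URI: if INVENIO_S3_BUCKET_NAME is empty or unset when the location is created, the shell expands "s3://${INVENIO_S3_BUCKET_NAME}" to "s3://". The helper names here are hypothetical, and the Python only mimics the shell expansion for illustration.

```python
from urllib.parse import urlparse


def location_uri(environ: dict) -> str:
    """Mimic the shell expansion of "s3://${INVENIO_S3_BUCKET_NAME}"."""
    # An unset variable expands to an empty string, as in the shell.
    return "s3://" + environ.get("INVENIO_S3_BUCKET_NAME", "")


def bucket_from_location_uri(uri: str) -> str:
    """Extract the bucket name from a stored location URI ('' if missing)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri!r}")
    return parsed.netloc


# Variable missing from the environment: the stored URI has no bucket.
print(repr(bucket_from_location_uri(location_uri({}))))
# Variable provided with the INVENIO_ prefix: the bucket is stored.
print(repr(bucket_from_location_uri(location_uri({"INVENIO_S3_BUCKET_NAME": "rdm12"}))))
```

An empty bucket in the uri field would suggest the location was created before the prefixed variable was available to the container.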

@mfenner
Contributor

mfenner commented Jan 20, 2025

How did you test uploading files to S3 as "admin" ?

I tested uploading with the new upload web UI action when logged in as user with admin rights.

@mfenner
Contributor

mfenner commented Jan 20, 2025

In summary, I think that something is not working with passing ENV variables into the InvenioRDM instance. Check the files_location and files_files postgres tables.

@ivanbishop
Author

ivanbishop commented Jan 21, 2025

Ah! Fool me... can u pls share your obfuscated .env file Martin. I totally blanked on config versus .env

Thanks for your patience
Ivan

@mfenner
Contributor

mfenner commented Jan 21, 2025

The S3 part looks like this:

INVENIO_S3_ENDPOINT_URL=https://fly.storage.tigris.dev
INVENIO_S3_SECRET_ACCESS_KEY=*
INVENIO_S3_ACCESS_KEY_ID=*
INVENIO_S3_BUCKET_NAME=*
INVENIO_S3_REGION_NAME='us-east-1'
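For a quick sanity check of a .env file against the five variables listed above, something like the following could be used. It is only a sketch: the parser handles simple KEY=VALUE lines, the function names are hypothetical, and treating all five variables as required is an assumption based on this sample.

```python
# The variable names are taken from the sample .env above.
REQUIRED_S3_VARS = (
    "INVENIO_S3_ENDPOINT_URL",
    "INVENIO_S3_SECRET_ACCESS_KEY",
    "INVENIO_S3_ACCESS_KEY_ID",
    "INVENIO_S3_BUCKET_NAME",
    "INVENIO_S3_REGION_NAME",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_s3_settings(text: str) -> list:
    """Return the S3 variables that are absent or empty in the .env text."""
    values = parse_env(text)
    return [name for name in REQUIRED_S3_VARS if not values.get(name)]


# Unprefixed names, as in invenio.cfg, do not count as set.
unprefixed = "S3_BUCKET_NAME='rdm12'\nS3_REGION_NAME='eu-central-1'\n"
print(missing_s3_settings(unprefixed))
```

Reading the real file with open(".env").read() and passing the text to missing_s3_settings would list anything still missing.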

@mfenner
Contributor

mfenner commented Jan 21, 2025

I updated the readme to include the S3 configuration: https://github.com/front-matter/invenio-rdm-starter. Did the same with the InvenioRDM Starter documentation (still updating): https://starter.front-matter.io/

@mfenner
Contributor

mfenner commented Jan 29, 2025

@ivanbishop were you able to solve this?

@ivanbishop
Author

Hello Martin, haven't got round to testing. Mea Culpa. I will this weekend. Thanks.
