🐛 Scalability issue with multiple simultaneous DIDExchange requests #3492
Comments
Ah… this is the issue you thought was resolved with the last Askar updates. I don't think the follow-up Askar changes would have impacted this, but who knows. Just to clarify: if you do fewer than 10 parallel tenants, things work fine, and it's only once you get to a threshold of tenants that the problems occur, correct?
Indeed - I had this debug script ready when askar 0.4.2 came out, and when I upgraded it in ACA-Py and reran these tests, I got 2 successful runs and thought it was fixed (after previously consistent test failures). The stack traces show a timeout when opening an Askar session, which is why I thought the caching changes might be just the thing to fix it, but I haven't done enough testing on each Askar version to really verify the impact of all the changes. It's probably sensitive to resource constraints: in our dev environment, it happens less often on a slightly beefier node.
Exactly - I tested 2, 3, 4, 5, 6 - all working. I wanted to find the exact boundary, but alas, impatience. With 10 consecutive requests, they usually all fail. I'll have to add some more debug logs around the place to figure out what's causing the issue. Quick inspection shows a lot of …

PS: We've recently begun open-sourcing acapy-cloud (previously some hidden Helm charts were needed to deploy everything locally), so we're partly curious to hear whether others can succeed in setting up a local acapy-cloud environment. I think it will prove to be a very useful and powerful repo for simplifying work with ACA-Py, and for debugging things like this. It can definitely benefit from more users and contributors! So please, to all maintainers here, check it out and feel free to let us know if you need help 🚀
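For reference, a minimal sketch of the kind of standalone probe that could help isolate the Askar session-open timeout from ACA-Py itself, assuming the aries-askar Python bindings and a Postgres-backed store. The connection URI, its `max_connections` option, and the record category/values are placeholders for illustration, not details taken from this issue:

```python
# Hedged sketch: open many Askar sessions concurrently against one store to see
# whether session acquisition starts timing out under a small connection pool.
# The Postgres URI and its max_connections option are assumptions for
# illustration -- adjust to your own environment.
import asyncio

from aries_askar import Store

STORE_URI = "postgres://postgres:postgres@localhost:5432/askar_probe?max_connections=4"


async def touch_store(store: Store, i: int) -> None:
    # Each task opens its own session and does a trivial insert/fetch.
    async with store.session() as session:
        await session.insert("probe", f"name-{i}", b"value")
        await session.fetch("probe", f"name-{i}")


async def main() -> None:
    key = Store.generate_raw_key()
    store = await Store.provision(STORE_URI, "raw", key, recreate=True)
    try:
        # Ramp this up (10, 50, 100...) and watch for session-open timeouts.
        await asyncio.gather(*(touch_store(store, i) for i in range(50)))
    finally:
        await store.close()


if __name__ == "__main__":
    asyncio.run(main())
```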
When multiple tenants simultaneously request a DIDExchange connection with an issuer's public DID, several unhandled exceptions are raised, causing all of the requested connections to fail.

The handling logic and auto-complete flows associated with the DIDExchange request do not report any error to the clients that made the request, leaving their connection records in the `request-sent` state. The issuer does not receive any `request-received` records as expected - not even for one of the many requests.

Note: this is running the latest ACA-Py release, with askar 0.4.3.
Steps to Reproduce
There are many steps required to reproduce this in ACA-Py alone... so the simplest way to reproduce it is to check out our `acapy-cloud` repo (previously `aries-cloudapi-python`): https://github.com/didx-xyz/acapy-cloud. A simple test script there can do all the setup and replicate it for you.

As a summary - besides all the steps for onboarding an issuer and registering their public DID - here's how to replicate this issue:

1. Have multiple tenants simultaneously create a DIDExchange request (`POST /didexchange/create-request`), using `use_public_did` to set the issuer's public DID for the request.

The above steps can be achieved with the test script `app/tests/e2e/test_many_connections.py`:

1. Start the environment with `mise run tilt:up`, and wait for services to be up and running (visit localhost:10350).
2. Run `pytest app/tests/e2e/test_many_connections.py`.
The test should fail with "Connection 0 failed with exception" and then "expected webhook not received".
In the Multitenant-Agent logs, you'll see many exceptions being raised, one for each request. The stack traces seem to reveal that it's related to a timeout while waiting to open an Askar session.
PS: Log levels can be modified in `helm/acapy-cloud/conf/local/multitenant-agent.yaml`, e.g. set `ACAPY_LOG_LEVEL` to `debug`.
Please let me know whether the replication steps work for you, or if you need help with the acapy-cloud mise setup.