Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve vault client Stellar account fetching by using a locally-stored paging token #341

Open
ebma opened this issue Apr 19, 2023 · 7 comments
Labels
priority:low Do it some day

Comments

@ebma
Copy link
Member

ebma commented Apr 19, 2023

Problem statement

A Stellar account could potentially have thousands of transactions and we don't want to start fetching from the first transaction every time the vault client (re-)starts. But:
We cannot just save the paging token of the last transaction we fetched. Because, if the vault client suddenly crashes after storing the paging token of the last transaction but the last transaction (and the preceding ones) was/were not yet handled, (ie. they were not yet used to fulfill issue requests) the next time the vault goes online it would ignore them because it only fetches from the cached paging token.

Possible Solution

We could implement the following logic for storing a safe paging token, with safe meaning that we can ignore all the transactions that happened before the safe paging token (excluding that transaction itself):

  • We have a directory for 'caching' the paging token. Let's call it 'paging_tokens'
  • This directory will contain a list of files whose names will represent the paging token. We use a directory and not a single file for this to mitigate issues with parallel reads/writes of paging tokens (which would be problematic if we share a file between multiple processes)
  • Now, whenever we fetch the transactions of the vault's Stellar account and we see a transaction that we need to handle, we write a file with the paging token as the name and the content of that transaction to the 'paging_tokens' directory.
  • After we handled the transaction, ie. after we used it to complete an issue or redeem request and we are sure that we will not need that transaction again, we either
    • modify the content of the file and remove all its content (just keeping the paging token name), if the 'paging_tokens' directory only contains this item, or
    • remove the file for that paging token from the 'paging_tokens' directory again, if there are other files in the directory besides this one.
    • (We need to do this split approach because if we remove all files once all are handled our vault client would not know the latest paging token anymore)
  • When the vault client starts up it will now first check the 'paging_tokens' directory, load all 'cached' transactions (and handle them as if they were just fetched). Afterward, it starts fetching the Stellar account with the highest paging token as a cursor.

With this approach, we make sure that we don't miss the handling of any important transactions but reduce the load for the initial account fetch of our vault client.

Illustrating this with an example:

  1. The vault starts for the first time and fetches 100 transactions, starting from the very first transaction of its account.
  2. It realizes that the transactions 5, 21 and 50 are important for issue/redeem requests.
  3. It stores three files called 5, 21 and 50 in the 'paging_token' directory with the file content being the xdr encoded data of each transaction.
  4. Now, before it was possible to use the transactions, the vault client suddenly crashes due to whatever reason.
  5. The vault client starts again, checks the 'paging_token' directory, reads the three contained transactions into memory (because it wants to use them for executing the requests again), and then fetches the Stellar account's transactions (ascending) with 50 as the cursor.
@ebma ebma added the priority:low Do it some day label Apr 19, 2023
@vadaynujra
Copy link

@ebma does this need to be completed before the Pendulum launch of Spacewalk?

@ebma
Copy link
Member Author

ebma commented Apr 20, 2023

No, this is low priority and just a nice-to-have feature. It is not relevant for any milestone (and probably never will be) which is why I didn't add it to any.

@prayagd
Copy link
Collaborator

prayagd commented May 3, 2023

Hey team! Please add your planning poker estimate with Zenhub @b-yap @ebma @TorstenStueber

@TorstenStueber
Copy link
Contributor

@ebma just to understand what accounts a vault client needs to watch: I assume it is only the vault account and the accounts that intend to do an issue? If the latter: is there not a way to query all transactions that affect an account (the vault account) instead of only the accounts where a particular account is the source account?

Either way: I think that the solution works nicely. However, when manually storing data on a hard drive instead of in a database, we need to be aware of all kind of race conditions: in your example it could happen that the client wants to store the files 5, 21 and 50 (simultaneously) but it stores 50 first and then crashes. It would never process 5 and 21 anymore, even after a restart.

One could for example change your protocol so that the client only stores file 5, then processes it, then when completely done goes over to 21, etc. Slower, but safer.

@ebma
Copy link
Member Author

ebma commented May 9, 2023

just to understand what accounts a vault client needs to watch [...]

The vault only needs to watch its own Stellar account. It needs to see all the payment operations directed at it and can do so by fetching its transactions. In theory, it would be enough to only query all transactions that have the public key of the vault's Stellar account as the destination but I'm not sure if we can do that.

in your example it could happen that the client wants to store the files 5, 21 and 50 (simultaneously) but it stores 50 first and then crashes. It would never process 5 and 21 anymore, even after a restart.

Nice catch, we should probably change it as you proposed and make it more sequential.

@TorstenStueber
Copy link
Contributor

We need to check whether the "retrieve operations by account" or "retrieve transactions by account" endpoints retrieve only transactions/operations where the specified account is the source account or also the affected account. Unfortunately the Stellar docs don't specify this.

The experiments I made indicate that this is the case. I asked on discord for clarification.

@prayagd
Copy link
Collaborator

prayagd commented Dec 7, 2023

Moving it to icebox

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority:low Do it some day
Projects
None yet
Development

No branches or pull requests

4 participants