
fix(forknet): fix account shard assignment for forknet v2 #12837

Merged

merged 18 commits into near:master from fork-net-shards on Feb 14, 2025

Conversation

marcelo-gonzalez
Contributor

The fork-network amend-validators command changes the accounts and keys for use with the mirror tx sending tool. It maps all the implicit accounts so that we can control them in the forked chain. But after this mapping, an account might not belong to the same shard it was in before, in which case it was left in the trie for the wrong shard.

Here we fix it by keeping a list of updates for every shard in each StorageMutator, and then writing the updates to the right shard's trie when commit() is called. To synchronize this across the threads that each iterate over a different shard, we put the in-progress state roots behind a Mutex that we lock whenever we want to commit changes to a shard.
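As a rough sketch of that scheme (the types and names below are simplified stand-ins, not the actual nearcore API):

```rust
use std::sync::{Arc, Mutex};

type StateRoot = [u8; 32];
// key -> Some(value) for a write, None for a deletion
type TrieUpdate = (Vec<u8>, Option<Vec<u8>>);

// Stand-in for the real trie machinery; applies a shard's updates and
// returns its new state root.
fn apply_updates(root: StateRoot, _updates: Vec<TrieUpdate>) -> StateRoot {
    root
}

struct StorageMutator {
    // One buffer of pending updates per shard in the target layout.
    updates: Vec<Vec<TrieUpdate>>,
    // Shared across all threads iterating over the source shards.
    state_roots: Arc<Mutex<Vec<StateRoot>>>,
}

impl StorageMutator {
    // A mapped account may land in a different shard, so callers pass the
    // target shard index, which need not match the shard being iterated.
    fn set(&mut self, shard_idx: usize, key: Vec<u8>, value: Vec<u8>) {
        self.updates[shard_idx].push((key, Some(value)));
    }

    fn commit(self) {
        // Lock the shared state roots once, then flush every shard's buffer.
        let mut roots = self.state_roots.lock().unwrap();
        for (shard_idx, updates) in self.updates.into_iter().enumerate() {
            roots[shard_idx] = apply_updates(roots[shard_idx], updates);
        }
    }
}
```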

We also introduce a new type, DelayedReceiptTracker, that remembers the mapping from (source shard, receipt index) to target shard. After all threads have finished iterating over the tries in their shards, one thread writes all the delayed receipts with write_delayed_receipts().
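Conceptually the tracker only needs to hold the mapping, not the receipts themselves; a minimal sketch, with hypothetical field names:

```rust
type ShardId = u64;

// Hedged sketch of the tracker's shape (not the actual definition): only
// the (source shard, receipt index) -> target shard mapping is kept.
struct DelayedReceiptTracker {
    source_shard_id: ShardId,
    // target_shards[i] is the target shard index for the i-th delayed
    // receipt encountered while iterating this source shard's queue.
    target_shards: Vec<usize>,
}

impl DelayedReceiptTracker {
    fn new(source_shard_id: ShardId) -> Self {
        Self { source_shard_id, target_shards: Vec::new() }
    }

    // Called once per delayed receipt during the per-shard iteration.
    fn push(&mut self, target_shard_idx: usize) {
        self.target_shards.push(target_shard_idx);
    }
}
```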

One other small change we make here is passing the same height to MemTries::delete_until_height() that we passed to the preceding call to apply_memtrie_changes(), because the garbage collection condition in that function removes everything below the specified height. With the previous code, we were keeping one extra update's worth of unnecessary nodes.
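A toy model of that condition (not the real MemTries code) shows why passing the same height matters:

```rust
// Nodes are tagged with the fake block height of the update that created
// them, and garbage collection drops everything below the given height.
fn delete_until_height(nodes: &mut Vec<(u64, &str)>, height: u64) {
    nodes.retain(|&(h, _)| h >= height);
}

fn main() {
    let mut nodes = vec![(1, "stale"), (2, "previous update"), (3, "current update")];
    // Passing the same height that was just applied keeps only the current
    // update's nodes; the old code effectively passed 2 here and kept one
    // extra update's worth of nodes.
    delete_until_height(&mut nodes, 3);
    assert_eq!(nodes, vec![(3, "current update")]);
}
```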

Notes from individual commits:

- When iterating over a whole mainnet trie, this might actually be pretty unnecessary.
- We already have the key we want to remove, so constructing the same trie key and then serializing it is not necessary. Also, this was incorrect for delayed receipts, where the indices in the source DB probably did not start from zero, so we were deleting the wrong trie key.
- Writing the index in the state record won't work if the shard layout isn't the same as it was in the source chain.
@marcelo-gonzalez marcelo-gonzalez requested a review from a team as a code owner January 29, 2025 22:08
@marcelo-gonzalez
Contributor Author

this PR is on top of the same branch as #12798, so it's not quite as big as it looks. I'll look to merge this one after that one is merged, but I'm opening it now since I think it should be good to go

@marcelo-gonzalez
Contributor Author

ok, the other one is merged, so the diff is just from this change now


codecov bot commented Jan 30, 2025

Codecov Report

Attention: Patch coverage is 0% with 418 lines in your changes missing coverage. Please review.

Project coverage is 70.30%. Comparing base (6cd375e) to head (3d38e5a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tools/fork-network/src/cli.rs | 0.00% | 157 Missing ⚠️ |
| tools/fork-network/src/storage_mutator.rs | 0.00% | 152 Missing ⚠️ |
| tools/fork-network/src/delayed_receipts.rs | 0.00% | 109 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12837      +/-   ##
==========================================
- Coverage   70.39%   70.30%   -0.09%     
==========================================
  Files         848      848              
  Lines      174091   174287     +196     
  Branches   174091   174287     +196     
==========================================
- Hits       122548   122536      -12     
- Misses      46296    46501     +205     
- Partials     5247     5250       +3     
| Flag | Coverage Δ |
|---|---|
| backward-compatibility | 0.16% <0.00%> (-0.01%) ⬇️ |
| db-migration | 0.16% <0.00%> (-0.01%) ⬇️ |
| genesis-check | 1.41% <0.00%> (-0.01%) ⬇️ |
| linux | 69.98% <0.00%> (-0.08%) ⬇️ |
| linux-nightly | 69.95% <0.00%> (-0.09%) ⬇️ |
| pytests | 1.70% <0.00%> (-0.01%) ⬇️ |
| sanity-checks | 1.52% <0.00%> (-0.01%) ⬇️ |
| unittests | 70.13% <0.00%> (-0.09%) ⬇️ |
| upgradability | 0.20% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.


@Longarithm Longarithm (Member) left a comment

LGTM with some design questions.

Comment on lines +652 to +653
let new_shard_id = shard_layout.account_id_to_shard_id(&new_account_id);
let new_shard_idx = shard_layout.get_shard_index(new_shard_id).unwrap();
Member

I don't understand why in some places we check whether the account is a NearImplicitAccount and in others we don't. It would be great to deduplicate the logic, or to comment on which cases need the shard id recomputed.

Contributor Author

I think it would all work even if you removed those checks, but the reason for them is that map_account() only changes implicit accounts; for every other account it just clones it. So the `if account is NearImplicitAccount { }` check just lets us skip the unnecessary clone.

In this case we don't check it because we know we have to map the key and rewrite the value in the trie anyway, so we might as well omit the account type check.

But in any case, I think this shows that map_account() should just return a Cow so we don't have to worry about calling it on every single key in the trie. Will add a TODO to do that.
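For illustration, the Cow version could look something like this (the helper functions here are hypothetical stand-ins, not the real ones):

```rust
use std::borrow::Cow;

type AccountId = String;

// Stand-in check: NEAR implicit account ids are 64 hex characters.
fn is_near_implicit(account_id: &AccountId) -> bool {
    account_id.len() == 64 && account_id.chars().all(|c| c.is_ascii_hexdigit())
}

// Stand-in for the real key-mapping derivation.
fn derive_mapped_account(account_id: &AccountId) -> AccountId {
    format!("mapped-{account_id}")
}

fn map_account(account_id: &AccountId) -> Cow<'_, AccountId> {
    if is_near_implicit(account_id) {
        Cow::Owned(derive_mapped_account(account_id))
    } else {
        // No mapping needed, so no allocation either.
        Cow::Borrowed(account_id)
    }
}
```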

Comment on lines 198 to 199
/// The fake block height is used to allow memtries to garbage collect.
/// Otherwise it would take significantly more memory holding old nodes.
Member

this comment is no longer accurate

    let source_shard_id = shard_layout.get_shard_id(source_shard_idx).unwrap();

    remove_source_receipt_index(&mut trie_updates[source_shard_idx], index);
Member

Why would you need to remove a receipt from trie_updates? It looks like the logic is to take all receipts from the trackers and commit them to storage, so trie_updates should be append-only.

Contributor Author

This is for something like this case:

Suppose there are these delayed receipts, each written as ($receipt_index: $target_shard):

source shard 0: [(0: 0), (1: 1)]
source shard 1: [(0: 1)]
source shard 2: [(0: 0)]

Then suppose we iterate over the receipts in source shard 0 first. For the first one (0: 0), this remove_source_receipt_index() is kind of unnecessary, but it won't do anything bad. But when we process the next one (1: 1), we add it as a receipt with index 0 in shard 1, and now there is no longer a receipt with index 1 in shard 0. Then we process shard 1's receipts, then shard 2's. When we get to shard 2, we add its one receipt to shard 0, where it gets index 1, which is exactly the index we deleted when processing shard 0's receipts. But what if that shard 2 receipt had instead been (0: 2)? Then there would be no receipt with index 1 in shard 0 at the end, and we would want that index actually deleted from the trie.
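To make the bookkeeping concrete, here is a toy model of that scenario (the writes/deletes representation is hypothetical; the real code tracks this through trie_updates and remove_source_receipt_index):

```rust
use std::collections::{BTreeMap, BTreeSet};

fn main() {
    // ((source_shard, receipt_index), target_shard), as in the example above.
    let receipts = [((0u64, 0u64), 0u64), ((0, 1), 1), ((1, 0), 1), ((2, 0), 0)];

    // Pending writes: (shard, index) -> the source slot the receipt came from.
    let mut writes: BTreeMap<(u64, u64), (u64, u64)> = BTreeMap::new();
    let mut deletes: BTreeSet<(u64, u64)> = BTreeSet::new();
    let mut next_index = [0u64; 3];

    for &((src_shard, src_index), target) in &receipts {
        // Mark the source slot for deletion...
        deletes.insert((src_shard, src_index));
        // ...and append the receipt at the target shard's next free index.
        let idx = next_index[target as usize];
        next_index[target as usize] += 1;
        writes.insert((target, idx), (src_shard, src_index));
    }

    // Only slots that no write re-filled end up deleted: here that's (2, 0),
    // since shard 2's queue ends empty. If the last receipt's target were
    // shard 2 instead, (0, 1) would stay deleted, which is exactly the case
    // described above.
    deletes.retain(|slot| !writes.contains_key(slot));
    println!("writes: {writes:?}\ndeletes: {deletes:?}");
}
```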

Comment on lines +18 to +19
// Keeps track of the mapping of delayed receipts from a source shard to the
// shards their receivers will belong to in the forked chain
Member

It is worth mentioning that we don't store Receipts directly due to RAM concerns. At least that's my impression.

Contributor Author

Oh yeah, that was why. I don't know how important it will be in practice, but at least in theory it could get big. Actually, now that I think about it, I should add a TODO in write_delayed_receipts(), because in that function we're storing everything in memory anyway :/ we should commit them periodically.
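A sketch of that periodic-commit idea (Receipt, commit_batch(), and the batch size are hypothetical stand-ins):

```rust
const BATCH_SIZE: usize = 10_000;

struct Receipt; // stand-in for the real receipt type

fn commit_batch(batch: Vec<Receipt>) {
    // Stand-in for writing the batch out to the target shard's storage.
    drop(batch);
}

fn write_delayed_receipts(receipts: impl Iterator<Item = Receipt>) {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    for receipt in receipts {
        batch.push(receipt);
        // Flush periodically instead of holding every receipt in memory.
        if batch.len() >= BATCH_SIZE {
            commit_batch(std::mem::take(&mut batch));
        }
    }
    if !batch.is_empty() {
        commit_batch(batch);
    }
}
```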

Comment on lines 18 to 19

struct InProgressRoot {
Member

Please add a comment explaining what this is for. Maybe part of the comment for ShardUpdateState::new can go here.

@marcelo-gonzalez marcelo-gonzalez added this pull request to the merge queue Feb 14, 2025
Merged via the queue into near:master with commit 557bff1 Feb 14, 2025
24 checks passed
@marcelo-gonzalez marcelo-gonzalez deleted the fork-net-shards branch February 14, 2025 17:02
marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this pull request Feb 14, 2025
I forgot to "git add" this when applying suggestions from near#12837
github-merge-queue bot pushed a commit that referenced this pull request Feb 14, 2025
I forgot to "git add" this when applying suggestions from
#12837