Remove separate entity counting step at end of copy #5819

lutter · 2025-02-12T19:45:34Z

We used to count all entities in the destination after all data had been copied; for large subgraphs, that can be a very slow operation, and in some cases makes the copy operation fail because the underlying fdw connection times out.

With this PR, we now update the entity count with every batch we copy.

With recent changes, BatchCopy has become unnecessary

We used to count entities after all the data had been copied, but that is a very slow operation for large subgraphs; that in turn can lead to bad side-effects like connection timeouts and the copy ultimately failing. We now keep track of the number of current entity versions that we copy with each batch and calculate the entity count incrementally

We used to count entities, but we know the result is 0 because all the tables we are counting are empty

mangas · 2025-02-27T09:49:30Z

store/postgres/src/relational_queries.rs

+    fn walk_ast<'b>(&'b self, mut out: AstPass<'_, 'b, Pg>) -> QueryResult<()> {
+        // Generate a query
+        // with copy_cte as ( {copy} )
+        // select count(*) from copy_cte where {block_range_current}


I think it would better to have a test for this

There are tests in store/test-store/tests/postgres/graft.rs that run these queries by copying/grafting test subgraphs

mangas · 2025-02-27T10:01:56Z

store/postgres/src/copy.rs

+
+        let count = count.unwrap_or(0);
+
+        deployment::update_entity_count(conn, &self.dst_site, count)?;


I'm not sure if I understand this correctly, I thought that count_current would run the count for the batch with the goal of adding it to previous result. Otherwise it would be counting all at the end as it was previously, right?

The call to count_current wraps the CopyEntityBatchQuery so that the overall query returns how many current versions were copied, i.e., we end up running something like

with copy_cte(current) as ( insert into dst(...) select * from src where ... returning block_range @> int32::MAX) select count(*) from copy_cte where current

For the select, copy_cte is a table that has one boolean column, which is true if the row we just inserted is a current version.

The query does two things at once: (1) it copies a batch (the inner insert statement) and (2) returns how many rows that were copied are considered 'current', i.e. entity versions that should be counted for the entity count of the deployment. Line 488 then takes that count and adds it to the entity count for the deployment.

With this change, we never run a count query just by itself.

lutter added 4 commits February 12, 2025 10:16

store: Fold BatchCopy into TableState

1dc3bd7

With recent changes, BatchCopy has become unnecessary

store: Clear the entity count when truncating a deployment

429100c

We used to count entities, but we know the result is 0 because all the tables we are counting are empty

store: Remove unused Layout.count_query

eaac6d6

fordN requested review from zorancv and mangas and removed request for zorancv February 13, 2025 17:01

mangas reviewed Feb 27, 2025

View reviewed changes

mangas approved these changes Feb 27, 2025

View reviewed changes

mangas reviewed Feb 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove separate entity counting step at end of copy #5819

Remove separate entity counting step at end of copy #5819

lutter commented Feb 12, 2025

mangas Feb 27, 2025

lutter Feb 27, 2025

mangas Feb 27, 2025 •

edited

Loading

lutter Feb 27, 2025


		let count = count.unwrap_or(0);

		deployment::update_entity_count(conn, &self.dst_site, count)?;

Remove separate entity counting step at end of copy #5819

Are you sure you want to change the base?

Remove separate entity counting step at end of copy #5819

Conversation

lutter commented Feb 12, 2025

mangas Feb 27, 2025

Choose a reason for hiding this comment

lutter Feb 27, 2025

Choose a reason for hiding this comment

mangas Feb 27, 2025 • edited Loading

Choose a reason for hiding this comment

lutter Feb 27, 2025

Choose a reason for hiding this comment

mangas Feb 27, 2025 •

edited

Loading