Incremental CAgg Refresh Policy #7790
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #7790 +/- ##
==========================================
+ Coverage 80.06% 81.88% +1.81%
==========================================
Files 190 247 +57
Lines 37181 45627 +8446
Branches 9450 11418 +1968
==========================================
+ Hits 29770 37361 +7591
- Misses 2997 3767 +770
- Partials 4414 4499 +85
Some minor comments. Since it is in draft, I will wait with approving until you have the final version.
A few questions regarding some parts of the code where I am not sure if it is correct or not.
Currently, a Continuous Aggregate refresh policy processes the entire refresh window in a single pass, regardless of how large the window is. For example, on a hypertable with a huge number of rows, refreshing a CAgg can take a long time and require a lot of CPU, memory, and I/O, and none of the aggregated data becomes visible to users until the refresh policy completes its execution.
This PR adds the ability to execute a CAgg refresh policy incrementally in "batches". Each batch is an individual transaction that processes a small fraction of the entire refresh window. Once a batch finishes, the data it refreshed is already visible to users, even before the policy execution ends.
To tweak and control the incremental refresh, some new options were added to the add_continuous_aggregate_policy API:
- buckets_per_batch: number of buckets to be refreshed by a batch. This value is multiplied by the CAgg bucket width to determine the size of the batch range. The default value is 0 (zero), which keeps the current behavior of single-batch execution. Values less than 0 (zero) are not allowed.
- max_batches_per_execution: maximum number of batches to be executed by a single policy execution. This option limits the number of batches processed per policy execution; if some batches remain, they are processed the next time the policy runs. The default value is 10 (ten), which means each job execution processes at most ten batches. Values less than 0 (zero) are not allowed.
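The arithmetic behind the two options can be sketched as follows. This is an illustrative model only, not the code in this PR: the function name plan_batches is hypothetical, the oldest-first batch order is an assumption, and treating max_batches_per_execution = 0 as "no limit" is also an assumption, since the description above only states that negative values are rejected.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def plan_batches(
    window_start: datetime,
    window_end: datetime,
    bucket_width: timedelta,
    buckets_per_batch: int = 0,
    max_batches_per_execution: int = 10,
) -> List[Window]:
    """Hypothetical helper: split a refresh window into batch ranges."""
    if buckets_per_batch < 0 or max_batches_per_execution < 0:
        raise ValueError("values less than 0 are not allowed")
    if window_start >= window_end:
        return []

    # buckets_per_batch = 0 keeps the single-batch behavior:
    # the whole refresh window is processed at once.
    if buckets_per_batch == 0:
        return [(window_start, window_end)]

    # Batch range size = buckets_per_batch * CAgg bucket width.
    batch_size = buckets_per_batch * bucket_width

    batches: List[Window] = []
    start = window_start  # assumption: batches are planned oldest first
    while start < window_end:
        # Assumption: 0 means "no limit" on batches per execution.
        if max_batches_per_execution and len(batches) >= max_batches_per_execution:
            break  # remaining ranges are left for the next policy run
        end = min(start + batch_size, window_end)
        batches.append((start, end))
        start = end
    return batches


if __name__ == "__main__":
    planned = plan_batches(
        datetime(2024, 1, 1),
        datetime(2024, 1, 31),
        bucket_width=timedelta(days=1),
        buckets_per_batch=2,
    )
    for lo, hi in planned:
        print(lo.date(), "->", hi.date())
```

With a 30-day window, a 1-day bucket, and buckets_per_batch = 2, the window splits into 15 two-day ranges. With the default max_batches_per_execution of 10, this sketch plans only the first ten, leaving the rest for a later execution.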