Continuous Aggregate Performance #2841
Comments
There's a known performance issue when querying continuous aggregates which will be fixed in the next release (2.0.1). PR with the fix is here: #2828. This sounds like a duplicate of that. Can you try to disable real-time aggregation to see if that is a work-around for you? Should be possible with something like:
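A sketch of that workaround, with a placeholder view name; the relevant setting is `timescaledb.materialized_only`, which makes queries read only already-materialized data instead of also aggregating recent rows on the fly:

```sql
-- Disable real-time aggregation for a continuous aggregate.
-- 'conditions_summary_hourly' is a placeholder; use your own CAGG name.
ALTER MATERIALIZED VIEW conditions_summary_hourly
  SET (timescaledb.materialized_only = true);
```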
W.r.t. multiple refresh jobs on a single continuous aggregate: we don't support that right now, but might in the future. The same goes for parallel/multi-core support.
Hi @zeroallox, did you have a chance to try @erimatnor's suggestion?
Yes, and it had no effect. I have even dropped the entire database and recreated everything from scratch, without setting up any scheduling or automatic CAGG jobs. When I ran the job manually, the generation happened, albeit very slowly. I let it run for about a day, restarted the DB, and still had slow queries. Again, running the queries manually (building the data set on the fly) was fine. (Excuse the late reply, and thank you for your time!)
I had a similar issue with CAGG performance: the initial aggregation job was processing only ~5k rows/sec, which is unacceptably slow and basically makes CAGGs unusable for any realistic dataset (I have 1.7 billion rows in my time series). Creating a classic PostgreSQL materialized view took only ~15 minutes, and I have only 8 vCPUs, not 32 like the topic starter.
Just upgraded to the new version today (Docker image name: ). Same issue :( I ran the CAGG job for about an hour, then deleted it. When I attempt to select even a single row, the query takes over 20 minutes. No other tasks are running; the DB is completely idle.
Giving TSDB another whirl after a few months, using v2.4.0 (Docker image: ). There does not appear to be any improvement: the CAGG generation only utilizes a single core of a 32-core system. :( Have there been any updates or internal discussion about multi-thread support for CAGG generation?
There have been internal discussions about optimizations for CAGGs, but there are no updates as of yet. It is likely that the upcoming changes for CAGGs will break compatibility, and you should not expect linear scaling, since the continuous aggregate materialization process is not an embarrassingly parallel workload.
Understood! What do you mean by "break compatibility"? Will one need to regenerate the CAGG MVs again? Or is it just that the user-facing function calls/params are changing? NP either way, just want to keep an eye out. In any case, thank you so much for taking the time to get back to me.
It would probably mean that there would be a new version of CAGGs, e.g. CAGG v2. You would probably be able to use v2 for new data, and you would need to migrate from v1 to v2 for existing CAGGs, to take advantage of new features, with the help of a tool.
Would absolutely love to see performance improvements for the refresh job. For example, say we have a refresh that covers 3 days, with a simple aggregate computing a sum, count, avg, min, and max. If we run the refresh for that whole period, it takes over 3.5 hours. But if we run 3 separate refreshes, one for each day, we can complete the whole period in about an hour, with each refresh running for about 20 minutes. This is unintuitive.
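A sketch of the comparison, using the `refresh_continuous_aggregate` procedure with a placeholder view name and dates:

```sql
-- One refresh over the whole 3-day window: reportedly over 3.5 hours.
CALL refresh_continuous_aggregate('metrics_daily', '2022-06-01', '2022-06-04');

-- Three per-day refreshes: reportedly ~20 minutes each, about an hour total.
CALL refresh_continuous_aggregate('metrics_daily', '2022-06-01', '2022-06-02');
CALL refresh_continuous_aggregate('metrics_daily', '2022-06-02', '2022-06-03');
CALL refresh_continuous_aggregate('metrics_daily', '2022-06-03', '2022-06-04');
```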
Hi. I experimented with the latest TimescaleDB (Docker, 2.6) and noticed slow performance on the weather_metrics demo DB. The laptop is decent (Intel Core i7 and 16 GB RAM).
Querying the hypertable is 47 times faster than querying the continuous aggregate view, even though the aggregate table is 24 times smaller!
The materialized view was created with:
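(The exact statement isn't quoted here; a minimal sketch of a daily continuous aggregate over the weather_metrics demo table, assuming it has `time`, `city_name`, and `temp_c` columns, would look like:)

```sql
-- Hypothetical daily continuous aggregate over the weather_metrics demo table.
CREATE MATERIALIZED VIEW weather_metrics_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 day', time) AS bucket,
    city_name,
    avg(temp_c) AS avg_temp,
    min(temp_c) AS min_temp,
    max(temp_c) AS max_temp
FROM weather_metrics
GROUP BY bucket, city_name;
```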
I think the aggregated view copied into a regular table is about 10x faster.
Turning that copied table into a hypertable slows it down again:
A regular index on the copied aggregate table (without converting it to a hypertable) makes the same query super fast:
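(A sketch of the copy-and-index steps described above; the table and column names follow the hypothetical CAGG sketch earlier:)

```sql
-- Copy the aggregate's rows into a plain PostgreSQL table.
CREATE TABLE weather_metrics_daily_copy AS
SELECT * FROM weather_metrics_daily;

-- A regular btree index on the plain copy. The exact columns depend on
-- the query, but the bucketed time plus any grouping column is typical.
CREATE INDEX ON weather_metrics_daily_copy (city_name, bucket);
```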
Thanks for the detailed experiments! Performance of continuous aggregates has been improved in TimescaleDB 2.7.
Results feel about the same as before. Running the raw query and doing the agg on the fly is many times faster than using the mat view.
@zeroallox Can you post any of the results from one of the latest versions of TimescaleDB? (And please make sure you actually upgrade/migrate your cagg to the new format, if you aren't creating from scratch.) https://docs.timescale.com/timescaledb/latest/how-to-guides/continuous-aggregates/migrate/
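(For anyone following along: if I recall the linked docs correctly, migrating an old-format CAGG to the newer 2.7+ format is done with the `cagg_migrate` procedure, roughly like this; the view name is a placeholder.)

```sql
-- Migrate a pre-2.7 continuous aggregate to the new finalized format.
CALL cagg_migrate('weather_metrics_daily');
```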
I'm assuming then that CAGGs don't take advantage of the parallel worker refreshes of MVIEWs that PG14 introduced? https://www.postgresql.org/about/news/postgresql-14-beta-1-released-2213/
I'm getting increasingly concerned that all these 'slow' CAGG issues are somehow related! Issue #6286 smells the same as the performance degradation raised here. I too have written alternatives to CAGGs, using plain old aggregation jobs into hypertables, and their performance is an order of magnitude better! And this is in late 2023 - a year since this ticket was last discussed.
Hello @jvanns, We are working on improving CAggs with every version of TimescaleDB. However, we are aware that the current implementation has room for improvement regarding performance for certain configurations and workloads. This issue is about using parallel workers for the refresh. #6286 looks more like a problem with missing chunk exclusion when no (or certain) ORDER BY values are used in the CAgg definition. We are currently working on improving the planning times for large real-time CAggs. Additionally, we will address other performance-related issues in the upcoming quarters. Best regards
Relevant system information:
Hello,
We have been experiencing performance issues with continuous aggregates during both initial generation and subsequent queries on the materialized view.
The CAGG was created with the following:
I would be more than happy to provide access to the current database or set up a new one for troubleshooting.
Thanks for your time!