Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State-aggregate #117

Closed
davidkohn88 opened this issue Apr 22, 2021 · 6 comments
Closed

State-aggregate #117

davidkohn88 opened this issue Apr 22, 2021 · 6 comments
Labels
feature-request And area of analysis that could be made easier

Comments

@davidkohn88
Copy link
Contributor

timescale/timescaledb#1324 describes an interesting and possibly common problem of devices switching between different (error/non-error etc) states. A state aggregate would add up the time spent in each state following the ordered list of things. This would be very difficult to put into continuous aggregates otherwise as it would need window functions and other bits to get right. In concept it looks reasonably similar to time_weighted_average though with a list of distinct items and their durations as well as the starting and ending values, it'd be nicely combinable and all that and would really help folks who want to answer the "how long was my device in an error state" type question (common uptime question).

@davidkohn88
Copy link
Contributor Author

Similar sort of problem that could be solved by the same agg: https://stackoverflow.com/questions/68544600/postgresql-looping-through-array-agg

@backbone87
Copy link

backbone87 commented Aug 9, 2021

i think time_weighted_average would almost work, if we use an expression for the value: time_weight('LOCF', ts, val = 'ERROR'). the only problem i see is to get the "initial state" of a time window which should be carried forward from the last value of the previous bucket (even if there is no previous bucket in the selection set). also there should be a way to truncate the weight at a timestamp ("now") to be able to aggregate over "open" buckets. also the real start of the series could be in the middle of a bucket without there being a previous value, so the bucket duration needs to be adjusted here too.

edit: there also needs to be a workaround for not being able to use time_bucket_gapfill for caggs

@bburns
Copy link

bburns commented Sep 20, 2021

We're trying to calculate machine uptime, downtime, and other metrics, and will need a function like this. Posted a question here - https://stackoverflow.com/questions/69246220/timescaledb-how-get-timestamp-differences-between-rows-and-group-results-into.

In the meantime, we'll write some nodejs code to dump data into a bins table for a viz lib (grafana) to consume. It'll be a bit complex, and have fixed bin size (an hour?), but I guess the alternative is to write some complex Postgres code.

Eventually we might move the code into a grafana (or superset) plugin - then bin sizes can be dynamic, and use an in-memory bin table, instead of needing to go through the db bins table intermediary.

Anyway, yes this would be a nice function to have!

@ravtoleakshay
Copy link

ravtoleakshay commented Sep 24, 2021

Hey folks,
I have a hypertable that stores machine alarms per second, I need something, where the different alarms would have their machine, alarm_message with generated_on and ended_on time ranges when each alarm occurred, for all the machines.

https://stackoverflow.com/questions/69300994/timescaledb-hypertable-is-there-any-way-to-aggregate-data-based-on-parameter-va

Currently, I am running query to get calculate it real-time (recent 10 alarms), I want a way to create aggregate continuously. (may be using triggers/procedures) so that I can have historical alarms per device with start/end time.

It would be nice to have some built-in function or automated way to achieve this.!

edit:

I tried creating a trigger, which will execute after insert and will record aggregates into another table named alarms_history,
At the moment, It only adds distinct machine,message into alarms_history but if a alarm comes again, it will skip(as ended_on is always null).

If I can, somehow, calculate the ended_on, and update the ended_on in alarms_history, then it might work as expected.

INSERT INTO alarms_history ( machine, message, generated_on)
SELECT NEW.machine, NEW.alarm_msg, NEW.time
WHERE NOT EXISTS (SELECT 1 FROM alarms_history WHERE machine = NEW.machine and message = NEW.alarm_msg AND ended_on IS NULL);```

Any inputs on this? 

@JLockerman
Copy link
Contributor

an example of this might look like

SELECT duration_in('STOPPED', states) as run_time, duration_in('ERROR', states) as error_time FROM (
  SELECT state_agg(time, status) as states FROM ...
);

which would return the amount of time a service spends stopped or in an error state.

@WireBaron
Copy link
Contributor

This issue should have been closed by PR #324

We released the state_agg as experimental in our 1.6.0 release, and just released an improvement to return PostgreSQL interval types from duration in 1.7.0. We'll create a new issue for stabilizing this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request And area of analysis that could be made easier
Projects
None yet
Development

No branches or pull requests

6 participants