Top N Aggregates #40
davidkohn88
started this conversation in
Feature Discussion
Replies: 3 comments 2 replies
-
What would be the best representation for this? Would we compress the data stored in the aggregate?
2 replies
-
What would the API look like? I assume we would unnest the data. Would we support only text values (and have people cast ints or anything else to text), and would you unnest the results? Could you downsample and get just the top 10 from a top-100 aggregate?
0 replies
-
Original issue in #38:
In situations where items follow a Zipfian or other long-tailed distribution, one often wants an aggregate that computes the top N most frequent values and drops the rest, for storage and lookup efficiency. Take the case where you're monitoring traffic by URL and want to show the top 100 URLs by traffic in a day: millions of URLs might be hit, but most of them get only a few hits, and what you care about are the top 100. An aggregate could store the top 100 URLs and their counts on a daily basis, and you could then re-aggregate across multiple days if needed (the result would only approximate the true top 100, but it would be relatively accurate). I think this could be significantly more useful than the count-min sketch in #6.
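To make the idea concrete, here is a minimal Python sketch of the behavior described above. It is only an illustration, not a proposed implementation: the function names `top_n` and `merge_top_n`, the dict-of-counts representation, and the sample URLs are all assumptions made for this example.

```python
from collections import Counter


def top_n(values, n):
    """Build a top-N aggregate: the n most frequent values and their counts.

    Hypothetical helper; everything outside the top n is discarded.
    """
    return dict(Counter(values).most_common(n))


def merge_top_n(aggregates, n):
    """Re-aggregate several top-N aggregates into a single top-N aggregate.

    The result is approximate: a value that fell outside the top n on some
    day contributes nothing for that day, so merged counts are lower bounds.
    """
    total = Counter()
    for agg in aggregates:
        total.update(agg)  # Counter.update adds counts from a mapping
    return dict(total.most_common(n))


# Two days of (made-up) URL hits.
day1 = ["/home"] * 5 + ["/docs"] * 3 + ["/blog"] * 2 + ["/about"]
day2 = ["/home"] * 4 + ["/blog"] * 4 + ["/about"] * 3 + ["/docs"]

agg1 = top_n(day1, 2)  # {"/home": 5, "/docs": 3}
agg2 = top_n(day2, 2)  # {"/home": 4, "/blog": 4}

# Merging the daily aggregates keeps the right top 2, but "/blog" is
# reported with 4 hits instead of its true 6, because its 2 hits on
# day 1 were dropped when that day was truncated.
merged = merge_top_n([agg1, agg2], 2)  # {"/home": 9, "/blog": 4}
```

In this representation, downsampling is the same operation applied to a single aggregate, for example `merge_top_n([agg1], 1)` to go from a top-2 to a top-1. Storing more entries per day than will eventually be queried (say, top 100 when only the top 10 is shown) reduces the undercounting seen above.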