Support batching before processing #172
Comments
Do you want to use a single connection or do you want to do a single query?

A single query. Sorry!

So we could make this work trivially, because the processors already receive a batch of messages. The problem is how to introduce this without adding extra complexity for users. One possible idea is to introduce a

That would work wonders for my use case! I agree with the added benefit of no new processes. I believe this should be a somewhat common case. I'll wait for your thoughts on the API impact. I just can't have this fork going into production, unfortunately.
Can you try a PR so we at least see the impact on the codebase? Another idea is to introduce a new callback, like “prepare_messages”, which is called before handle_message. But having both feels redundant. At the same time, if people have both handle_messages and handle_batch, I think they will be confused about why we have both, as both will feel like batches.

--

*José Valim, https://dashbit.co/*
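For context, a callback along these lines did later land in Broadway as `prepare_messages/2`. A minimal sketch of how a pipeline might use it for the deduplication case discussed here; the module names `MyPipeline` and `MyApp.Audit.seen_ids/1` are hypothetical, invented for illustration:

```elixir
defmodule MyPipeline do
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {Broadway.DummyProducer, []}],
      processors: [default: [concurrency: 10]]
    )
  end

  @impl true
  # Receives the whole batch of raw messages before handle_message/3,
  # so one DB round-trip can cover all of them.
  def prepare_messages(messages, _context) do
    seen = MyApp.Audit.seen_ids(Enum.map(messages, & &1.data))

    Enum.map(messages, fn msg ->
      if msg.data in seen,
        do: Message.failed(msg, :duplicate),
        else: msg
    end)
  end

  @impl true
  def handle_message(_processor, message, _context) do
    # Per-message processing for the happy path; messages marked as
    # failed in prepare_messages/2 are not processed further.
    message
  end
end
```

The key point matches the discussion above: no new processes are added, the batch is simply pre-processed inside the processor that already holds it.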
All right! I'll try a PR. I can't commit to a schedule, though. I'll try to look at it this week. Thanks!
I've given some thought about how to name this. First I thought about

Implementation-wise, I was thinking of making the recursive private function handle_messages non-recursive, call this

If this seems fine to you, I'll start a PR.
@josevalim isn't this a use case for #39? If we had a mechanism to connect Broadway pipelines, as suggested in the issue above, this requirement could be met with something like a "filtering pipeline" that pushes messages forward conditionally, with the work done in batchers. The second pipeline would just handle the happy path. I'm commenting here just as a means to explore what's been discussed so far on the topic of Broadway topologies. If that makes sense, we can move the conversation there. Thanks in advance for your time!
Not really. Adding new processes to the pipeline is expensive, and we want to avoid it as much as possible. Having different processing needs is not a reason for new processors. The only reason for new processes would be repartitioning, which is not necessary when filtering.
I have added a pull request showing the code changes required.
Thanks @3duard0 and @josevalim! I'll close this now that it has been merged. Any chance of this slipping into 0.6.2? Once again, thanks to you all!
Thank you all for the amazing work on Broadway and its adapters. It is really incredible!

I have a feature request: the ability to run batchers BEFORE processors (and also after).

The use case is simple: auditing every message for idempotency checks. What I want is to run a check on a batch of messages and filter out the ones I've already processed. Some adapters will publish the same message for several reasons: a restart of the messaging server, a re-partitioning of topics, at-least-once delivery (which might deliver the same message multiple times), and so on.

Suppose I have an audit of messages in a DB. Each message check runs a query, which needs a connection, and that can hurt performance very hard. So, I'd like to run an idempotency check on a batch of messages to use a single connection per batch. Then, I'd process each message individually.
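The single-query batch check could be sketched like this, assuming an Ecto repo and an audit table keyed by message ID; `MyApp.Repo`, the `"audit"` table, and the `message_id` column are all assumptions made for illustration:

```elixir
defmodule MyApp.Audit do
  import Ecto.Query

  # One query per batch: fetch all already-processed IDs at once,
  # instead of one lookup (and one connection checkout) per message.
  def seen_ids(ids) do
    from(a in "audit", where: a.message_id in ^ids, select: a.message_id)
    |> MyApp.Repo.all()
    |> MapSet.new()
  end
end
```

Filtering the batch against the returned set then costs a single connection checkout per batch rather than one per message.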
To put it graphically: *(diagram not captured in this export)*
Is this a "wanted" use case?
Regards!