Proposal: chained pipelines #58
Comments
Hi @towhans, can you please expand on the use case? Generally speaking, you don't want to pass the data through multiple processes, as that incurs copying. So our concern with "connecting pipelines" is that users will end up using pipelines for code organization purposes instead of modelling runtime concerns. Can you describe why you would need to pass the data around? Thanks!
The case is that
then we can't specify different parallelization for transformer2. So the case is about interleaving stateful and stateless transformations.
Which kind of transformations though? What is stateful and what isn't? In theory, the only benefit of creating new pipelines / new stages is if different parts of those stages depend on different IO resources, and we plan to do it as part of #39. Stateful or stateless should not matter. :)
Sorry for taking so long to respond. I had to think it through again. I get your point about avoiding the anti-pattern of using gen_stages for code organization. In our case
Thanks for following up! The unnecessary creation of processes/stages is exactly what we want to avoid, so when we add multiple processors, we have to be really careful in documenting those concerns!
I have a use case for this, I think. Stream of user ids -> batchLookup profiles for users -> partition / filter profiles -> do something with batches of profiles. I can sort of make this work by moving the profile lookup into the producer, but then I need to build out that convenient batching logic myself instead.
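For illustration only, the flow described in this comment can be sketched in plain Python. This is not Broadway code and does not use any Broadway API. The names `batched`, `lookup_profiles`, and `run`, the profile fields, and the batch sizes are all hypothetical. The sketch only shows the shape of the problem: ids are grouped for a batch lookup, the resulting profiles are filtered and partitioned, and a second round of batching happens per partition.

```python
from collections import defaultdict
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def lookup_profiles(user_ids):
    """Hypothetical batch lookup; a real one would query a database or API."""
    return [
        {"id": uid, "active": uid % 2 == 0, "region": "eu" if uid % 3 == 0 else "us"}
        for uid in user_ids
    ]


def run(user_ids, lookup_size=4, out_size=2):
    """ids -> batch lookup -> filter/partition -> batches of profile ids."""
    results = []
    partitions = defaultdict(list)
    for id_batch in batched(user_ids, lookup_size):      # first batching step
        for profile in lookup_profiles(id_batch):        # one lookup per id batch
            if not profile["active"]:                    # filter
                continue
            bucket = partitions[profile["region"]]       # partition by region
            bucket.append(profile["id"])
            if len(bucket) == out_size:                  # second batching step
                results.append((profile["region"], list(bucket)))
                bucket.clear()
    # Flush partially filled buckets once the input is exhausted.
    for region, bucket in sorted(partitions.items()):
        if bucket:
            results.append((region, list(bucket)))
    return results


if __name__ == "__main__":
    for region, ids in run(range(1, 13)):
        print(region, ids)
```

The two batching steps are what make this awkward to express as a single producer -> processor -> batcher flow: the lookup needs batches before the partitioning step, and the final handler needs batches after it.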
Batcher would be able to forward messages to another pipeline. acks would be performed by the very last batcher.
The case is to have smaller pipelines that can be put together without the need to have an intermediate step of some external queue/topic.