Proposal: chained pipelines #58

towhans · 2019-03-20T14:21:10Z

A batcher would be able to forward messages to another pipeline. Acks would be performed by the very last batcher.

The use case is to have smaller pipelines that can be put together without needing an intermediate external queue/topic.

josevalim · 2019-03-20T14:37:51Z

Hi @towhans, can you please expand on the use case? Generally speaking you don't want to pass the data through multiple processes, as that incurs copying. So our concern with "connecting pipelines" is that users will end up using pipelines for code organization purposes instead of modelling runtime concerns.

So can you describe why you would need to pass the data around? Thanks!

towhans · 2019-03-21T07:42:53Z

transformer1 -> processor1 -> batcher1
transformer2 -> processor2 -> batcher2

The case is that transformer2 is to be applied after processor1. processor1 is stateful. transformer2 is stateless. If we make:

transformer1 -> processor1 |> transformer2 |> processor2 -> batcher

then we can't specify different parallelization for transformer2.

So the case is about interleaving stateful and stateless transformations.

josevalim · 2019-03-21T08:05:54Z

> So the case is about interleaving stateful and stateless transformations.

Which kind of transformations though? What is stateful and what isn't?

In theory, the only benefit of creating new pipelines / new stages is if different parts of those stages depend on different IO resources, and we plan to support that as part of #39. Stateful or stateless should not matter. :)

towhans · 2019-03-28T07:02:04Z

Sorry for taking so long to respond. I had to think it through again. I get your point about avoiding the anti-pattern of using gen_stages for code organization. In our case transformers are stateless and processors are stateful. But that doesn't really matter. The important realization for me is that the "chain of pipelines" is a higher-level thing that can be assembled into one single Broadway pipeline. So I retract the proposal and thank you for your replies. They were very helpful.
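To illustrate that realization, here is a minimal sketch in plain Elixir with no Broadway dependency. It assumes the steps of the two would-be pipelines can each be written as an ordinary function, so the "chain" becomes function composition inside one processing step rather than extra processes. The module name and all function bodies are hypothetical placeholders, and the state that processor1 would carry is left out for brevity.

```elixir
defmodule ChainedSteps do
  # Hypothetical stand-ins for the steps named in this thread.
  # Real implementations (and any state handling) would differ.
  def transformer1(data), do: String.trim(data)
  def processor1(data), do: String.upcase(data)
  def transformer2(data), do: data <> "!"
  def processor2(data), do: {:ok, data}

  # The "chain of pipelines" collapsed into one function.
  # The data stays in a single process, so nothing is copied between stages.
  def process(data) do
    data
    |> transformer1()
    |> processor1()
    |> transformer2()
    |> processor2()
  end
end

IO.inspect(ChainedSteps.process("  hello "))
# prints {:ok, "HELLO!"}
```

In a Broadway pipeline, a function like `process/1` would be what the processor's message callback applies to each message's data, with the code still organized as small composable steps.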

josevalim · 2019-03-28T08:17:40Z

Thanks for following up! The unnecessary creation of processes/stages is exactly what we want to avoid, so when we add multiple processors, we have to be really careful in documenting those concerns!

kwando · 2019-07-05T08:59:12Z

I have a use case for this, I think.

Stream of user ids -> batch lookup of profiles for users -> partition / filter profiles -> do something with batches of profiles.

I can sort of make this work by moving the profile lookup into the producer, but then I need to build out that convenient batching logic myself instead.

msaraiva · 2019-07-05T11:32:59Z

Hi @kwando!

Thanks for the feedback.

I believe you'll be able to achieve that after we implement #39.

josevalim closed this as completed Mar 28, 2019
