Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify the context mechanism in DynamicPPL #2424

Open
Tracked by #2420
mhauru opened this issue Dec 5, 2024 · 6 comments
Open
Tracked by #2420

Simplify the context mechanism in DynamicPPL #2424

mhauru opened this issue Dec 5, 2024 · 6 comments
Labels
Milestone

Comments

@mhauru
Copy link
Member

mhauru commented Dec 5, 2024

(Writeup by @penelopeysm, but the ideas here were jointly discussed by @willtebbutt, @mhauru, @sunxd3 and I)

Background

Models on their own specify a set of variables and their relationship. However, there are many different ways a model can be evaluated, each with different outcomes. For example:

  • one might want to only accumulate logp of observations, thus giving the value of the (log) likelihood
  • one might want to record extra information about the variables seen in the model, e.g. to check whether a variable is specified twice
  • one might want to use existing values stored in a varinfo, or sample new values
  • ...

However, at the core, we still want to have a single model evaluation function; we don't want to write separate functions for each of these options.

The way this is accomplished is through the use of evaluation contexts. Specifically, DynamicPPL.evaluate!! takes an extra context argument which can contain any sort of information it wants, and is passed through the tilde pipeline all the way down to assume and observe. Thus, contexts may implement their own assume(..., ::MyContext) methods to inject custom behaviour that occurs when a model is evaluated.

Each context is its own type (which subtypes AbstractContext), and contexts are modelled as a linked lis. There are three 'leaf contexts', DefaultContext, LikelihoodContext, and PriorContext. 'Parent contexts' typically contain some kind of information, plus a child context.

diagram of linked list of contexts

There is, apparently, something very complicated about this, and we'd like to simplify it.

Possible directions

(1) Rework data structure

One option is to use a new data structure for a series of contexts, as shown below

  • (a) Extract the modifiers into a separate part
    Instead of having the linked list above, one option would be to have a base context plus an array of 'context modifiers':
    diagram of array of modifiers plus base context

  • (b) Flatten the modifiers into a single 'super-context' that can do everything
    diagram of flattened supercontext

In both cases, this would mean that instead of overloading assume and observe for each type of context, we would only have one method (or maybe N methods, where N is the number of leaf contexts). Inside that method though, we would have to either iterate over the list of modifiers, or dynamically perform some action depending on whether the modifier field in the super-context is nothing.

There is an additional danger in (b) as well, in that it does not preserve information about the order in which the modifications should be applied. That is to say, it assumes that modifiers are commutative. I don't know of a particular case where this isn't true, but I can imagine that it might happen with e.g. some combination of PrefixContext + ConditionContext.

In my (Penny's) opinion, doing this is pretty much just shuffling the complexity around and won't really give us substantial code simplification. It would also invalidate the type-based dispatch system that we now have, so I also worry about whether it would cause performance regressions.

(2) Place more type bounds on the contexts

Anohter option is to retain the linked list structure, but to place more constraints on what types of contexts can appear in which situation. For example, there are a number of contexts that are used solely in the implementation of a single function: see e.g. ValuesAsInModelContext and DynamicPPL.values_as_in_model. These contexts have no business being at anywhere but the head of the linked list (i.e. the top of the context stack), and we could enforce this by having something like this:

abstract type CanBeChildContext <: AbstractContext

struct DefaultContext <: CanBeChildContext end

# Note that we don't subtype CanBeChildContext
struct ValuesAsInModelContext <: AbstractContext
    values::T
    childcontext::CanBeChildContext
end

This makes it illegal to, for example, construct certain combinations such as a ValuesAsInModelContext{ValuesAsInModelContext{DefaultContext}}}.

(Markus had ideas about extending this trait-like system even more, so e.g. specific traits would allow contexts to perform specific actions such as modifying VarInfo. My memory of this is a little bit hazy, so I will ask him to write this)

My (Penny's) opinion is basically that I like the general idea, but I'm not yet convinced this will help us very much. I think it's really difficult to group contexts according to 'shared behaviour' / 'these contexts have equal levels of permissions'. And although the 'can be child context' one might seem like an obvious low-hanging fruit, I don't think it makes for very much improvement in the code because ValuesAsInModelContext is already contained to a single source file, and thus very unlikely to 'leak' out of it and cause weird contexts like SamplingContext{ValuesAsInModelContext}.

The other concern with this is that they're basically 'mixin' traits, which work nicely in something like Python because you can inherit from as many classes as you like, but in Julia the fact that each type must have only one supertype leads to difficulties when you want to inherit multiple types of behaviour. For example:

# Can do Foo
abstract type CanFooContext <: AbstractContext end

# Can do Bar
abstract type CanBarContext <: AbstractContext end

# Can do Foo and Bar
# You can plug this anywhere you need a CanFooContext, but not a CanBarContext?
abstract type CanFooBarContext <: CanFooContext end

What now?

The thing about the whole context thing is that I don't entirely know how it can be simplified without re-introducing the complexity in some other format. The purpose of having different contexts is because it allows us to abstract away the notion of 'model evaluation' into the tilde pipeline with assume / observe, but if we want to support different evaluation modes, then the extra information needed for this has to be supplied in some format.

Personally, I did find the context mechanism very confusing at the start, but after having worked with it a bit I think it is actually quite natural. (And as a Haskeller: it has a direct resemblance to monad transformer stacks for handling effects.) And my personal intuition would be that the best thing to do is to write more extensive documentation on this. We have a nice base of the minituring page to work with: https://turinglang.org/docs/developers/compiler/minituring-contexts/ which explains the 'need' for contexts very well, and I think it would be great to (a) expand on this need, by showing the model evaluation function, and (b) explain how it's used inside DynamicPPL itself, and (c) demonstrate how to extend this system with a custom context.

However, I'm conscious that maybe there are other perspectives that I'm not seeing, so if someone else has ideas on how contexts can be simplified, please do post below!

Related issues

@willtebbutt
Copy link
Member

willtebbutt commented Jan 31, 2025

Note from chat:

  1. remove .~ before tackling this,
  2. ask @torfjelde if he has any good ideas for simplification.

@penelopeysm to type up different possibilities from today's discussion.

@penelopeysm
Copy link
Member

will expand, but

  1. Flattening contexts into one big struct with optional fields
  2. Flattening contexts into an array of their effects
  3. Intermediate abstract context types with certain invariants

@torfjelde
Copy link
Member

I'll probably need a bit more context on the different possibliities 👀

@penelopeysm penelopeysm removed their assignment Feb 19, 2025
@penelopeysm
Copy link
Member

penelopeysm commented Feb 19, 2025

I've now (finally) written up the above. I'm clearing my assignment because my assessment of the situation is basically to do nothing to the code and write better docs, but if someone else would like to do some code changes, we can assign someone again.

@mhauru
Copy link
Member Author

mhauru commented Feb 20, 2025

(Markus had ideas about extending this trait-like system even more, so e.g. specific traits would allow contexts to perform specific actions such as modifying VarInfo. My memory of this is a little bit hazy, so I will ask him to write this)

I think the main source of complexity with contexts is that given that they can change absolutely anything about the behaviour of tilde_assume and tilde_observe, it is very hard to reason about what happens when you nest them in particular ways. If I make a new context that does magic operation X on the VarInfo and then calls the same method for the child context, is my magic operation X going to mess with how the child context does its job? Might the child context mess with what I'm trying to achieve with X? Might having X and whatever the child context does happen in sequence cause an error, or incorrect results? I have no way of answering any of these questions, because the child context can do absolutely anything.

A case study: The ESS sampler changes the leaf context used to evaluate logp to always be LikelihoodContext, even when we are sampling from e.g. the posterior, because something about how ESS works. This caused a subtle bug with the new Gibbs sampler, when it interacted with how the GibbsContext short-circuits the evaluation of tilde_assume for variables that are not being sampled by the current component sampler, causing the wrong leaf context to be used in some cases. The way Tor and I solved this was to make GibbsContext always insert itself right above the leaf context. This feels like an ugly hack, and if anyone ever has to write another context that for some reason needs to be right above the leaf, the two will clash with unpredictable consequences.

The idea with the traits/abstract subtypes would be to have less contexts that use the full power of the context mechanism to override anything about the behaviour of the tilde pipeline. I think we have many contexts that actually do something quite simple, and having them explicitly marked as being of some lesser tier of context, where they are for instance only allowed to run a postprocessing or preprocessing function before calling tilde_obssume on the child, might make it easier to reason about their behaviour. I'm quite uncertain whether this would actually solve much, but that's the idea that Penny was referring to above.

The thing about the whole context thing is that I don't entirely know how it can be simplified without re-introducing the complexity in some other format.

I think I agree with Penny here, that none of the proposals so far are much better than what we have. I would still like to keep trying to come up with better ideas though, because contexts are one of the most confusing aspects of how Turing works.

One way forward could be to first implement TuringLang/DynamicPPL.jl#744, which should allow us to get rid of the leaf contexts and maybe a few other contexts. Maybe that would simplify solving this one.

@mhauru
Copy link
Member Author

mhauru commented Feb 20, 2025

Maybe all I'm saying above is that we could try to use lighter structures and mechanisms, existing and new, to do much of the work that is currently done by contexts, and maybe after we've done that we would be left with only a small handful of contexts. It could then be easier to either redesign the context mechanism to meet the needs of that small handful, or to accept leaving that small handful of contexts as they are and just document them well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants