[Proposal]: Expression tree evolution #4727

AndriySvyryd · 2021-05-07T04:44:30Z

AndriySvyryd
May 7, 2021
Collaborator

Expression tree evolution

Proposed
Prototype: ExpressionBuilder support, New expression trees
Implementation: Not Started
Specification: Not Started

Summary

This proposal provides a way to introduce changes to the expression trees generated by the compiler, while providing a reasonable backwards compatibility story for legacy LINQ providers and other consumers.

Motivation

The expression trees haven't had significant updates since introduction and lack many C# features that have now become commonplace. This creates a perception of staleness for customers and makes them doubt the future of expression trees.

This is a non-exhaustive list of C# features that are currently either unsupported or are more restrictive in compiler-generated expression trees:

Statement trees
Compound assignment expressions
Less restrictive unary and binary assignments
Increment and decrement expressions
Multi-dimensional array initializers
Named and optional parameters
base access
Dynamically bound operations
Async/await
Conditional access expressions
Interpolated Strings preserving original format string
Dictionary initializers
Out variable declarations
Ref struct literals
Default expression
Throw expressions
Discard expressions
Deconstructing assignment
Pattern matching
Local functions
Tuple literals
Tuple equality and inequality
Null coalescing assignment
Switch expressions
Using declarations
Indices and ranges
With expressions
Capturing by value

See ExpressionFutures for more details and implementation prototype.

Related discussions: #158, #2029, #2545

In addition to the above there are proposed LINQ features that would also benefit from the proposed pattern:

Detailed design

ExpressionTreeLangVersion

ExpressionTreeLangVersion is a compiler parameter (and csproj option) that determines which static methods on Expression are allowed to be called to create the expression tree. If the ExpressionTreeLangVersion isn’t specified, it defaults to the C# 9.0 expression trees.

If a language feature that isn’t supported by the configured version is used in the expression, the compiler outputs an error message that tells the user how to change the language version and advises them to check the documentation of their LINQ provider or expression-processing library for the latest supported expression tree version.

As long as all the features used are supported, changing ExpressionTreeLangVersion will not affect the produced expression tree.

The default implementation of ExpressionVisitor throws for all new nodes.

The expression tree versioning pattern is independent of LangVersion: "1.0", "2.0", "3.0", etc. Version "1.0" corresponds to the expression tree features in C# 9.0.

ExpressionBuilderAttribute

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Assembly | AttributeTargets.Class | AttributeTargets.Method)]
    public class ExpressionBuilderAttribute : Attribute
    {
        public ExpressionBuilderAttribute(Type type)
        {
            BuilderType = type;
        }

        public Type BuilderType { get; }
    }
}

When ExpressionBuilderAttribute is applied to methods with Expression<>-typed parameters the compiler will use the static methods on the type specified in the attribute to construct the expression trees passed as arguments.

For example, the following builder limits the expression trees to just a subset of three expression node types:

public static class LimitingExpressionBuilder
{
    public static ConstantExpression Constant(object value, Type type) => Expression.Constant(value, type);
    public static ParameterExpression Parameter(Type type, string name) => Expression.Parameter(type, name);
    public static BinaryExpression Add(Expression left, Expression right) => Expression.Add(left, right);
}
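As a rough illustration of the lowering described above, the following hand-written sketch mimics what the compiler might emit for `x => x + 1` if it targeted `LimitingExpressionBuilder`. The `Lowering` class is hypothetical, and because the sample builder has no `Lambda` method, the body is wrapped with the existing `Expression.Lambda` only so the tree can be compiled and run.

```csharp
using System;
using System.Linq.Expressions;

public static class LimitingExpressionBuilder
{
    public static ConstantExpression Constant(object value, Type type) => Expression.Constant(value, type);
    public static ParameterExpression Parameter(Type type, string name) => Expression.Parameter(type, name);
    public static BinaryExpression Add(Expression left, Expression right) => Expression.Add(left, right);
}

public static class Lowering
{
    // Hypothetical stand-in for compiler output for `x => x + 1`:
    // every node is created through the builder's static methods.
    public static Expression<Func<int, int>> AddOne()
    {
        ParameterExpression x = LimitingExpressionBuilder.Parameter(typeof(int), "x");
        BinaryExpression body = LimitingExpressionBuilder.Add(
            x, LimitingExpressionBuilder.Constant(1, typeof(int)));

        // Assumption: Lambda is not part of the sample builder, so the
        // standard factory is used here purely to make the sketch executable.
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    public static void Main()
    {
        Console.WriteLine(AddOne().Compile()(41)); // prints 42
    }
}
```

An expression such as `x => x * 2` would have no matching `Multiply` method on the builder, which is where the CS0117-style error described below would come from.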

ExpressionBuilderAttribute can also be specified at the assembly level to provide the default for all contained methods. If no ExpressionBuilderAttribute is present the compiler will generate [ExpressionBuilder(typeof(ExpressionBuilderVX_Y))] at the assembly level where "X.Y" is the value of ExpressionTreeLangVersion used to compile the assembly.

Non-annotated methods are treated as annotated with [ExpressionBuilder(typeof(ExpressionBuilderV1_0))].

Expression contains methods for all the expression node types supported by the latest compiler. Therefore using [ExpressionBuilder(typeof(Expression))] means that a given method is designed to be forward-compatible with new expression node types.

When an ExpressionBuilderAttribute is found and BuilderType is of the form ExpressionBuilderVX_Y then the translation is performed as if the ExpressionTreeLangVersion was set to "X.Y". If BuilderType is ConfigurableExpressionBuilder then the user-supplied ExpressionTreeLangVersion value is used. For any other BuilderType ExpressionTreeLangVersion is disregarded and when a new feature is used that's not supported by the specified BuilderType then a standard error will be produced:

error CS0117: 'LimitingExpressionBuilder' does not contain a definition for 'Range'

All LINQ operator methods are annotated with [ExpressionBuilder(typeof(ConfigurableExpressionBuilder))] at the assembly level.

When an expression tree is assigned to a variable before being passed to the method annotated with ExpressionBuilderAttribute it must be explicitly typed as VersionedExpression<TBuilder, TExpression> for the compiler to use TBuilder when translating the expression.

public class VersionedExpression<TBuilder, TExpression>
    where TExpression : Expression
{
    public VersionedExpression(TExpression expression)
    {
        Expression = expression;
    }

    public TExpression Expression { get; }

    public static implicit operator TExpression(VersionedExpression<TBuilder, TExpression> expression)
        => expression.Expression;
}

For example:

VersionedExpression<LimitingExpressionBuilder, Expression<Func<int>>> expr = () => 1 + 2;
var _ = Evaluate(expr);

[ExpressionBuilder(typeof(LimitingExpressionBuilder))]
public abstract int Evaluate(Expression<Func<int>> expr);

If TBuilder is different from the type specified by ExpressionBuilderAttribute an analyzer warning is produced.

When using an Expression<>-typed parameter in a method call that is annotated with a different ExpressionBuilder an analyzer warning is also produced.

See Expression Types proposal for a description of how ExpressionBuilderAttribute can be used to produce an expression tree using an alternative implementation of the expression nodes.

Expression is annotated with [ExpressionBuilder(typeof(ConfigurableExpressionBuilder))]

Drawbacks

Dynamic expression handling

The above doesn’t attempt to protect against expression trees which are constructed dynamically (e.g. via Expression.Binary) – only trees constructed by the compiler.

Users can already dynamically construct a variety of incompatible expression trees and pass them into LINQ providers, with undefined effects. For example, Expression.Switch constructs a switch expression which the compiler never constructs and is unlikely to be supported by any LINQ providers.
Dynamic expression tree construction is an advanced use case, and the user is responsible for the tree correctness as well as for ensuring support with the consumer.
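To make the `Expression.Switch` point concrete, here is a minimal sketch of a tree that can only be built dynamically today. It uses the existing `Expression.Switch` and `Expression.SwitchCase` factories; the `SwitchDemo` class name is just for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class SwitchDemo
{
    // Builds the equivalent of: x => x switch { 1 => "one", _ => "other" }
    // The C# compiler never produces a SwitchExpression node from a lambda.
    public static Expression<Func<int, string>> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        SwitchExpression body = Expression.Switch(
            x,
            Expression.Constant("other"),                      // default body
            Expression.SwitchCase(
                Expression.Constant("one"),                    // case body
                Expression.Constant(1)));                      // test value
        return Expression.Lambda<Func<int, string>>(body, x);
    }

    public static void Main()
    {
        Func<int, string> f = Build().Compile();
        Console.WriteLine(f(1)); // one
        Console.WriteLine(f(7)); // other
    }
}
```

The tree compiles and runs in-process, but a LINQ provider that translates to another language would need explicit handling for `ExpressionType.Switch`.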

Similarly, layers such as Automapper or OData - which produce or transform expression trees - are also responsible for ensuring correctness and LINQ provider compatibility. In practice, since different providers support different tree shapes, layers such as OData already need to be aware of the LINQ provider being targeted, and tailor their output trees accordingly. The user would likely need to opt-in to a particular version by configuring a setting or installing a NuGet package.

Multiple LINQ providers

When multiple LINQ providers are used in the same application, the lowest supported language version must be selected to take advantage of compile-time checking.

This can be resolved by casting to VersionedExpression<,> with a specific version where necessary.

Alternative expression tree representations

Since the expression trees produced need to remain backward compatible, some expressions will need to be represented by a new expression type, even if semantically they could be grouped with an existing expression type. This would make the code that handles the expression trees more verbose. It also prevents adding better representations of existing features, e.g. interpolated strings.

Alternatives

Let LINQ providers expose the supported expression version

The initial experience/discovery is not ideal – the user uses a modern construct and gets a compilation error; they need to figure out how to set up the expression version and which version is supported by their LINQ provider. And if a version is selected that isn’t supported by a LINQ provider the behavior is undefined (and could include data corruption).

The providers could offer a way to use the latest supported expression version in a way that could be consumed by the user project. The provider NuGet package can include a .targets file that adds a value to SupportedExpressionTreeLangVersions. If ExpressionTreeLangVersion is not specified then the lowest of SupportedExpressionTreeLangVersions is used if any.

However in the rare case when multiple LINQ providers are referenced and one of them hasn't been updated with SupportedExpressionTreeLangVersions support this can still lead to undefined behavior.

Default to the latest expression version for LINQ methods.

Another approach would be to default to the latest expression version supported by the compiler and introduce IQueryProvider.CheckSupport(Expression) that is called by every LINQ operator. The default implementation will throw for new node types.

Any custom queryable operators defined outside the BCL would not have the CheckSupport() call until they are updated, so the fail-fast protection wouldn’t apply to them. However, since they don't have an ExpressionBuilder annotation they will be treated as annotated with [ExpressionBuilder(typeof(ExpressionBuilderV1_0))] and thus the compiler will prevent new feature usage.
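One way such a fail-fast check could look is sketched below. This is not the proposed `IQueryProvider.CheckSupport` API: `SupportChecker` and its allow-list are hypothetical, built on the existing `ExpressionVisitor` class, and only show the idea of rejecting unknown node types before translation starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical fail-fast check: walks the tree and rejects any node type
// the provider has not declared support for.
public sealed class SupportChecker : ExpressionVisitor
{
    private readonly HashSet<ExpressionType> _supported;

    public SupportChecker(IEnumerable<ExpressionType> supported)
        => _supported = new HashSet<ExpressionType>(supported);

    public override Expression Visit(Expression node)
    {
        if (node != null && !_supported.Contains(node.NodeType))
            throw new NotSupportedException($"Node type '{node.NodeType}' is not supported.");
        return base.Visit(node); // continue into the children
    }
}

public static class SupportCheckerDemo
{
    public static SupportChecker Create() => new SupportChecker(new[]
    {
        ExpressionType.Lambda, ExpressionType.Parameter,
        ExpressionType.Constant, ExpressionType.Add
    });

    public static void Main()
    {
        Expression<Func<int, int>> supported = x => x + 1;
        Expression<Func<int, int>> unsupported = x => x * 2;

        Create().Visit(supported);       // completes without throwing
        try { Create().Visit(unsupported); }
        catch (NotSupportedException e) { Console.WriteLine(e.Message); }
    }
}
```

Every query pays for the extra tree walk, which is the run-time cost listed under Cons.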

Pros:

Makes legacy LINQ providers behave in a predictable fail-fast way
The user doesn’t have to opt into anything or figure out which language version is supported by their LINQ provider. For updated providers the experience is great, everything just works out of the box.
Multiple LINQ providers co-exist well in the same application, each applying their own checks.

Cons:

Adds a performance penalty at run-time.

Runtime checks by the queryable operators

To reduce the performance penalty on legacy providers of the previous approach we could introduce a Version property to Expression and make it bubble up in the tree; for example, a binary node contains the maximum version bubbled up from its left/right sub-nodes.
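The bubbling behaviour can be sketched with a miniature node hierarchy. Everything here (`Node`, `Leaf`, `Binary`, `RequiredVersion`) is hypothetical and deliberately separate from `System.Linq.Expressions`; it only shows the version being computed once, at construction time.

```csharp
using System;

// Hypothetical miniature node hierarchy used only to illustrate bubbling.
public abstract class Node
{
    protected Node(Version requiredVersion) => RequiredVersion = requiredVersion;

    // Highest expression tree version needed by this node or any descendant.
    public Version RequiredVersion { get; }
}

public sealed class Leaf : Node
{
    public Leaf(Version introducedIn) : base(introducedIn) { }
}

public sealed class Binary : Node
{
    public Binary(Version introducedIn, Node left, Node right)
        : base(Max(introducedIn, Max(left.RequiredVersion, right.RequiredVersion)))
    {
        Left = left;
        Right = right;
    }

    public Node Left { get; }
    public Node Right { get; }

    private static Version Max(Version a, Version b) => a >= b ? a : b;
}

public static class BubblingDemo
{
    public static Node Build()
    {
        var v1 = new Version(1, 0);
        var v2 = new Version(2, 0);
        // One deeply nested leaf needs 2.0, so the root reports 2.0 as well.
        return new Binary(v1, new Leaf(v1), new Binary(v1, new Leaf(v2), new Leaf(v1)));
    }

    public static void Main()
    {
        Console.WriteLine(Build().RequiredVersion); // 2.0
    }
}
```

With this shape a queryable operator would only need to compare the root's version against what the provider supports, instead of walking the whole tree.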

Pros:

Same as above

Cons:

Expression trees become slightly heavier perf-wise because of the new property and bubbling behavior. This also affects other scenarios with expression trees which aren’t concerned with versioning (e.g. code generation).
Custom expressions would need to be updated to handle children versioning

Mirror C# versioning

ExpressionTreeLangVersion could accept the same values as LangVersion, corresponding to the features introduced in each C# version.

However adding "1" and "2" would mean unnecessarily restricting the available features and starting at "3" for the current feature-set would be confusing as the expression trees in C# 9.0 don't support all C# 3.0 features.

Using an independent pattern provides more flexibility to the order the expression tree features would be introduced.

Introduce a new expression node hierarchy

Switching to a new representation that's aligned with the internal Roslyn tree will make implementing new C# features less costly and having a consistent representation will make the implementation of non-LINQ consumers easier.

However LINQ providers would still need to have backward-compatibility, resulting in considerable maintenance penalty. And having two or more ways of representing an operation in the expression tree could lead to confusion on both the user and consumer sides.

Use pre-processor directives

VersionedExpression<,> can be replaced by a pre-processor directive such as #expressionTreeLangVersion 2.0

However, this doesn't allow using an expression builder that produces a different type of expression nodes.

Unresolved questions

Design meetings

Answered by ajcvickers

Nov 22, 2022

My understanding is that this work has so far not met the bar for investment. That could change, but I don't expect it to any time soon. I think we are back to the status quo from the last few years that System.Expressions is what it is.


bernd5 · 2021-05-07T07:43:20Z

bernd5
May 7, 2021

If you look at the current implementation of expression trees you can see that the syntax tree is not just transformed into an equivalent expression tree.
It is actually transformed in a way that preserves C# language rule semantics. If you write for example:

Expression<Func<int, short, int>> f = (a, b) => a + b;

the generated code is:

ParameterExpression parameterExpression = Expression.Parameter(typeof(int), "a");
ParameterExpression parameterExpression2 = Expression.Parameter(typeof(short), "b");
BinaryExpression body = Expression.Add(parameterExpression, Expression.Convert(parameterExpression2, typeof(int)));
ParameterExpression[] array = new ParameterExpression[2];
array[0] = parameterExpression;
array[1] = parameterExpression2;
Expression.Lambda<Func<int, short, int>>(body, array);

instead of just:

ParameterExpression parameterExpression = Expression.Parameter(typeof(int), "a");
ParameterExpression parameterExpression2 = Expression.Parameter(typeof(short), "b");
BinaryExpression body = Expression.Add(parameterExpression, parameterExpression2);
ParameterExpression[] array = new ParameterExpression[2];
array[0] = parameterExpression;
array[1] = parameterExpression2;
Expression.Lambda<Func<int, short, int>>(body, array);

Applying all these tiny little things makes it hard to maintain expression tree generation in the compiler.
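For context, the existing factory methods do not accept the "raw" form today. The sketch below (the `ConvertDemo` class is only for illustration) shows that `Expression.Add` rejects an `int`/`short` operand pair unless the conversion node is present, so a raw transformation would also need changes or tolerance on the API or consumer side.

```csharp
using System;
using System.Linq.Expressions;

public static class ConvertDemo
{
    // Mirrors what the compiler generates: the short operand is converted first.
    public static Func<int, short, int> WithConvert()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(short), "b");
        BinaryExpression body = Expression.Add(a, Expression.Convert(b, typeof(int)));
        return Expression.Lambda<Func<int, short, int>>(body, a, b).Compile();
    }

    // The raw form: Expression.Add validates operand types and throws
    // because no addition operator is defined for (int, short).
    public static bool RawAddThrows()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(short), "b");
        try
        {
            Expression.Add(a, b);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(WithConvert()(40, 2)); // 42
        Console.WriteLine(RawAddThrows());       // True
    }
}
```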

If we want to extend the expression-tree support I would like to ignore these things and create just a raw-transformation of the syntax-tree into an expression tree.
Especially because for example the implicit conversion is not always needed - or the semantics are different in the execution environment (e.g. SQL).

Another approach would be something like "roslynQuoter" but for expression trees.

15 replies

roji Nov 23, 2022
Collaborator

@Bosch-Eli-Black the problem generally isn't to just add new types to the Expression hierarchy; it's how to handle a new expression tree feature being used within a LINQ query (produced from regular C# syntax), which is possibly passed to an old LINQ provider that doesn't know about it.

In any case, the problem isn't really how to implement support - AFAIK the compiler team has already discussed and arrived at some satisfactory approaches. It's more about prioritizing this vs. other work.

Eli-Black-Work Nov 24, 2022

Thanks, @roji 🙂 I was envisioning that this would solve the issue of passing LINQ queries with new features into old LINQ providers: If the programmer writes an expression that contains new features, it would be of type CSharp10Expression, which couldn't be passed to the old LINQ provider, since the old LINQ provider only accepts Expression.

roji Nov 24, 2022
Collaborator

That would mean that in order for a LINQ provider to support new C# features, it needs to be rewritten to dump Expression altogether and accept/work with CSharp10Expression instead (and that needs to be happen again for C# 11); that doesn't sound like a great solution.

Once again, there have already been discussions around implementation - that's not where the problem is; it's about prioritizing the work. So while implementation discussions are interesting to have, they won't really advance this issue.

Eli-Black-Work Nov 24, 2022

@roji Okay, understood, thanks 🙂

IS4Code Nov 24, 2022

Just to add a counterexample ‒ I was making a reflection-based Expression serializer, and there were only a handful of special expression types like MemberExpression and MethodCallExpression that needed some additional handling, but the rest could be reasonably deserialized by dynamically calling one of the static methods under Expression. Such a library could be somewhat forward-compatible.
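A minimal sketch of that idea, assuming the serialized form stores the factory name: the `ReflectiveFactory` class is hypothetical, while `Type.GetMethod` and `MethodInfo.Invoke` are the standard reflection calls. Any two-operand factory added to `Expression` later would be picked up without changing this code.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ReflectiveFactory
{
    // Looks up a static two-operand factory on Expression by name,
    // e.g. "Add" or "Multiply", and invokes it.
    public static Expression Binary(string factoryName, Expression left, Expression right)
    {
        MethodInfo factory = typeof(Expression).GetMethod(
            factoryName, new[] { typeof(Expression), typeof(Expression) });
        if (factory == null || !factory.IsStatic)
            throw new NotSupportedException(
                $"No static Expression.{factoryName}(Expression, Expression) factory found.");

        return (Expression)factory.Invoke(null, new object[] { left, right });
    }

    public static void Main()
    {
        // (2 + 3) * 4, built purely from factory names.
        Expression sum = Binary("Add", Expression.Constant(2), Expression.Constant(3));
        Expression product = Binary("Multiply", sum, Expression.Constant(4));
        Console.WriteLine(Expression.Lambda<Func<int>>(product).Compile()()); // 20
    }
}
```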

I don't think it should be a big deal if a library using Expression<T> cannot handle some of the subtypes, that's something to be expected if one essentially uses a big switch on an extensible base type; it just need to be properly handled and communicated. I am fairly confident most libraries that are considered here won't work anyway if you throw something like DynamicExpression at them as a regular argument constructed explicitly from Expression.Dynamic, so it shouldn't make a difference if C# gives you tools to make one more easily. It's not like someone would expect something like an SQL generator to be able to handle all of it anyway, just as one won't expect a library designed to work with files to work when given something that's not a file.

ghuntley · 2021-05-18T15:59:54Z

ghuntley
May 18, 2021

off-topic: ICYMI https://github.com/reaqtive/reaqtor was open-sourced by @bartdesmet moments ago. If you like expression trees then there are 15 years of implemented knowledge right there.

0 replies

FiniteReality · 2021-06-11T16:27:57Z

FiniteReality
Jun 11, 2021

Question: instead of being a compiler option (and therefore effectively a dialect option) couldn't this be specified purely by the SDK, in the same manner as AssemblyVersionAttribute is specified now? That way, Roslyn and the language spec would only specify what the translation is (i.e. a + b -> Add(a, b)) and the SDK would specify which builder to use via an assembly-level ExpressionBuilderAttribute. Unannotated assemblies would be treated as they are now, with the "1.0" features.

3 replies

AndriySvyryd Jun 11, 2021
Collaborator Author

Compiler support allows generating better error messages:

Range operators are not supported in expression trees V1. Check the current LINQ provider documentation for whether they are supported and set ExpressionTreeLangVersion to V2 or higher to allow them to be used.

Instead of:

'ExpressionBuilderV1' does not contain a definition for 'Range'

FiniteReality Jun 11, 2021

The compiler could also generate a message such as The '..' operator is not a supported expression by 'method name'

FiniteReality Jan 31, 2022

I still think this is the right way to go if the C# language does allow this. It would still require compiler changes, but I believe that a single translation "spec" would be far easier to maintain from the language side, rather than having to maintain multiple dialects for expression trees. To clarify, I'm suggesting:

That C# maintains a single "mapping" of syntax -> expression tree
- e.g. a + b turns into Add(a, b) and a || b turns into LogicalOr(a, b)
- Importantly, there are no dialects here - this mapping should evolve with the language, as new constructs are added (and removed, I suppose)
The C# compiler generates useful error messages if:
- The most relevant ExpressionBuilderAttribute references a type which does not have a given method
  - e.g. a + b generates the expression 'a + b' is not supported by 'NoAddition' for NoAddition((a, b) => a + b)
- There is no relevant ExpressionBuilderAttribute and an unsupported piece of syntax is used (i.e. current rules)
The .NET SDK applies ExpressionBuilderAttribute at the assembly level by default for new code
- This could be fed by an MSBuild property, to specify which "version" of expressions the user wants
- Importantly, it moves the maintenance of expression builders from the language side of .NET to the API side of .NET
The .NET runtime takes on maintenance and development of new expression tree implementations, making it easier to make changes

IS4Code · 2021-12-09T12:40:34Z

IS4Code
Dec 9, 2021

One thing, not directly related to the language itself, I would like to see is caching the expression trees when possible, like is done already for lambdas:

Func<object, string> f = o => o.ToString(); // the Func<object, string> is reused for every subsequent call
Expression<Func<object, string>> e = o => o.ToString(); // the Expression instance is created every time

Reusing the Expression instance (when possible) would make it better for things like conversion to SQL as the converted query could be cached based on the identity of the Expression instance.
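The difference is easy to observe with reference equality. In this sketch (class and method names are only for illustration) the delegate result depends on a compiler implementation detail, while the expression tree result follows from the factory calls being executed on every evaluation.

```csharp
using System;
using System.Linq.Expressions;

public static class CachingDemo
{
    public static Func<object, string> GetFunc() => o => o.ToString();

    public static Expression<Func<object, string>> GetExpr() => o => o.ToString();

    public static void Main()
    {
        // Non-capturing lambdas are usually cached in a static field by current
        // compilers, so both calls tend to return the same delegate instance.
        Console.WriteLine(ReferenceEquals(GetFunc(), GetFunc()));

        // Each call re-runs Expression.Parameter/Call/Lambda and therefore
        // yields a distinct tree, so this comparison is false.
        Console.WriteLine(ReferenceEquals(GetExpr(), GetExpr()));
    }
}
```

A query cache keyed on tree identity therefore misses on every call today, which is why providers typically fall back to structural comparison.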

I am not sure how this might be a breaking change (expression trees are already immutable) but, if there would be an issue with that, I had the idea for the following syntax:

Expression<Func<object, string>> e = static o => o.ToString();

There's a possibility to reuse this static "operator" for other things in the future for expressions desired to be statically-cached.

0 replies

raffaeler · 2021-12-09T13:26:03Z

raffaeler
Dec 9, 2021

I am very interested in staying up to date on even the smallest implementation change or evolution.
Are the links to the features updated?
Or is there any better way (branch/label/other) in the roslyn repo to closely follow?
Thanks

0 replies

sab39 · 2022-01-25T18:39:55Z

sab39
Jan 25, 2022

As far as I can see this proposal isn't currently Championed which means that it's not being worked on by the team at this time. I'd love to see something along these lines too, though.

One specific detail about the proposal that I'd like to address: it seems to still be definitively tied to Expression<>, in that the proposed ExpressionBuilder is applied at the assembly or method level and applies only to parameters of Expression<> types. I could imagine (rare, but conceivable) scenarios where you might want to use a different builder for different parameters, and you definitely might want a different return type.

What if the ExpressionBuilderAttribute were applied at a Type level instead, and any type annotated with such an attribute would gain an implicit conversion from a lambda expression. So I could define something like:

[ExpressionBuilder(typeof(MyExpressionBuilder))]
public class MyExpression {
  // whatever implementation here
}

public static class MyExpressionBuilder {
  public static MyExpression Parameter(Type type, string name) => new MyExpression(something);
  public static MyExpression Add(MyExpression left, MyExpression right) => new MyExpression(something);
}

MyExpression expr = (a, b) => a + b;

By itself this isn't a fully formed proposal - for example, it doesn't explain how the types of a and b would be determined. One option would be to require that the annotated type (in this case MyExpression) be generic over a delegate type the way that Expression<> is, but it'd be nice to have more flexibility than that.

In general, the idea would be that the BCL could annotate Expression<> with [ExpressionBuilder(typeof(Expression))] to get the current behavior without the compiler needing to special-case that specific type.

This also, to some extent, resolves the language evolution problem, in that a whole new type could be defined for the newer version, and if a newer language construct were used with the old Expression<> type, you'd get a compile error because the corresponding static method didn't exist. It's not an ideal solution because of the need to define a whole new type for each version of the language, so it's probably worth still doing something like the configuration system suggested in the original proposal.

2 replies

bartdesmet Jan 25, 2022

This branch of Roslyn does implement exactly what you're describing above:
https://github.com/bartdesmet/roslyn/blob/ExpressionTreeLikeTypes/docs/features/expression-types.md. Also see the corresponding sample over at https://github.com/bartdesmet/roslyn/blob/ExpressionTreeLikeTypes/docs/features/ExpressionTypes.cs.

This also enables creating subsets of expression nodes by virtue of specifying custom builders, e.g. one for supported language constructs in Where predicates versus Select selectors, etc. to get better compile-time checking.

Also, FWIW, this attribute approach could potentially be taken forward to expression-typed parameters (i.e. definition site), and potentially on lambda expressions as well (i.e. usage site). I haven't explored these routes with a hands-on implementation though.

Finally, note that this branch is orthogonal to https://github.com/bartdesmet/roslyn/tree/ExpressionTrees, which is the branch where all C# language features up to C# 10.0 have been implemented for binding to new factory methods (without an intermediate builder type involved). The merge of both branches would pretty much align with the thinking outlined in this thread.

evilguest Nov 26, 2022

Yep, the combination of these two approaches seems to cover all the usage scenarios that come to mind, including backwards compatibility. The latter would assume that all the goodness from https://github.com/bartdesmet/roslyn/tree/ExpressionTrees should be moved to a different builder class, e.g. ExpressionEx : Expression. The BCL would then annotate the Queryable extension methods with the [ExpressionBuilder(typeof(Expression))] attribute, so the various queryable providers will be protected against the unexpected expressions brought in by the modern language. Overriding this at the usage site might help cover the scenarios where the underlying provider does handle a wider set of expression trees than the baseline implementation.

What can we do to help move this forward?

HowardvanRooijen · 2022-01-25T19:16:15Z

HowardvanRooijen
Jan 25, 2022

@JeremyLikness are you still championing this from an EF perspective?

3 replies

JeremyLikness Jan 25, 2022
Collaborator

@AndriySvyryd is the key driver for this on our team. I'm available to support Andriy, @ajcvickers and the team however I can.

AndriySvyryd Jan 25, 2022
Collaborator Author

We are working with the language team. Most likely a new proposal will be submitted sometime this year.

Eli-Black-Work Aug 9, 2022

@AndriySvyryd Are you aware of #2545 ? Seems possibly relevant 🙂

erik-kallen · 2022-01-31T08:51:58Z

erik-kallen
Jan 31, 2022

Perhaps a better way to enable quoting of code (which is the problem expression trees solve) would be an approach like Scala macros (https://docs.scala-lang.org/overviews/macros/overview.html).

Pretty much, you'd do

    MyRepresentation QuoteCodeForMyLibrary<T1, T2>(Func<T1, T2> arg) = macro MyImplementation;

where MyImplementation would depend on the compiler AST, but we already have source generators that work like that.

11 replies

HaloFour Jan 31, 2022

Expression trees cannot be interpreted at compile time without completely breaking the model. The mapping data doesn't exist at compile time.

HowardvanRooijen Jan 31, 2022

+1 - the magic of expression trees is being able to translate between data < > code interchangeably at runtime; it's very different to standard "code gen".

erik-kallen Jan 31, 2022

@HaloFour Yes, I know. My suggested inspiration Scala macros would be a completely different way to achieve the same goal.

HaloFour Jan 31, 2022

@erik-kallen

Yes, I know. My suggested inspiration Scala macros would be a completely different way to achieve the same goal.

I disagree that moving the evaluation of expression trees or basing them on some language/compiler-specific macro model would help to achieve that goal. I don't think that would be possible without breaking the entire queryable provider ecosystem.

raffaeler Jan 31, 2022

I don't understand what you mean, and I also do not understand why whatever you mean would apply more here than in source generators (which already exist).

Scala macros work, and they would certainly be capable of generating SQL. Whether it is worth doing in C# is another issue, but they certainly are a working solution to code generation.

The SQL use-case is just one of the many examples, I am not referring to it.
We use it for DSLs or to transform data from one type to another that will only be known at runtime. At compile time you don't have any info, not even the shape of the lambda being generated.

aradalvand · 2022-11-21T21:06:51Z

aradalvand
Nov 21, 2022

No updates on this?

2 replies

CyrusNajmabadi Nov 21, 2022
Collaborator

When there are updates, the team generally posts them. No need to poll :)

aradalvand Nov 21, 2022

@CyrusNajmabadi Sure, my comment was just meant to encourage some traction here. This is a highly requested feature but there doesn't seem to be any activity on it from the maintainers.

SteveAndrews · 2022-11-22T00:13:59Z

SteveAndrews
Nov 22, 2022

+1 For this proposal.

For instance, Expression.Call is a nightmare. It should be Expression.CallStatic and Expression.CallInstance.

I'm tired of: Static method requires null instance, non-static method requires non-null instance. and having to fight to figure it out.

What I'm trying to generate: (e, f) => new Tuple<int, decimal>(e.Key, e.Sum(g => g.Volume))
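For reference, the two shapes of `Expression.Call` that the error message distinguishes look like this with the existing API. The `CallDemo` class and the chosen target methods are only an illustration: the instance overload takes the target expression first, the static overload takes none.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class CallDemo
{
    // Instance call: s => s.ToUpper(). The instance expression comes first.
    public static Func<string, string> InstanceCall()
    {
        ParameterExpression s = Expression.Parameter(typeof(string), "s");
        MethodInfo toUpper = typeof(string).GetMethod(nameof(string.ToUpper), Type.EmptyTypes);
        MethodCallExpression call = Expression.Call(s, toUpper);
        return Expression.Lambda<Func<string, string>>(call, s).Compile();
    }

    // Static call: (a, b) => Math.Max(a, b). No instance expression is passed.
    public static Func<int, int, int> StaticCall()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(int), "b");
        MethodInfo max = typeof(Math).GetMethod(nameof(Math.Max), new[] { typeof(int), typeof(int) });
        MethodCallExpression call = Expression.Call(max, a, b);
        return Expression.Lambda<Func<int, int, int>>(call, a, b).Compile();
    }

    public static void Main()
    {
        Console.WriteLine(InstanceCall()("abc")); // ABC
        Console.WriteLine(StaticCall()(3, 7));    // 7
    }
}
```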

3 replies

thomaslevesque Nov 22, 2022

For instance, Expression.Call is a nightmare. It should be Expression.CallStatic and Expression.CallInstance.

That's not the kind of issue this proposal is trying to solve. As far as I can tell, there's no plan to make breaking changes to the existing API.

SteveAndrews Nov 22, 2022

Well, perhaps an issue titled "evolution" should actually attempt to evolve the API into something usable. I've been fighting with expression trees for four days, and hours today trying to get Expression.Call to work. It's a steaming pile of technically correct that needs to be evolved. shrug

IS4Code Nov 22, 2022

This is the csharplang repository, so this issue attempts to evolve the language, not the API. If you seek to add a method that behaves the same as an overload of an existing method, the runtime repository is the proper one.

ajcvickers · 2022-11-22T09:46:51Z

ajcvickers
Nov 22, 2022

My understanding is that this work has so far not met the bar for investment. That could change, but I don't expect it to any time soon. I think we are back to the status quo from the last few years that System.Expressions is what it is.

3 replies

thomaslevesque Nov 22, 2022

Keeping the status quo would be a shame. Expression trees are one of the most amazing features of C#. The status quo would mean this feature would become less and less relevant over time, even though it has tons of useful applications.

aradalvand Nov 22, 2022

Horrible news honestly.

WhitWaldo Feb 2, 2023

If expression trees were being considered for inclusion to C# as a new feature, I'd say yours is a fair conclusion and that it needs more consideration. However, it seems to me that ship has sailed - they were introduced and it strikes me as a poor pitch for C# as a whole if new features, once introduced, may be indefinitely abandoned in any future iteration.

Would you be able to speak to why this isn't approachable via an opt-in flag and a language version like nullable reference types are/were? I don't expect a .NET Framework 2.0 library to work flawlessly from .NET 7 because things have changed in both the language and compiler. I thought older versions of .NET are EOL'ed precisely so the framework and language can be iterated on. Why doesn't that hold here as well?

Alternatively, why can't the slight changes to Roslyn be made to support Bart's implementation such that a far more expressive tree is available as an opt-in NuGet package iterated on independently from .NET itself? Same idea, but as the package evolves, it can maintain its own versioning dependency requirements.

olmobrutall · 2023-07-10T14:10:29Z

olmobrutall
Jul 10, 2023

Could we not start with the easy ones to get the ball rolling?

Most of the main pain points could be lowered to the current expression tree nodes:

Conditional access expressions

Translate
FROM: a => a.name?.ToString()
TO: a => a.name == null ? null : a.name.ToString()

Named and optional parameters

Given

public bool MyMethod(object a, bool withMessage = false, bool withColor = false)
{
   //...
}

Translate
FROM: a => MyMethod(a, withColor: true)
TO: a => MyMethod(a, false, true)

Tuple literals

Translate
FROM: a => (a, a)
TO: a => ValueTuple.Create(a,a)

Tuple equality and inequality

var a = (1, 3);
var b = (1, 3);

Translate
FROM: () => a == b
TO: () => a.Item1 == b.Item1 && a.Item2 == b.Item2

These four things, especially the first one, would cover most of the pain points with minimal breaking changes, if any.
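As a rough illustration of the lowering idea, the hand-lowered TO forms are already accepted in expression lambdas today, so a provider that understands Conditional, Call and AndAlso nodes needs nothing new. This is a minimal sketch, not compiler output: the variable names are made up, a plain string stands in for a.name, and the named/optional parameter case is left out.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

// Conditional access, lowered by hand to a null check plus a normal call.
Expression<Func<string?, string?>> upper = a => a == null ? null : a.ToUpper();

// Tuple literal, lowered by hand to a factory call.
Expression<Func<int, (int, int)>> pair = a => ValueTuple.Create(a, a);

// Tuple equality, lowered by hand to element-wise comparison.
var x = (1, 3);
var y = (1, 3);
Expression<Func<bool>> same = () => x.Item1 == y.Item1 && x.Item2 == y.Item2;

// Each tree only uses node kinds that have existed since C# 3.
Console.WriteLine(upper.Body.NodeType);    // Conditional
Console.WriteLine(pair.Body.NodeType);     // Call
Console.WriteLine(same.Body.NodeType);     // AndAlso

// The trees also compile and run as expected.
Console.WriteLine(upper.Compile()("ab"));  // AB
Console.WriteLine(pair.Compile()(2));      // (2, 2)
Console.WriteLine(same.Compile()());       // True
```

Note that the element-wise form is not always equivalent to tuple equality once the operands have side effects, which is discussed further down the thread.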

Some other expressions like dictionary initializers, array initializers, discard parameters, etc. could also be implemented as a lowering-only solution.

Things like pattern matching / switch expressions, while useful, will have limited use in practice because the Where cannot pass parameters to the Select. They are also very hard to implement, both on the Roslyn side and on the LINQ provider side.

As for the statement/assignment nodes, they were discarded in C# 3 because the focus was on SQL-translatable expressions, and this hasn't changed much.

Async/await inside queries looks like a corner case to me, and dynamic could be useful in theory but then... just use SQL instead of LINQ.

As a LINQ provider implementer, I would like to give expressive power to the consumer, but also have a reduced set of nodes to translate. Trying to solve the general solution is stopping all progress here.

5 replies

WhitWaldo Jul 10, 2023

There's already a fork of Roslyn that supports expression tree updates through C# 11 without having to completely design that functionality from scratch.

Trying to solve the general solution is stopping all progress here.

No, the general concern is there's no path forward to updating what's already in place as it would have backwards compatibility implications and break things and that's an unacceptable cost.

That said, per the separate discussion on this on #158 I've asked whether this is something that could simply be dropped into a "modern" namespace in a separate package so as to avoid any conflict with the original expression trees and while that seemed like a viable route, there hasn't been any further traction on the discussion.

roji Jul 10, 2023
Collaborator

@olmobrutall take a look at the comments above, this has all been amply discussed already.

olmobrutall Jul 10, 2023

@roji can you point to any comment in particular? I am aware of the huge work of @bartdesmet and I would love for it to get merged.

As for how to version the expression tree factory... I don't really think it is a big problem:

In the worst case, the end user is going to open an application, update the C# version without updating the LINQ provider (already strange), and explicitly modify a query to use some fancy new feature, and the LINQ provider is going to fail at some point. No big deal: LINQ providers fail anyway in many other cases, like when you call a web service in the middle of a Where clause.
Somewhat better, the LINQ provider throws a fail-fast runtime error when it starts translating the query, showing the user that a particular node type is not yet supported. OK for me.
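The fail-fast behaviour can be sketched with a plain ExpressionVisitor that runs before translation. This is only an assumption about how a provider might do it: FailFastVisitor and its supported set are hypothetical, and a real provider would list many more node types.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

Expression<Func<int, bool>> supported = x => x > 1 && x < 10;
Expression<Func<int, int>> unsupported = x => x > 1 ? x : -x;

var checker = new FailFastVisitor();
checker.Visit(supported);   // every node kind is known, so nothing is thrown

string? failure = null;
try
{
    checker.Visit(unsupported);
}
catch (NotSupportedException ex)
{
    failure = ex.Message;
}

Console.WriteLine(failure); // Node type 'Conditional' is not supported by this provider.

// Hypothetical provider-side check: walks the tree before translation and
// rejects the first node kind the provider does not know how to translate.
sealed class FailFastVisitor : ExpressionVisitor
{
    private static readonly HashSet<ExpressionType> Supported = new()
    {
        ExpressionType.Lambda, ExpressionType.Parameter, ExpressionType.Constant,
        ExpressionType.AndAlso, ExpressionType.GreaterThan, ExpressionType.LessThan,
    };

    public override Expression? Visit(Expression? node)
    {
        if (node != null && !Supported.Contains(node.NodeType))
        {
            throw new NotSupportedException(
                $"Node type '{node.NodeType}' is not supported by this provider.");
        }

        return base.Visit(node);
    }
}
```

The same check would reject any node kind added in a later expression tree version, which is the runtime failure mode described above.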

The solutions proposed here are way too ambitious. Supporting, in theory, all the new C# constructs is not worth it for most queries: async/await, pattern matching... won't be implemented in most LINQ providers, and if they are... won't be used.

As for a compile-time way of determining library support: this is challenging because of the runtime nature of how LINQ operators are combined (Provider.CreateQuery), since all the important methods (Where, Select, ...) are in a shared library (Queryable). It requires yet another piece of complexity (ConfigurableExpressionBuilder) and MSBuild support... and then it doesn't work with multiple providers... so it is not worth it.

So my approach was to try a "Minimum Agreement".

What new expression nodes do we need to support tuples, or optional arguments? None, really... The expression tree API is not a full-fidelity one like Roslyn nodes, and as @bernd5 pointed out, there are already mismatches between the code that you write and the expression ToString() result.

Just get 80% of the value with 20% of the effort.

IS4Code Jul 10, 2023

So far in discussions I think there are two camps of thought here:

a) (a, b) is fundamentally different from ValueTuple.Create(a, b) ‒ the first expression also states the names of the tuple fields, unlike the plain old method call. A similar situation holds for the other cases, where there are other "hints" that would be lost if the expression was simply lowered to what the code is usually compiled into. Sometimes it is not necessary, sometimes it is, but new vocabulary should be added for every situation like this, such as Expression.Tuple or additional options for Expression.Call and Expression.Invoke to specify parameter names or which are optional. Of course some of these features are very C#-specific, some are just syntactic sugar etc., but the general reason is that once you start "lowering" everything to code that behaves equivalently at runtime ("as-if"), upgrading it when people request it would be a breaking change.

b) C# has grown a lot since the old times but a lot of the features are just convenience, for example Expression<Func<object>> e = () => new(); works in an expression, and so does Expression<Func<Delegate>> e = () => (int a = 1) => 1;, despite both using later features. Additionally, there is the concern of breaking existing code working with expressions ‒ whereas you would get a compiler error when changing ValueTuple.Create(a, b) to (a, b) in an expression now, if there was a custom type of expression for this kind of initialization, this transformation would hide a runtime error due to unsupported expression type.

I believe these approaches should be reconciled in some fashion, that can be, hopefully, implemented in individual steps that do not have to solve everything at once:

Identify expressions that are truly syntactic sugar, in the same manner that var a = new object() and object a = new() are really just object a = new object(), or object being System.Object. These should not differ in the generated metadata, and their potential "lowering" has to be really simple. If there are any such kinds of expressions that are not allowed yet, they should be permitted since it should be really unproblematic to do so.

I was thinking whether (a, b) == (c, d) also falls into this category, but I realized it is not simply a == c && b == d, since that changes the order of evaluation, and fixing it would require a temporary variable. I think a good rule of thumb is that newly supported expression types should not be lowered to something that is not allowed in C# today (so no variables or blocks).

Identify expressions that are "essentially" equivalent to their lowered forms, but there may be subtle hints that are advantageous for some situations. Something like tuple.a (already permitted) instead of tuple.Item1 could fall into this category, and tuple construction too, probably.

These expressions should be lowered as expected, so (a, b) would actually produce a MethodCallExpression to ValueTuple.Create. However, these objects could be "decorated" with additional data about the specific flavour of expression ‒ C# already does that in metadata, by generating TupleElementNamesAttribute, NullableAttribute etc., so it could simply construct such an instance and attach it to the expression object. Similarly, calls to functions with optional parameters can be lowered by expanding the arguments, but adding a special property to mark that those arguments were filled with default values and were not stated explicitly. This might not work for named arguments in some cases however, since it would affect the order of evaluation if the arguments were permutated.
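The evaluation-order difference mentioned above for (a, b) == (c, d) can be observed in ordinary code, outside expression trees. In this small sketch, Eval is a hypothetical helper that records the order in which operands are evaluated.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Records the operand name, then yields its value.
int Eval(string name, int value)
{
    log.Add(name);
    return value;
}

// Tuple equality: all four operands are evaluated left to right, then compared.
bool tupleResult = (Eval("a", 1), Eval("b", 2)) == (Eval("c", 0), Eval("d", 2));
string tupleOrder = string.Join(",", log);

log.Clear();

// Naive lowering: operands interleave, and && skips b and d once a != c.
bool naiveResult = Eval("a", 1) == Eval("c", 0) && Eval("b", 2) == Eval("d", 2);
string naiveOrder = string.Join(",", log);

Console.WriteLine($"{tupleOrder} -> {tupleResult}");   // a,b,c,d -> False
Console.WriteLine($"{naiveOrder} -> {naiveResult}");   // a,c -> False
```

Both forms give the same result, but a lowering that preserves the first order needs temporaries for b and d, which is exactly what expression lambdas cannot express today.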

Even these steps would permit to use some types of expressions, but many would still be disallowed. This is what the following steps remedy:

Add a new attribute [ExpectedExpressionTypes(ExpressionType[])] and [ExpectedExpressionTypes(Type[])] to decorate parameters, properties, fields, methods (to affect all parameters), or types and assemblies (to affect everything within). The compiler would track all expression types that were used to construct the argument or value assigned to the member, and warn if any of them are not within the set of expression types declared by the receiver.

Expression types are deliberately expressed here as both ExpressionType and Type, so that you could for example declare to accept all binary expressions via [ExpectedExpressionTypes(typeof(BinaryExpression))], or just + by using [ExpectedExpressionTypes(ExpressionType.Add)] (which does not correspond to a unique class). You can also declare to understand everything via [ExpectedExpressionTypes(typeof(Expression))] (if all you do is call Compile for example). By default, supported expression types would be what is produced by C# today.

ExpectedExpressionTypes could be attached to method returns too, to inform the compiler about what expression types may be produced by the method.

Make use of all the existing expression types in .NET, e.g. dynamic expressions, blocks, try, throw, various assignments and other stuff. At this point, it is completely safe to use all these newly unlocked expression types in existing code, since you would be warned the moment you use one of them with code that doesn't support it.

Add new expression types to accommodate all expressions that were not allowed in earlier steps. This will allow the ?. operator, dictionary initializers, deconstructors, async/await, and other stuff. This does not have to happen all at once, since the mechanism to state what expression types are supported will already have been established by this point.

Additionally, these new expression types would override the Reduce method to produce expressions that are equivalent to them in behaviour, while using the expression "vocabulary" that is available in .NET today. In theory, you could write a fancy expression with pattern matching and stuff, and then reduce it to something that has the same semantics but using variables, blocks and others, and the compiler will tell you if the method accepts it then.
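The Reduce mechanism already exists for custom nodes, so the idea can be sketched today. NullConditionalExpression below is a hypothetical node type, not something the compiler or .NET provides, and the tree is built by hand because the compiler cannot emit it. The reduction evaluates the receiver twice, which is only acceptable here because the receiver is a parameter.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

// Hand-built equivalent of s => s?.Length, typed as int?.
var s = Expression.Parameter(typeof(string), "s");
var length = Expression.Convert(Expression.Property(s, "Length"), typeof(int?));
var node = new NullConditionalExpression(s, length);

// Reducing yields a tree made only of node kinds that exist today.
Expression reduced = node.Reduce();
Console.WriteLine(node.NodeType);      // Extension
Console.WriteLine(reduced.NodeType);   // Conditional

var f = Expression.Lambda<Func<string?, int?>>(reduced, s).Compile();
Console.WriteLine(f("abc"));           // 3
Console.WriteLine(f(null) is null);    // True

// Hypothetical node for "receiver?.member"; a consumer that does not know it
// can call Reduce() and translate the resulting Conditional instead.
sealed class NullConditionalExpression : Expression
{
    public NullConditionalExpression(Expression receiver, Expression whenNotNull)
    {
        Receiver = receiver;
        WhenNotNull = whenNotNull;
    }

    public Expression Receiver { get; }
    public Expression WhenNotNull { get; }

    public override ExpressionType NodeType => ExpressionType.Extension;
    public override Type Type => WhenNotNull.Type;
    public override bool CanReduce => true;

    // receiver == null ? default : whenNotNull
    public override Expression Reduce() =>
        Condition(
            Equal(Receiver, Constant(null, Receiver.Type)),
            Default(Type),
            WhenNotNull);
}
```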

I can put this into a concrete proposal if requested.
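For illustration only, the ExpectedExpressionTypes check from the steps above can be approximated at runtime. The proposal describes a compile-time warning, so this sketch is a stand-in under stated assumptions: the attribute is declared locally, it only covers the ExpressionType[] form, and Provider.Run and NodeTypeCollector are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

Expression<Func<int, int>> query = x => x > 0 ? x : 0;

// Read the set of node types the receiving parameter declares it understands.
ParameterInfo receiver = typeof(Provider).GetMethod(nameof(Provider.Run))!.GetParameters()[0];
ExpressionType[] declared = receiver.GetCustomAttribute<ExpectedExpressionTypesAttribute>()!.NodeTypes;

// Collect the node types the tree actually uses and report the difference.
var collector = new NodeTypeCollector();
collector.Visit(query);
List<ExpressionType> unexpected = collector.Seen.Except(declared).ToList();

Console.WriteLine(string.Join(", ", unexpected));   // Conditional

// Hypothetical attribute, modelled on the ExpressionType[] form in the proposal.
[AttributeUsage(AttributeTargets.Parameter)]
sealed class ExpectedExpressionTypesAttribute : Attribute
{
    public ExpectedExpressionTypesAttribute(params ExpressionType[] nodeTypes) => NodeTypes = nodeTypes;

    public ExpressionType[] NodeTypes { get; }
}

// Hypothetical consumer that declares what it can handle.
static class Provider
{
    public static void Run(
        [ExpectedExpressionTypes(
            ExpressionType.Lambda, ExpressionType.Parameter,
            ExpressionType.Constant, ExpressionType.GreaterThan)]
        Expression<Func<int, int>> expression)
    {
    }
}

// Records every node type encountered while walking the tree.
sealed class NodeTypeCollector : ExpressionVisitor
{
    public HashSet<ExpressionType> Seen { get; } = new();

    public override Expression? Visit(Expression? node)
    {
        if (node != null) Seen.Add(node.NodeType);
        return base.Visit(node);
    }
}
```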

julealgon Jun 12, 2024

@olmobrutall take a look at the comments above, this has all been amply discussed already.

@roji doesn't this lack of work on expressions eventually lead to EFCore being less and less usable as time goes on? If EF is heavily based on translating expressions into SQL, the more the language evolves and introduces new features, the less support there will be in EF making it much less desirable to use.

Is there a thread/issue on the EFCore repo discussing this (and potential alternatives) that you could share?

ajcvickers · 2023-07-10T19:11:58Z

ajcvickers
Jul 10, 2023

@olmobrutall The point is that the blockers here aren't technical. The blocker is that the .NET Directors have decided that we're not investing in this area. Nothing is going to happen unless that changes.

16 replies

HowardvanRooijen Jul 11, 2023

My current (un-fleshed-out) thoughts are around whether it's possible to externalise Expression Trees from the core framework, to allow them to evolve (via community effort / corporate sponsorship) as a stand alone set of extensions / plugins.

A second point is that I'm sure I've seen that EF (one of the core internal users of ETs) are looking to move away from ETs to better support AOT scenarios. If this is true it has two ramifications 1) it pushes ETs further into the domain of obsolete technologies 2) IMHO mitigates the risk of breaking changes.

If there is a lack of will from the .NET Execs to invest in this modernisation, then the question really is, are they willing to invest in letting the community take ownership and evolve it instead?

olmobrutall Jul 11, 2023

The idea sounds nice from a political point of view but... how can you make Roslyn pluggable? They will need to know what method to call with which parameters for each new expression node. At that point, how much flexibility does the library have?

roji Jul 11, 2023
Collaborator

@HowardvanRooijen we have no current plans at the EF team to move away from LINQ ETs; the current plans are to translate Roslyn trees into expression trees and vice versa, not to do away with ETs (both because we still need to do handling of queries at runtime which requires ETs, and because we can't rewrite all of EF).

I'm very skeptical about directions like externalizing this out of Roslyn. At the very least, the effort required in defining that extensibility point properly would likely be more work than just evolving ETs; but it also really feels like something that belongs in the compiler in any case, for many reasons.

HowardvanRooijen Jul 11, 2023

That's good to know... I must say that was my assumption, of trying to work out how you'd achieve the objective of supporting AOT, based on reading this article (where you're quoted) https://devclass.com/2022/12/22/microsoft-plans-pre-compiled-queries-for-entity-framework-and-may-replace-old-and-crufty-net-sql-server-provider/

I'm sceptical too... but that's because we're now at the "clutching at straws" phase because we have no other viable options left.

WhitWaldo Jul 11, 2023

For me, the most intriguing aspect of modernization of expression trees is inspired by the Reaqtor project, but rather blocked by lack of modern support (async being the critical piece). The idea of being able to generate and serialize expressions in order to deploy aspects of work across a distributed system (e.g. send the code to the data) opens up some really neat applications of C# that aren't trivially done today without using some intermediate language and all the complexity that comes with producing and maintaining that.

Outside of just adding to this thread, is there a more suitable forum to which the case for this modernization work could be made to the .NET directors?

bernd5 · 2023-07-11T10:14:28Z

bernd5
Jul 11, 2023

Currently I do not use Entity Framework or expression trees with LINQ.
The main reason is that we need to have some "dummy" types which describe the model of the database. But what if the database changes? We need to refactor and recompile our applications, which is unacceptable for me.

If we could allow data access without a fixed model aka dynamic database access the importance of expression trees might change...

7 replies

bernd5 Jul 11, 2023

Writing queries by hand is quite a bad idea because there is no common SQL understood by every db vendor...

The problem is that I have no fixed db model - my customers have very different data requirements.
In addition I want to support different database providers - e.g. Oracle, PostgresDB, DB2, MongoDB, SQL Server...
So I can't write plain SQL.

Currently I build my own immutable query-expression tree. The API usage looks like:

var query = new SelectQuery()
.Select(defTabAlias.Dot("*"))
.From(defTab.Alias(defTabAlias))
.Join(defAssignTab.Alias(defTabAssignAlias), defTabAlias.Dot(defTabPk).Equal(defTabAssignAlias.Dot(defTabPk))
    .And(defTabAlias.Dot("IS_VALID").Equal(true)))
.Where(defTabAssignAlias.Dot(classTabPk).In(classIds));

In addition I have written a Parser which creates such a query model (some queries - and especially filters are stored in the customer db).
The syntax is based on Oracle SQL, but I have implemented different db providers and can also query, for example, MongoDB (which by its nature has no fixed model).

bernd5 Jul 11, 2023

it would be great if I could write e.g.:

from t in DynTable("SomeTable")
where t.DynField("Name") == "Foo"
select new { Foo = t.DynField("SomeField") }

instead of

var query = new SelectQuery()
.Select("SomeField".Alias("Foo"))
.From("SomeTable".Alias("t"))
.Where("t".Dot("Name").Equal("Foo"));

or even better:

from t in "SomeTable"
where t.Name == "Foo"
select new { Foo = t.SomeField }

Such code actually compiles - but the translation is not simple...
sharplab
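One shape that today's compiler does accept inside an expression tree is indexer access on a dictionary-like row, which gets part of the way towards DynField. This is only a sketch of that approximation, not the translation from the linked snippet: the row type is a plain Dictionary<string, object>, and the query runs in memory through AsQueryable rather than against a database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Rows without a static model: each row is a bag of named values.
var rows = new List<Dictionary<string, object>>
{
    new() { ["Name"] = "Foo", ["SomeField"] = 1 },
    new() { ["Name"] = "Bar", ["SomeField"] = 2 },
};

// Field names are plain strings, yet the predicate is still captured as a tree
// that a provider could inspect instead of executing.
Expression<Func<Dictionary<string, object>, bool>> filter =
    t => (string)t["Name"] == "Foo";

List<object> result = rows.AsQueryable()
    .Where(filter)
    .Select(t => t["SomeField"])
    .ToList();

Console.WriteLine(result.Count);   // 1
Console.WriteLine(result[0]);      // 1
```

The field names remain unchecked strings here, so this does not address the refactoring concern; it only shows that the tree itself can carry them.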

roji Jul 11, 2023
Collaborator

I'm still confused on your original point of not having to refactor your application if your query code is dynamic. For example, in your own immutable query-expression tree above, if the column name IS_VALID changes in the database, your query fails. What's worse, since everything is just strings in your code, you can't use refactoring with confidence to update all queries that need to be modified.

Writing queries by hand is quite a bad idea because there is no common SQL understood by every db vendor...

LINQ queries also don't run the same way on every DB vendor, and some queries that run on one DB won't run on another. It's not possible/recommended to fully abstract over databases regardless of what data access tool you're using.

The problem is that I have no fixed db model - my customers have very different data requirements.

That's a very common scenario - typically you'd write (or better, generate) a static model per customer.

jasekiw Feb 13, 2025

@bernd5 I'm a bit late to the party but I think I might know what you are getting at. Are you referring to a dynamic schema, kind of like what Kentico does? The tables and their columns are created by the user and there is no way to create a static type from this. I don't think EF Core will ever support a query builder API like this, since it seems so far removed from its purpose. There might be other libraries out there that help with this. This one in particular looks close to what you'd like, but it is not LINQ-based (I've not personally used it). https://sqlkata.com/

Side note: since Kentico is a framework, they were also able to get away from the dynamic API a little bit, since the customers were in control of the schema and could rebuild their own project to include static types. That falls apart though if your customers are not developers.

bernd5 Feb 14, 2025

Yes, I tried SqlKata first.

glen-84 · 2023-08-27T11:53:24Z

glen-84
Aug 27, 2023

Static abstract interface members should probably be added to the list.

CS8927 An expression tree may not contain an access of static abstract interface member

#5997

0 replies


Replies: 15 comments · 70 replies
