TraceQL: support mixed-type attribute querying (int/float) #4391

ndk · 2024-11-27T17:53:59Z

What this PR does:
Below is my understanding of the current limitations. Please feel free to correct me if I’ve misunderstood or overlooked something.

Attributes of the same type are stored in the same column. For example, integers are stored in one column and floats in another.

Querying operates in two stages:

Predicate Creation: Predicates are created based on the operand types.
Chunk Scanning: Chunks are scanned, and spans are filtered using the predicates.

The issue arises because predicates are generated based on the operand type. If an attribute is stored as a float but the operand is an integer, the predicate evaluates against the integers column instead of the floats column. This results in incorrect behavior.

Proposed Solution
The idea is to generate predicates for both integers and floats, allowing both columns to be scanned for the queried attribute.

In this PR, I’ve created a proof-of-concept by copying the existing createAttributeIterator function to createAttributeIterator2. This duplication is intentional, as the original function is used in multiple places, and I want to avoid introducing unintended side effects until the approach is validated.

case traceql.TypeInt:
	{
		pred, err := createIntPredicate(cond.Op, cond.Operands)
		if err != nil {
			return nil, fmt.Errorf("creating attribute predicate: %w", err)
		}
		attrIntPreds = append(attrIntPreds, pred)
	}

	{
		if i, ok := cond.Operands[0].Int(); ok {
			operands := traceql.Operands{traceql.NewStaticFloat(float64(i))}
			pred, err := createFloatPredicate(cond.Op, operands)
			if err != nil {
				return nil, fmt.Errorf("creating attribute predicate: %w", err)
			}
			attrFltPreds = append(attrFltPreds, pred)
		}
	}

case traceql.TypeFloat:
	{
		operands := traceql.Operands{traceql.NewStaticInt(int(cond.Operands[0].Float()))}
		pred, err := createIntPredicate(cond.Op, operands)
		if err != nil {
			return nil, fmt.Errorf("creating attribute predicate: %w", err)
		}
		attrIntPreds = append(attrIntPreds, pred)
	}

	{
		pred, err := createFloatPredicate(cond.Op, cond.Operands)
		if err != nil {
			return nil, fmt.Errorf("creating attribute predicate: %w", err)
		}
		attrFltPreds = append(attrFltPreds, pred)
	}

WDYT? :)

Which issue(s) this PR fixes:
Fixes #4332

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

joe-elliott · 2025-01-02T18:55:08Z

I apologize for taking so long to get to this. Your analysis is correct! We do generate predicates per column and, since we store integers and floats independently we only scan one of the columns. Given how small int and float columns tend to be (compared to string columns) I think the performance hit of doing this is likely acceptable in exchange for the nicer behavior.

What is the behavior in this case? I'm pretty sure this will work b/c the engine will request all values for the two attributes and do the work itself. I believe the engine layer will compare ints and floats correctly but I'm not 100% sure.

{ span.intAttr > span.floatAttr }

Tests should also be added here for the new behavior. These tests build a block and then search for a known trace using a large range of traceql queries. If you add tests here and they pass it means that your changes work from the parquet file all the way up through the engine.

This will also break the "allConditions" optimization if the user types any query with a number comparison:

https://github.com/grafana/tempo/pull/4391/files#diff-a201423ab0b50d4455a497bf1804b1a9f596394413c28b7702710f89237c49c1R2815-R2821

I would like preserve the allConditions behavior in this case b/c it's such a nice optimization and number queries are common. I'm not quite sure why the len(valueIters) == 1 condition exists so we'd need to do some research into it.

ndk · 2025-01-12T11:39:27Z

I apologize for taking so long to get to this. Your analysis is correct! We do generate predicates per column and, since we store integers and floats independently we only scan one of the columns. Given how small int and float columns tend to be (compared to string columns) I think the performance hit of doing this is likely acceptable in exchange for the nicer behavior.

Thank you for confirming the approach and pointing out the allConditions optimization. Right now, the fix scans both integer and float columns for attributes that might be either type. I’ve also adjusted how float comparisons work for integer fields, taking into account the fraction part and the comparison operator.

What is the behavior in this case? I'm pretty sure this will work b/c the engine will request all values for the two attributes and do the work itself. I believe the engine layer will compare ints and floats correctly but I'm not 100% sure.
{ span.intAttr > span.floatAttr }

I verified that { span.intAttr > span.floatAttr } behaves as expected. Wanna me to add a test to cover this case?

Tests should also be added here for the new behavior. These tests build a block and then search for a known trace using a large range of traceql queries. If you add tests here and they pass it means that your changes work from the parquet file all the way up through the engine.

Done. Let me know if I missed something.

This will also break the "allConditions" optimization if the user types any query with a number comparison:

https://github.com/grafana/tempo/pull/4391/files#diff-a201423ab0b50d4455a497bf1804b1a9f596394413c28b7702710f89237c49c1R2815-R2821

I would like preserve the allConditions behavior in this case b/c it's such a nice optimization and number queries are common. I'm not quite sure why the len(valueIters) == 1 condition exists so we'd need to do some research into it.

Regarding the allConditions block, the optimization is lost because we generate two predicates (one for int, one for float) under the same attribute name, triggering a LeftJoinIterator instead of a JoinIterator. Possible workarounds I’m considering:

Creating a variant of JoinIterator that uses logical OR rather than AND.
Exploring parquet.multiRowGroup, parquetquery.UnionIterator, or parquetquery.KeyValueGroupPredicate to see if they can unify the int/float search without losing the optimization.
~~Refactoring a single ColumnChunk to the multy-one.~~

Given my limited exposure to Tempo’s internals, I’d appreciate any guidance on whether these routes are viable or if there’s a simpler approach to preserve allConditions.

P.S. Do we care about comparisons with negative values? Should it also be covered?

ndk changed the title ~~WIP: Proposal to address mixed-type attribute querying limitations~~ TraceQL: Proposal to address mixed-type attribute querying limitations Nov 27, 2024

ndk mentioned this pull request Dec 10, 2024

TraceQL: Ints can't be compared to floats #4332

Open

ndk changed the title ~~TraceQL: Proposal to address mixed-type attribute querying limitations~~ WIP: Proposal to address mixed-type attribute querying limitations Dec 19, 2024

ndk force-pushed the mixed-type-attr-query branch 2 times, most recently from 3d8f31d to 812d768 Compare January 10, 2025 16:41

ndk changed the title ~~WIP: Proposal to address mixed-type attribute querying limitations~~ Mixed-type attribute querying (int/float) Jan 12, 2025

ndk marked this pull request as ready for review January 12, 2025 11:42

ndk requested review from joe-elliott, mdisibio, mapno, yvrhdn, zalegrala, electron0zero, ie-pham, stoewer and javiermolinar as code owners January 12, 2025 11:42

TraceQL: support mixed-type attribute querying (int/float)

171afab

ndk force-pushed the mixed-type-attr-query branch from 812d768 to 171afab Compare January 12, 2025 12:42

ndk changed the title ~~Mixed-type attribute querying (int/float)~~ TraceQL: support mixed-type attribute querying (int/float) Jan 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TraceQL: support mixed-type attribute querying (int/float) #4391

TraceQL: support mixed-type attribute querying (int/float) #4391

ndk commented Nov 27, 2024

joe-elliott commented Jan 2, 2025

ndk commented Jan 12, 2025 •

edited

Loading

TraceQL: support mixed-type attribute querying (int/float) #4391

Are you sure you want to change the base?

TraceQL: support mixed-type attribute querying (int/float) #4391

Conversation

ndk commented Nov 27, 2024

joe-elliott commented Jan 2, 2025

ndk commented Jan 12, 2025 • edited Loading

ndk commented Jan 12, 2025 •

edited

Loading