fix: lock MeasurementFields while validating #25998

davidby-influx · 2025-02-11T22:04:02Z

There was a window in which concurrent writes with
differing types for the same field could race during
validation. Lock the MeasurementFields struct during
field validation to avoid this.

closes #23756


gwossum · 2025-02-11T23:02:37Z

tsdb/shard.go

-			case PartialWriteError:
-				if reason == "" {
-					reason = err.Reason
+		cont, err := func(p models.Point, iter models.FieldIterator) (cont bool, err error) {


It looks like the only time cont is not true is when there is an error, and in that case the continue is at the bottom of the loop anyway. Is cont really doing anything?
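A minimal sketch of the per-point closure pattern under discussion, using simplified, hypothetical stand-ins (`measurementFields`, `point`, `fieldType`, `validatePoints`) rather than the real `tsdb` types. Because the deferred `Unlock` runs when the closure returns, the lock is held for one point at a time, and an `error` result alone is enough to decide whether to skip the point:

```go
package main

import (
	"fmt"
	"sync"
)

// fieldType is a hypothetical stand-in for the real field data type.
type fieldType string

// measurementFields is a simplified, hypothetical stand-in for tsdb.MeasurementFields.
type measurementFields struct {
	mu     sync.Mutex
	fields map[string]fieldType
}

// point is a hypothetical stand-in for models.Point.
type point struct {
	measurement string
	fields      map[string]fieldType
}

// validatePoints keeps only points whose field types agree with the known
// schema. The per-point closure returns just an error: a rejected point is
// recorded and skipped, so no separate "continue" flag is needed.
func validatePoints(points []point, lookup func(string) *measurementFields) (valid []point, rejected []error) {
	for _, p := range points {
		err := func(p point) error {
			mf := lookup(p.measurement)
			mf.mu.Lock()
			// The deferred Unlock runs when this closure returns, i.e. once
			// per point, not at the end of the whole loop.
			defer mf.mu.Unlock()
			for name, typ := range p.fields {
				if existing, ok := mf.fields[name]; ok && existing != typ {
					return fmt.Errorf("field type conflict: %s.%s is %s, got %s",
						p.measurement, name, existing, typ)
				}
			}
			return nil
		}(p)
		if err != nil {
			rejected = append(rejected, err)
			continue
		}
		valid = append(valid, p)
	}
	return valid, rejected
}
```

With only an `error` result, a separate `cont` flag carries no extra information, which appears to be what the follow-up commit "fix: remove unnecessary return variable" addressed.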

gwossum · 2025-02-11T23:18:34Z

tsdb/shard.go

+				fieldsToCreate = append(fieldsToCreate, &FieldCreate{
+					Measurement: name,
+					Field: &Field{
+						Name: string(fieldKey),
+						Type: dataType,
+					},
+				})


If a write has two points that both have a new field (newField), but the two points have different types for newField, they would both end up in fieldsToCreate, correct? Would that create issues later?

The old code had the same pattern: the first gets created, and the second is rejected.
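To illustrate the "first gets created, second rejected" behavior, here is a sketch with hypothetical types (`fieldCreate`, `measurementFields`, `createFieldIfNotExists`, `applyCreates`) standing in for `FieldCreate` and the real creation path. It assumes creation checks and inserts under a single lock, as described in this thread:

```go
package main

import (
	"fmt"
	"sync"
)

type fieldType string

// fieldCreate is a hypothetical stand-in for tsdb.FieldCreate.
type fieldCreate struct {
	measurement string
	name        string
	typ         fieldType
}

type measurementFields struct {
	mu     sync.Mutex
	fields map[string]fieldType
}

// createFieldIfNotExists adds the field if it is absent and rejects it if it
// already exists with a different type. The check and the insert happen under
// one lock, so duplicate requests for the same field are sequenced.
func (mf *measurementFields) createFieldIfNotExists(name string, typ fieldType) error {
	mf.mu.Lock()
	defer mf.mu.Unlock()
	if existing, ok := mf.fields[name]; ok {
		if existing != typ {
			return fmt.Errorf("field type conflict: %s is %s, got %s", name, existing, typ)
		}
		return nil // same type already present: nothing to do
	}
	mf.fields[name] = typ
	return nil
}

// applyCreates processes queued requests in order; a later request for the
// same field with a different type is rejected rather than overwriting.
func applyCreates(mf *measurementFields, creates []fieldCreate) []error {
	var errs []error
	for _, fc := range creates {
		if err := mf.createFieldIfNotExists(fc.name, fc.typ); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", fc.measurement, err))
		}
	}
	return errs
}
```

So two conflicting entries in `fieldsToCreate` are harmless as long as the creation step re-checks types under its lock.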

gwossum · 2025-02-11T23:27:09Z

tsdb/shard.go

+			mf.mu.Lock()
+			defer mf.mu.Unlock()
+			// Check with the field validator.
+			if err := ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); err != nil {


ValidateFields knows which fields are not currently in the measurement fields, and this is the only place in the code it is called from. We could avoid iterating over the fields again below looking for unknown fields if ValidateFields collected them for us.
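One possible shape for this suggestion, sketched with hypothetical names (`validateAndCollect`, `field`, `measurementFields`): a single loop both checks types and collects the fields the measurement does not know about yet, so the caller need not iterate over the point's fields a second time. It assumes the caller already holds `mf.mu`, mirroring the locking in this PR:

```go
package main

import (
	"fmt"
	"sync"
)

type fieldType string

// field is a hypothetical (name, type) pair taken from a point.
type field struct {
	name string
	typ  fieldType
}

type measurementFields struct {
	mu     sync.Mutex
	fields map[string]fieldType
}

// validateAndCollect checks every field of a point against the known schema
// and, in the same pass, returns the fields that are not yet known.
// The caller must hold mf.mu.
func validateAndCollect(mf *measurementFields, measurement string, pointFields []field) ([]field, error) {
	var unknown []field
	for _, f := range pointFields {
		existing, ok := mf.fields[f.name]
		if !ok {
			// Not in the schema yet: remember it for field creation.
			unknown = append(unknown, f)
			continue
		}
		if existing != f.typ {
			return nil, fmt.Errorf("field type conflict: %s.%s is %s, got %s",
				measurement, f.name, existing, f.typ)
		}
	}
	return unknown, nil
}
```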

I agree. My question was how wide-ranging the changes should be. This PR is the minimal set of changes I could find that fixes the bug, but you are correct that there is much room for improvement, both in the existing code and in this PR. Let's discuss how radical to get.

I'd like to replace the atomic.Value storing the fields map with a sync.Map with a generic wrapper for type safety, for instance. Lots of locking goes away if we do that, but it's a big, scary change.
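A rough sketch of the kind of generic, type-safe wrapper over `sync.Map` mentioned here (requires Go 1.18+ generics). `typedMap` is a hypothetical name, and this is not code from the PR. `LoadOrStore` provides an atomic check-and-insert, which is what could replace some of the explicit locking around the fields map:

```go
package main

import "sync"

// typedMap is a hypothetical generic wrapper that hides sync.Map's
// untyped (any) keys and values behind K and V.
// It must not be copied after first use, just like sync.Map.
type typedMap[K comparable, V any] struct {
	m sync.Map
}

// Load returns the value for key and whether it was present.
func (t *typedMap[K, V]) Load(key K) (V, bool) {
	v, ok := t.m.Load(key)
	if !ok {
		var zero V
		return zero, false
	}
	return v.(V), true
}

// LoadOrStore returns the existing value if the key is present; otherwise it
// stores value and returns it. The bool reports whether the value was already
// there. Check and insert happen atomically.
func (t *typedMap[K, V]) LoadOrStore(key K, value V) (V, bool) {
	actual, loaded := t.m.LoadOrStore(key, value)
	return actual.(V), loaded
}

// Range calls f for each entry until f returns false.
func (t *typedMap[K, V]) Range(f func(key K, value V) bool) {
	t.m.Range(func(k, v any) bool { return f(k.(K), v.(V)) })
}
```

The type assertions are confined to the wrapper, so callers never handle `any`. One caveat: the `sync.Map` documentation says it is optimized for keys written once and read many times, or for goroutines working on disjoint key sets; field schemas plausibly fit the first case, but that would need benchmarking.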

gwossum · 2025-02-11T23:38:37Z

tsdb/shard.go

+			mf := engine.MeasurementFields(name)
+			mf.mu.Lock()


It feels like there is still a potential race condition. We lock the mf while looking up fields here, but then unlock it while we continue to the next point. Another incoming write in a different goroutine could then look up fields in mf before this goroutine can create the new fields. Or am I missing something?

Field creation has its own locking and checks again for field type conflicts. So either goroutine may win, and the other will report an error.

So the race you describe is real, but it is serialized during field creation.

fix: lock MeasurementFields while validating

46cde8b


davidby-influx added area/tsm kind/bug 1.x team/edge labels Feb 11, 2025

davidby-influx requested review from gwossum and devanbenz February 11, 2025 22:04

davidby-influx self-assigned this Feb 11, 2025

gwossum reviewed Feb 11, 2025


fix: remove unnecessary return variable

fab80c8
