-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: lock MeasurementFields while validating #25998
base: master-1.x
Are you sure you want to change the base?
Conversation
There was a window where a race between writes with differing types for the same field were being validated. Lock the MeasurementFields struct during field validation to avoid this. closes #23756
tsdb/shard.go
Outdated
case PartialWriteError: | ||
if reason == "" { | ||
reason = err.Reason | ||
cont, err := func(p models.Point, iter models.FieldIterator) (cont bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the only time we don't have cont == true
is where there is an error, and then the continue
is at the bottom of the loop. Is cont
really doing anything?
fieldsToCreate = append(fieldsToCreate, &FieldCreate{ | ||
Measurement: name, | ||
Field: &Field{ | ||
Name: string(fieldKey), | ||
Type: dataType, | ||
}, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a write has two points that both have a new field (newField
), but the two points have different types for newField
, they would both end up in fieldsToCreate
, correct? Would that create issues later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code had that same pattern. The first gets created, the second rejected.
mf.mu.Lock() | ||
defer mf.mu.Unlock() | ||
// Check with the field validator. | ||
if err := ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ValidateFields
knows which fields are not currently in the measurement fields, and this is the only place in the code it is called from. We could avoid iterating over the fields again below looking for unknown fields if ValidateFields
collected them for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. My question was how wide-ranging the changes should be. This PR is the minimal set of changes I could find that fixed the bug, but you are correct that there is much room for improvement in the code as it is in the product and in this PR. Let's discuss how radical to get.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to replace the atomic.Value
storing the fields map with a sync.Map
with a generic wrapper for type safety, for instance. Lots of locking goes away if we do that, but it's a big, scary change.
mf := engine.MeasurementFields(name) | ||
mf.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there is still a potential race condition. We lock the mf
while looking up fields here, but then unlock it while we continue to the next point. Another incoming write in a different goroutine could then look up fields in mf
before this goroutine can create the new fields. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field creation has its own locking and checks yet again for field type conflicts. So either of the go routines may win, and the other will report an error.
So, the race you describe is real, but gets sequenced in field creation.
There was a window where a race between writes with
differing types for the same field were being validated.
Lock the MeasurementFields struct during field
validation to avoid this.
closes #23756