Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: lock MeasurementFields while validating #25998

Open
wants to merge 2 commits into
base: master-1.x
Choose a base branch
from

Conversation

davidby-influx
Copy link
Contributor

There was a window where a race between writes with
differing types for the same field were being validated.
Lock the MeasurementFields struct during field
validation to avoid this.

closes #23756

There was a window where a race between writes with
differing types for the same field were being validated.
Lock the  MeasurementFields struct during field
validation to avoid this.

closes #23756
tsdb/shard.go Outdated
case PartialWriteError:
if reason == "" {
reason = err.Reason
cont, err := func(p models.Point, iter models.FieldIterator) (cont bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the only time we don't have cont == true is where there is an error, and then the continue is at the bottom of the loop. Is cont really doing anything?

Comment on lines +747 to +753
fieldsToCreate = append(fieldsToCreate, &FieldCreate{
Measurement: name,
Field: &Field{
Name: string(fieldKey),
Type: dataType,
},
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a write has two points that both have a new field (newField), but the two points have different types for newField, they would both end up in fieldsToCreate, correct? Would that create issues later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code had that same pattern. The first gets created, the second rejected.

mf.mu.Lock()
defer mf.mu.Unlock()
// Check with the field validator.
if err := ValidateFields(mf, p, s.options.Config.SkipFieldSizeValidation); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidateFields knows which fields are not currently in the measurement fields, and this is the only place in the code it is called from. We could avoid iterating over the fields again below looking for unknown fields if ValidateFields collected them for us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. My question was how wide-ranging the changes should be. This PR is the minimal set of changes I could find that fixed the bug, but you are correct that there is much room for improvement in the code as it is in the product and in this PR. Let's discuss how radical to get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to replace the atomic.Value storing the fields map with a sync.Map with a generic wrapper for type safety, for instance. Lots of locking goes away if we do that, but it's a big, scary change.

Comment on lines +707 to +708
mf := engine.MeasurementFields(name)
mf.mu.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there is still a potential race condition. We lock the mf while looking up fields here, but then unlock it while we continue to the next point. Another incoming write in a different goroutine could then look up fields in mf before this goroutine can create the new fields. Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field creation has its own locking and checks yet again for field type conflicts. So either of the go routines may win, and the other will report an error.

So, the race you describe is real, but gets sequenced in field creation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants