Race condition when closing the embedded etcd #19172

Closed
ahrtr opened this issue Jan 11, 2025 · 3 comments · Fixed by #19221 or #19257

Comments

@ahrtr
Member

ahrtr commented Jan 11, 2025

Which Github Action / Prow Jobs are flaking?

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/etcd-io_etcd/19168/pull-etcd-integration-4-cpu-amd64/1878113519820345344

=== FAIL: integration/embed TestEmbedEtcd (2.17s)
==================
WARNING: DATA RACE
Write at 0x00c000346760 by goroutine 132:
  runtime.racewrite()
      <autogenerated>:1 +0x1e
  go.etcd.io/etcd/server/v3/embed.(*Etcd).Close()
      /home/prow/go/src/github.com/etcd-io/etcd/server/embed/etcd.go:460 +0xddc
  go.etcd.io/etcd/tests/v3/integration/embed_test.TestEmbedEtcd()
      /home/prow/go/src/github.com/etcd-io/etcd/tests/integration/embed/embed_test.go:120 +0x14d1
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1743 +0x44
Previous read at 0x00c000346760 by goroutine 282:
  runtime.raceread()
      <autogenerated>:1 +0x1e
  go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler()
      /home/prow/go/src/github.com/etcd-io/etcd/server/embed/etcd.go:875 +0x6a
  go.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3()
25 skipped lines...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1743 +0x44
==================
    testing.go:1399: race detected during execution of test

Which tests are flaking?

.

Github Action / Prow Job link

No response

Reason for failure (if possible)

No response

Anything else we need to know?

No response

@joshuazh-x
Contributor

joshuazh-x commented Jan 13, 2025

This was probably introduced by #19139, which moved waitgroup.Add into goroutines, so it can race with the waitgroup.Wait() call made during Close().

@ahrtr
Member Author

ahrtr commented Jan 13, 2025

This was probably introduced by #19139, which moved waitgroup.Add into goroutines, so it can race with the waitgroup.Wait() call made during Close().

Could you deliver a PR? thx

@joshuazh-x
Contributor

The issue comes from a race condition in which e.wg.Add(1) is called after e.wg.Wait() has already returned.

PR #19205 is implemented to fix this issue.

However, in theory this race condition could still arise under specific circumstances, so I have also provided a backup PR, #19206, though I believe it would be pretty rare in production.
