wrapper around jemalloc to track allocator usage by thread #4336
base: master
this seems like it's going to be a big help! just a few comments/questions. Thank you!
"solGossipWork",
"solGossip",
"solRepair",
"FetchStage",
Where are you finding these thread names? I can't seem to find a few of them. EDIT: idk why it selected 4 lines. Meant to select just FetchStage.
These are from my experiments with thread manager. It is quite possible that I got some of them wrong and/or missed some. Listing all threads in agave is pretty much impossible in its current form. But this is no big deal as long as we get the main pools right. Arguably, we could cut this list down to the top 10 and it would be equally useful.
ya agree that we don't need to capture every single thread. But I do want to make sure the ones we have hard-coded here are actual thread names. When I search the codebase for thread names like FetchStage or solClusterInfo, I don't find any.
Ok I've double-checked, now all names should be legit.
ok so maybe I don't fully understand how thread names work in Rust. I know we define thread names in the codebase like: Lines 227 to 235 in 85e8f86
But in the PR, you have thread names like:
No magic transform, I just got the grep script wrong. Two slipped through.
Co-authored-by: Greg Cusack
A simple wrapper around jemalloc that tracks allocator usage by thread name and reports it in metrics.
The idea is to get a better picture of why a node crashes when an OOM occurs (at the very least, which threads were allocating memory).
This is for dev use only.
Problem
If/when agave starts leaking memory (or just clogging up some channel), it can be tricky to find where the memory allocations that cause the crash are happening. Tracking per-pool allocations is not a replacement for Valgrind, but it has the advantage of fairly small overhead and integration into metrics.
Summary of Changes
Added a feature-flag-gated custom wrapper around jemalloc that tracks memory usage, grouped by thread pool name.
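For readers unfamiliar with the technique, here is a minimal sketch of what a per-pool tracking allocator can look like. It is not the code in this PR. Assumptions: `std::alloc::System` stands in for jemalloc so the example has no external dependencies, the pool list is a made-up three-entry table, threads are bucketed by an explicit `register_current_thread` call rather than by reading the OS thread name, and all type and function names (`TrackingAlloc`, `pool_index`, `allocated_bytes`, `freed_bytes`) are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical pool-name prefixes. Index 0 is the catch-all bucket.
pub const POOLS: [&str; 3] = ["other", "solGossip", "solRepair"];

/// Cumulative bytes allocated and freed per pool (never reset in this sketch).
static ALLOCATED: [AtomicUsize; 3] =
    [AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0)];
static FREED: [AtomicUsize; 3] =
    [AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0)];

thread_local! {
    // Const-initialized and without a destructor, so reading it from inside
    // the allocator does not itself allocate.
    static POOL_IDX: Cell<usize> = const { Cell::new(0) };
}

/// Map a thread name to a pool index by prefix; unknown names go to bucket 0.
pub fn pool_index(thread_name: &str) -> usize {
    POOLS
        .iter()
        .skip(1)
        .position(|prefix| thread_name.starts_with(*prefix))
        .map_or(0, |i| i + 1)
}

/// Assumption: each thread calls this once at startup with its own name.
pub fn register_current_thread(thread_name: &str) {
    POOL_IDX.with(|idx| idx.set(pool_index(thread_name)));
}

pub fn allocated_bytes(pool: usize) -> usize {
    ALLOCATED[pool].load(Ordering::Relaxed)
}

pub fn freed_bytes(pool: usize) -> usize {
    FREED[pool].load(Ordering::Relaxed)
}

/// Falls back to bucket 0 if the thread-local is no longer accessible.
fn current_pool() -> usize {
    POOL_IDX.try_with(Cell::get).unwrap_or(0)
}

/// Wraps an inner allocator and counts bytes per pool of the calling thread.
pub struct TrackingAlloc<A>(pub A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for TrackingAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.0.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED[current_pool()].fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.0.dealloc(ptr, layout) };
        // Attributed to the freeing thread, which may differ from the
        // allocating one, hence two separate counters instead of a net value.
        FREED[current_pool()].fetch_add(layout.size(), Ordering::Relaxed);
    }
}

// `System` is a stand-in here; the PR wraps jemalloc instead.
#[global_allocator]
static GLOBAL: TrackingAlloc<System> = TrackingAlloc(System);
```

In this sketch `realloc` and `alloc_zeroed` are left at their `GlobalAlloc` defaults, which route through `alloc` and `dealloc`, so the counters stay consistent at the cost of bypassing the inner allocator's own `realloc`. A metrics reporter could periodically read `allocated_bytes(i)` and `freed_bytes(i)` for each entry in `POOLS` and submit the values under the pool name.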