Goroutine/Memory leak in Loki #311
Comments
Another thing we've noticed is that the number of goroutines increases every 5 minutes, by roughly 100 each time.
Hi! After lots of debugging, I narrowed it down to
Looks like setting the mapping keys explicitly was the key factor here. Then I set
This issue has not had any activity in the past 30 days, so the
Hey there @nickelghost, @sgrzemski, apologies for the belated response; most of the team was out during the holidays. Just to make sure, have you two been working together on this? Is the experiment by @sgrzemski applicable to the use case and setup of the first report by @nickelghost, or is it entirely different?
Hi @tpaschalis, no worries. Yes indeed, this is the same use case; we're working together. Cheers
Hi there 👋 On April 9, 2024, Grafana Labs announced Grafana Alloy, the spiritual successor to Grafana Agent and the final form of Grafana Agent flow mode. As a result, Grafana Agent has been deprecated and will only be receiving bug and security fixes until its end-of-life around November 1, 2025. To make things easier for maintainers, we're in the process of migrating all issues tagged variant/flow to the Grafana Alloy repository so there is a single home for tracking issues. This issue is likely something we'll want to address in both Grafana Alloy and Grafana Agent, so just because it's being moved doesn't mean we won't address it in Grafana Agent :)
Hi, @nickelghost and @sgrzemski! 👋 Apologies for the late response! I believe this memory leak is caused by two bugs. I opened two PRs to fix them: #1426 and #1431.
Will this get into the grafana-agent as well? |
What's wrong?
We're seeing constant memory leaks in our Grafana Agent (GA) flow deployment. After attaching a profiler, it looks like there is a goroutine leak in the Loki processing components, which in turn causes the CRI config to leak memory. We're collecting Kubernetes pod logs from inside the cluster. We had a larger leak before, which turned out to be caused by trying to tail log files that didn't exist; scoping the Kubernetes discovery to the current node helped. The leaks that remain happen mostly in dev environments, so I suspect targets are not being cleaned up once pods are deleted.
Although this might be a configuration error on our side that still resolves log files that don't exist, it would be great if GA handled such cases gracefully.
goroutine-pprof.pb.gz
inuse_space-pprof.pb.gz
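The reporter's actual configuration is not included in this extract. For orientation only, here is a minimal sketch of the kind of flow-mode pipeline described above, with pod discovery scoped to the current node and a CRI processing stage. The component labels, the NODE_NAME environment variable (assumed to be injected via the Kubernetes downward API), and the Loki push URL are illustrative assumptions, not values from the original report.

```river
// Discover only the pods scheduled on this node, so the agent never tries
// to resolve log paths for pods that live on other nodes.
// NODE_NAME is assumed to be set via the downward API (spec.nodeName).
discovery.kubernetes "pods" {
  role = "pod"

  selectors {
    role  = "pod"
    field = "spec.nodeName=" + env("NODE_NAME")
  }
}

// Map each discovered pod/container to its CRI log path on the host.
discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    separator     = "/"
    target_label  = "__path__"
    replacement   = "/var/log/pods/*$1/*.log"
  }
}

// Expand the glob into concrete files that currently exist on disk.
local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

// Tail the files and forward them through a CRI parsing stage to Loki.
loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.process.pod_logs.receiver]
}

loki.process "pod_logs" {
  stage.cri {}
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki-gateway.monitoring.svc/loki/api/v1/push"
  }
}
```

In a pipeline of this shape, the `spec.nodeName` field selector is what the report refers to as scoping discovery to the current node: each agent only resolves and tails log files that exist on its own host, which is what addressed the earlier "tailing files that don't exist" leak.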
Steps to reproduce
Apply the config below in a dynamic Kubernetes environment and observe. It can take about 24 hours for memory usage to accumulate significantly.
System information
GKE, amd64
Software version
0.37.2
Configuration
Logs
No response