
feat: Add support of multiple kind of cache for relabeling components #1692

Draft · wants to merge 1 commit into base: main

Conversation

pbailhache

PR Description

Related to the proposal introduced in #1600.

This is a (working) draft for this feature.

Which issue(s) this PR fixes

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@pbailhache (Author) left a comment:

Some comments to open a discussion on how to improve this code.

@@ -80,7 +80,6 @@ func (s *server) Run(ctx context.Context) error {
})

mw := middleware.Instrument{
RouteMatcher: r,
pbailhache (Author):

Not sure about the impact of this change (this is related to the update of dskit: grafana/dskit@27d7d41).

return fmt.Errorf("cache_size must be greater than 0 and is %d", arg.CacheConfig.InMemory.CacheSize)
}
case cache.Memcached:
// todo
pbailhache (Author):

Need to determine what to include here; maybe move this to the service/cache package?
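For illustration, the Memcached branch could mirror the in-memory cache_size check above. A minimal sketch, assuming the config exposes Addresses and Timeout fields (hypothetical names, not necessarily this PR's):

func validateMemcachedConfig(cfg MemcachedConfig) error {
	// Addresses/Timeout are assumed field names; the actual validation
	// rules would need to match whatever dskit's Memcached client requires.
	if len(cfg.Addresses) == 0 {
		return fmt.Errorf("memcached addresses must not be empty")
	}
	if cfg.Timeout <= 0 {
		return fmt.Errorf("memcached timeout must be greater than 0 and is %s", cfg.Timeout)
	}
	return nil
}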

@@ -230,7 +264,13 @@ func (c *Component) Update(args component.Arguments) error {
defer c.mut.Unlock()

newArgs := args.(Arguments)
c.clearCache(newArgs.CacheSize)

// todo maybe recreate whole relabelCache here in case of change for redis/memcached client
pbailhache (Author):

As there is no clearCache for Redis or Memcached, I don't really know what to do here with those kinds of cache.

A reviewer:

If it's not supported by Redis/Memcached, can we simply return an error on a call to clearCache for Redis/Memcached? Or would it break some stuff?

Collaborator:

I think this would be a no-op; generally this is used to reset the cache size. In the case of Redis/Memcached I imagine this would not even be called, i.e. the cache size should not exist within Alloy.
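A sketch of that no-op approach (method and field names are assumptions for illustration, not this PR's actual code):

// clearCache resizes the in-memory LRU; for external backends the size is
// owned by the Redis/Memcached server, so there is nothing to do.
func (c *Component) clearCache(cacheSize int) {
	switch c.args.CacheConfig.Backend {
	case cache.InMemory:
		c.relabelCache.Resize(cacheSize)
	case cache.Redis, cache.Memcached:
		// No-op: cache sizing is not managed from within Alloy.
	}
}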

// Ideally we should be using the dskit/cache conf directly, but it means it should not
// be into the alloy configuration ?

type RedisConf struct {
pbailhache (Author):

As explained in the comment, I'm open to suggestions here on how to handle the config part.

For now each cache is configured at the relabel component level, but this includes copying the struct to add the Alloy tags, as we cannot embed the dskit/cache configs into the Grafana Alloy config.

If we decide to move this config outside of the Alloy config, we could use the dskit/cache config directly.
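To make the trade-off concrete, here is a sketch of the duplication being described: an Alloy-tagged struct mirroring a subset of dskit's Redis client config, plus a converter. The dskit type and field names are assumptions based on the excerpts in this PR:

// RedisConf mirrors a subset of dskit's Redis client config so that Alloy
// tags can be attached; the dskit struct itself cannot carry them.
type RedisConf struct {
	Endpoint            string `alloy:"endpoint,attr"`
	DB                  int    `alloy:"db,attr,optional"`
	MaxAsyncConcurrency int    `alloy:"max_async_concurrency,attr,optional"`
	MaxAsyncBufferSize  int    `alloy:"max_async_buffer_size,attr,optional"`
}

// toDskit copies the Alloy-level config into the dskit type consumed by
// cache.NewRedisClient. Field names on the dskit side are assumptions.
func (r RedisConf) toDskit() cache.RedisClientConfig {
	return cache.RedisClientConfig{
		Endpoint:            flagext.StringSliceCSV{r.Endpoint},
		DB:                  r.DB,
		MaxAsyncConcurrency: r.MaxAsyncConcurrency,
		MaxAsyncBufferSize:  r.MaxAsyncBufferSize,
	}
}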

pbailhache (Author):

This file is a re-implementation of the LRU cache that was present in the Prometheus relabel component.

}

func newMemcachedCache[valueType any](cfg MemcachedConfig) (*MemcachedCache[valueType], error) {
client, err := cache.NewMemcachedClientWithConfig(
pbailhache (Author):

Some things to add here (same in the other implementation)

func (c *MemcachedCache[valueType]) Remove(key string) {
ctx := context.Background()
//TODO manage error
_ = c.client.Delete(ctx, key)
pbailhache (Author):

We ignore the error for now, but this isn't ideal.
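One low-cost improvement would be to log the error instead of discarding it. A sketch, assuming the cache struct carries a go-kit logger (c.logger is hypothetical):

// Remove deletes the key and logs, rather than silently drops, any error
// coming back from the Memcached client.
func (c *MemcachedCache[valueType]) Remove(key string) {
	ctx := context.Background()
	if err := c.client.Delete(ctx, key); err != nil {
		level.Warn(c.logger).Log("msg", "failed to remove cache entry", "key", key, "err", err)
	}
}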

}

func newRedisCache[valueType any](cfg RedisConf) (*RedisCache[valueType], error) {
client, err := cache.NewRedisClient(
pbailhache (Author):

Some things to add here (same in the other implementation)


func (c *RedisCache[valueType]) Remove(key string) {
ctx := context.Background()
//TODO manage error
pbailhache (Author):

Ignoring the error is never ideal.

internal/service/cache/cache.go (resolved thread)
"sync"
"time"

lru "github.com/hashicorp/golang-lru/v2"

A reviewer:

Isn't there also an LRU cache interface in dskit/cache that we should use instead?

)

type InMemoryCache[valueType any] struct {
lru *lru.Cache[string, *valueType]

A reviewer:

Should we worry about the cache strategy, or is it out of scope for this PR?

@pbailhache (Author), Sep 17, 2024:

The previous cache was LRU, so I kept it this way; I think it's out of scope.

Collaborator:

LRU should be kept as is; if we want to change that behavior, it should be a separate PR.
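For context, the in-memory backend presumably wraps hashicorp/golang-lru v2 along these lines; a sketch, not the PR's exact constructor:

// newInMemoryCache builds the LRU-backed cache. lru.New errors on a
// non-positive size, which the cache_size validation should already prevent.
func newInMemoryCache[valueType any](size int) (*InMemoryCache[valueType], error) {
	l, err := lru.New[string, *valueType](size)
	if err != nil {
		return nil, err
	}
	return &InMemoryCache[valueType]{lru: l}, nil
}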

found := false
values[key], found = c.lru.Get(key)
if !found {
return nil, errNotFound

if err := encoder.Encode(*value); err != nil {
return err
}
c.client.SetAsync(key, indexBuffer.Bytes(), ttl)

A reviewer:

I find it weird to have no way of being notified of an error here (and in SetMulti), but it seems to be how asyncQueue works, so there's not much to be done here...
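For reference, the fire-and-forget shape of that path, reconstructed as a sketch around the excerpt above (shown on the Memcached variant; the method name and TTL handling are assumptions):

// Set encodes the value and hands it to dskit's asynchronous write queue.
// SetAsync enqueues and returns immediately, so encode errors are the only
// ones that can be surfaced to the caller.
func (c *MemcachedCache[valueType]) Set(key string, value *valueType, ttl time.Duration) error {
	indexBuffer := bytes.Buffer{}
	encoder := json.NewEncoder(&indexBuffer)
	if err := encoder.Encode(*value); err != nil {
		return err
	}
	c.client.SetAsync(key, indexBuffer.Bytes(), ttl)
	return nil
}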

DB int `alloy:"db,attr"`

// MaxAsyncConcurrency specifies the maximum number of SetAsync goroutines.
MaxAsyncConcurrency int `yaml:"max_async_concurrency" category:"advanced"`

A reviewer:

Are there some decent default values that we can choose for this and all the *BufferSize fields?
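If defaults are wanted, the usual Alloy pattern is a SetToDefault implementation on the config struct. A sketch; the numbers below are placeholders, not vetted defaults:

// SetToDefault applies placeholder defaults for the async write queue.
// The actual values should come from benchmarking or dskit's own defaults.
func (c *RedisConf) SetToDefault() {
	*c = RedisConf{
		MaxAsyncConcurrency: 10,    // placeholder
		MaxAsyncBufferSize:  25000, // placeholder
	}
}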

internal/service/cache/cache.go (resolved thread)
internal/service/cache/cache.go (resolved thread)
Password: flagext.SecretWithValue(""),
MaxAsyncConcurrency: cfg.MaxAsyncConcurrency,
MaxAsyncBufferSize: cfg.MaxAsyncBufferSize,
DB: 0,

A reviewer:

Why not set DB to cfg.DB here? The default value would still be 0, and it would be possible to configure it.

internal/service/cache/cache.go (resolved thread)
CacheConfig: cache.CacheConfig{
Backend: cache.InMemory,
InMemory: cache.InMemoryCacheConfig{
CacheSize: 100_100,
Collaborator:

Typo here?

@mattdurham (Collaborator):

Before we get too far down this path, I have a few longer-term concerns about using this while we still handle a single metric at a time. In an ideal world the Appender would be a batch-based interface so we could batch the cache requests, but that's a bigger lift.

@wilfriedroset:

Thank you @mattdurham for your review and feedback. Regarding your concerns:

> about using this while we still handle a single metric at a time. In an ideal world the Appender would be a batch-based interface so we could batch the cache requests, but that's a bigger lift.

Do you think it would be possible to split the two needs? First we introduce the external caches via this PR; then we take the time to design and implement the batching to optimize performance further.

As you have pointed out, batching the requests will require more work.
The reason I'm suggesting this is that the current in-memory cache is the default and will continue to work as it does now, with the same behavior/performance.
The performance improvement added by the refactoring will benefit the external caches only, so there is no risk.
When I say later, this is something we are willing to contribute, but we might require help due to the complexity.

WDYT?

@mattdurham (Collaborator):

Really want to see some benchmarks using Redis/Memcached; use something like https://golang.testcontainers.org/modules/redis/ with relabel and a few thousand signals.
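(For illustration, a benchmark along those lines might look like the sketch below. newRedisCache/RedisConf are this PR's draft API, the Endpoint field and the Set/Get signatures are assumptions, and the redis.Run call matches recent testcontainers-go releases.)

package relabel_test

import (
	"context"
	"fmt"
	"testing"

	"github.com/testcontainers/testcontainers-go/modules/redis"
)

// BenchmarkRedisRelabelCache measures Set/Get round-trips against a
// throwaway Redis container, cycling through a few thousand cache keys.
func BenchmarkRedisRelabelCache(b *testing.B) {
	ctx := context.Background()

	redisC, err := redis.Run(ctx, "redis:7")
	if err != nil {
		b.Fatal(err)
	}
	b.Cleanup(func() { _ = redisC.Terminate(ctx) })

	endpoint, err := redisC.ConnectionString(ctx)
	if err != nil {
		b.Fatal(err)
	}

	c, err := newRedisCache[string](RedisConf{Endpoint: endpoint})
	if err != nil {
		b.Fatal(err)
	}

	val := "relabeled"
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		key := fmt.Sprintf("series-%d", i%5000) // a few thousand distinct signals
		_ = c.Set(key, &val, 0)                 // assumed signature
		_, _ = c.Get(key)                       // assumed signature
	}
}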

@pbailhache (Author):

I made a bench branch; you can see the commit here:
pbailhache@95098e0

Here are the results of the benchmark on my laptop, running with dockertest:
[benchmark results image]

It's a bit better with a Docker container already running outside of the benchmark (around 65000 ns/op for Redis).
I used this benchmark to optimize the encode/decode for Redis & Memcached, switching from gob to JSON, as it is a bit quicker for small structs.
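(The gob-vs-JSON difference is easy to check in isolation; a micro-benchmark sketch, with an illustrative small struct standing in for the cached values:)

package cache_test

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"testing"
)

// entry stands in for the small values the relabel cache stores.
type entry struct {
	Name  string
	Value float64
}

func BenchmarkEncodeJSON(b *testing.B) {
	e := entry{Name: "up", Value: 1}
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		if err := json.NewEncoder(&buf).Encode(e); err != nil {
			b.Fatal(err)
		}
	}
}

// A fresh gob encoder re-sends type information on its first Encode, which
// is part of why gob loses to JSON for small, short-lived payloads.
func BenchmarkEncodeGob(b *testing.B) {
	e := entry{Name: "up", Value: 1}
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(e); err != nil {
			b.Fatal(err)
		}
	}
}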

Here are the three flamegraphs for each backend; as we can see, the Get calls to Redis/Memcached are what makes it slow.

LRU: [flamegraph]

Redis: [flamegraph]

Memcached: [flamegraph]

Here I'm still using the dskit/cache package; I chose it because I did not want to re-implement the whole cache client system. If you think it's better not to use it here, I could implement a simpler version of each client and bench that, but I don't think the difference would be huge.

@mattdurham (Collaborator):

Actually, we are already using some dskit structs in a few places; I will review your imports. We recently found an issue with exemplars that will likely require changes to the appender interface to batch the samples, which should significantly improve the viability of this. I would hold off until we get that out, which will likely be in two weeks.
