BufferedTokenizerExt applies sizeLimit check only on first token of input fragment #17017

Open
andsel opened this issue Feb 5, 2025 · 0 comments

andsel commented Feb 5, 2025

BufferedTokenizerExt throws an exception when a discovered token is bigger than the sizeLimit parameter. However, in the existing implementation the check is executed only on the first token present in the input fragment. As a result, if the token that exceeds the limit is the second (or a later) one, no error is raised:

```java
// Only the first extracted entity is measured against sizeLimit
final int entitiesSize = ((RubyString) entities.first()).size();
if (inputSize + entitiesSize > sizeLimit) {
    throw new IllegalStateException("input buffer full");
}
```
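To make the behaviour easier to reproduce, here is a minimal, self-contained sketch in plain Java. It is a hypothetical model loosely based on the snippet above, not the actual BufferedTokenizerExt: the class `FirstTokenCheckDemo`, its `extract(String)` method, and the use of `String`/`List` instead of `RubyString`/`RubyArray` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified model of the current check (not the Logstash code):
// plain Java Strings stand in for RubyString/RubyArray.
public class FirstTokenCheckDemo {

    private final String delimiter;
    private final int sizeLimit;
    // Partial token carried over from previous fragments.
    private final StringBuilder input = new StringBuilder();

    public FirstTokenCheckDemo(String delimiter, int sizeLimit) {
        this.delimiter = delimiter;
        this.sizeLimit = sizeLimit;
    }

    public List<String> extract(String fragment) {
        // Limit -1 keeps a trailing empty entity when the fragment ends with the delimiter.
        String[] entities = fragment.split(Pattern.quote(delimiter), -1);
        // Same shape as the original: only the first entity is measured.
        if (input.length() + entities[0].length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        input.append(entities[0]);
        List<String> tokens = new ArrayList<>();
        if (entities.length == 1) {
            return tokens; // no delimiter yet, keep buffering
        }
        tokens.add(input.toString());
        input.setLength(0);
        for (int i = 1; i < entities.length - 1; i++) {
            tokens.add(entities[i]); // never compared against sizeLimit
        }
        input.append(entities[entities.length - 1]); // trailing partial token
        return tokens;
    }

    public static void main(String[] args) {
        FirstTokenCheckDemo tokenizer = new FirstTokenCheckDemo("\n", 5);
        // The second token has 10 characters, twice the limit, yet no exception is thrown.
        System.out.println(tokenizer.extract("ab\n0123456789\n")); // prints [ab, 0123456789]
    }
}
```

With a sizeLimit of 5, only `"ab"` is measured, so the 10-character second token is returned without any error.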

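While the implementation could be considered buggy in this respect, the problem can be avoided by selecting a sizeLimit that is bigger than the length of the input fragment: every token after the first lies entirely inside the fragment, so none of them can exceed such a limit, while the first token (which also includes any data buffered from previous fragments) is still checked. Whether this workaround is viable depends on the context where the tokenizer is used; in the current code base, the tokenizer is used with sizeLimit only by the json_lines codec: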

https://github.com/logstash-plugins/logstash-codec-json_lines/blob/f4e4e004a30bad731826cdb10f94f012c1ad28d8/lib/logstash/codecs/json_lines.rb#L63

This means that whether the problem appears depends on which input the codec is used with.
With the TCP input (https://github.com/logstash-plugins/logstash-input-tcp/blob/e5ef98f781ab921b6a1ef3bb1095d597e409ea86/lib/logstash/inputs/tcp.rb#L215), decode_buffer passes the buffer read from the socket to the codec, and for TCP that buffer could be a 64KB fragment.

For a more practical view of this issue, see #16968 (comment).

Ideal solution

To solve this problem, BufferedTokenizer's extract method should return an iterator instead of an array (or list). The iterator should apply the boundary check on each next invocation.
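A minimal sketch of what an iterator-returning extract could look like, assuming a plain Java model rather than the JRuby-based BufferedTokenizerExt. The class `IteratorTokenizerSketch` and the helper `checkSize` are hypothetical names, and token size is measured with `String.length()` for simplicity.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Logstash implementation: extract() returns a lazy
// Iterator and the sizeLimit check is applied on every next() call.
public class IteratorTokenizerSketch {

    private final String delimiter;
    private final int sizeLimit;
    // Partial token carried over from previous fragments.
    private final StringBuilder input = new StringBuilder();

    public IteratorTokenizerSketch(String delimiter, int sizeLimit) {
        this.delimiter = delimiter;
        this.sizeLimit = sizeLimit;
    }

    public Iterator<String> extract(String fragment) {
        final String[] entities = fragment.split(Pattern.quote(delimiter), -1);
        // Every entity except the last is followed by a delimiter, so it is a complete token.
        final int completeTokens = entities.length - 1;
        if (completeTokens == 0) {
            // No delimiter yet: only buffer, but still keep the buffer within the limit.
            checkSize(input.length() + entities[0].length());
            input.append(entities[0]);
            return Collections.emptyIterator();
        }
        // The first token is the buffered data plus the first entity of this fragment.
        final String head = input.toString() + entities[0];
        input.setLength(0);
        input.append(entities[completeTokens]); // trailing partial token for the next call

        return new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < completeTokens;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String token = (index == 0) ? head : entities[index];
                index++;
                checkSize(token.length()); // boundary check on each token, not just the first
                return token;
            }
        };
    }

    private void checkSize(int size) {
        if (size > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
    }

    public static void main(String[] args) {
        IteratorTokenizerSketch tokenizer = new IteratorTokenizerSketch("\n", 5);
        Iterator<String> tokens = tokenizer.extract("ab\n0123456789\nxy");
        System.out.println(tokens.next()); // prints ab
        try {
            tokens.next();
        } catch (IllegalStateException e) {
            System.out.println("second token rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the oversized token is consumed before the exception is thrown, so a caller could catch `IllegalStateException` and continue with the remaining tokens; whether that is the right recovery behaviour is a design choice. A fragment without any delimiter produces no tokens, so its size is checked eagerly inside `extract` to keep the buffer bounded.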
