`BufferedTokenizerExt` throws an exception when a discovered token is bigger than the `sizeLimit` parameter. However, in the existing implementation the check is executed only on the first token present in the input fragment, which means that if the token exceeding the limit is the second one, no error is raised (see `logstash-core/src/main/java/org/logstash/common/BufferedTokenizerExt.java`, lines 85 to 88 at commit 32cc85b).
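For illustration, here is a minimal sketch of the flawed pattern (a hypothetical `SketchTokenizer`, not the actual `BufferedTokenizerExt` source, assuming a newline separator): the size check runs once against the first element of the split input, so an oversized second token slips through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SketchTokenizer {
    private final StringBuilder buffer = new StringBuilder();
    private final int sizeLimit;

    SketchTokenizer(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    List<String> extract(String data) {
        buffer.append(data);
        List<String> entities =
            new ArrayList<>(Arrays.asList(buffer.toString().split("\n", -1)));
        // BUG (mirrors the reported behavior): only the first entity is
        // checked against sizeLimit; later tokens are never verified.
        if (entities.get(0).length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        // Keep the trailing partial token buffered for the next fragment.
        buffer.setLength(0);
        buffer.append(entities.remove(entities.size() - 1));
        return entities;
    }
}
```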
While the implementation could be considered buggy in this respect, the problem can be avoided by selecting a `sizeLimit` bigger than the length of the input fragment. Whether that is feasible depends on the context where the tokenizer is used; in the current code base it is used with `sizeLimit` only in the json_lines codec: https://github.com/logstash-plugins/logstash-codec-json_lines/blob/f4e4e004a30bad731826cdb10f94f012c1ad28d8/lib/logstash/codecs/json_lines.rb#L63

This means the problem surfaces depending on which input the codec is paired with. When used with the TCP input (https://github.com/logstash-plugins/logstash-input-tcp/blob/e5ef98f781ab921b6a1ef3bb1095d597e409ea86/lib/logstash/inputs/tcp.rb#L215), `decode_buffer` passes the buffer read from the socket to the codec, and for TCP that buffer could be a fragment of 64Kb.
For a more practical view of this issue, check #16968 (comment).
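As a self-contained illustration of the failure mode (reusing the hypothetical `SketchTokenizer` sketched above), a fragment whose second token exceeds the limit is returned without any exception:

```java
public class Demo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SketchTokenizer(10);
        // "short" (5 chars) passes the first-token check; the 100-char
        // second token is never checked against sizeLimit = 10.
        for (String token : tokenizer.extract("short\n" + "x".repeat(100) + "\n")) {
            System.out.println(token.length()); // prints 5, then 100 -- no error raised
        }
    }
}
```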
Ideal solution
To solve this problem, the `BufferedTokenizer`'s `extract` method should return an iterator rather than an array (or list). The iterator should apply the boundary check on each `next` invocation.
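A minimal sketch of that idea (hypothetical code, not a drop-in patch for `BufferedTokenizerExt`, again assuming a newline separator): `extract` returns a lazy `Iterator` whose `next` performs the `sizeLimit` check on every token, so an oversized token is reported regardless of its position in the fragment.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class IteratorSketchTokenizer {
    private final StringBuilder buffer = new StringBuilder();
    private final int sizeLimit;

    IteratorSketchTokenizer(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    Iterator<String> extract(String data) {
        buffer.append(data);
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                // A complete token exists only if a separator is buffered.
                return buffer.indexOf("\n") >= 0;
            }

            @Override
            public String next() {
                int sep = buffer.indexOf("\n");
                if (sep < 0) {
                    throw new NoSuchElementException();
                }
                String token = buffer.substring(0, sep);
                buffer.delete(0, sep + 1);
                // Boundary check applied on *each* next() invocation.
                if (token.length() > sizeLimit) {
                    throw new IllegalStateException(
                        "input buffer full: token exceeds sizeLimit");
                }
                return token;
            }
        };
    }
}
```

With this shape, a consumer such as the json_lines codec would hit the size error on the exact token that exceeds the limit, even when earlier tokens in the same TCP fragment were fine.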