
[discuss] accept @timestamp numeric value as epoch in event initialization #10369

Open

colinsurprenant wants to merge 3 commits into main
Conversation

colinsurprenant
Contributor

While looking at a @timestamp issue in Event initialization I ran into this problem: numeric values in a @timestamp field are not supported at Event initialization.

$ bin/logstash -e 'input{stdin {codec => json_lines}} output { stdout {codec => rubydebug}}'
...
{"a":1, "@timestamp":1483239600}

[2019-01-31T10:13:13,217][WARN ][org.logstash.Event       ] Unrecognized @timestamp value type=class org.jruby.RubyFixnum
{
           "host" => "mbp15r",
    "_@timestamp" => 1483239600,
              "a" => 1,
       "@version" => "1",
     "@timestamp" => 2019-01-31T15:13:13.223Z,
           "tags" => [
        [0] "_timestampparsefailure"
    ]
}

It seems fair to assume that a numeric value is in fact an epoch value. With this patch, the same input now works:

bin/logstash -e 'input{stdin {codec => json_lines}} output { stdout {codec => rubydebug}}'
...
{"a":1, "@timestamp":1483239600}

{
          "host" => "mbp15r",
    "@timestamp" => 2017-01-01T03:00:00.000Z,
             "a" => 1,
      "@version" => "1"
}
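
For context, the coercion this adds amounts to something like the following Ruby sketch (illustrative only; the actual change is on the Java side in org.logstash.Event, and coerce_numeric_timestamp is a made-up name):

# Sketch: treat a numeric @timestamp as an epoch-seconds value.
def coerce_numeric_timestamp(value)
  Time.at(value).utc
end

coerce_numeric_timestamp(1483239600) #=> 2017-01-01 03:00:00 UTC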

I haven't found many specs around the accepted initialization values for the Event @timestamp field; unless there is an objection to this patch, I will add a better spec suite for that.

@colinsurprenant
Contributor Author

In fact there was a single spec in legacy_ruby_event_spec.rb which tested a numeric value as invalid; I removed that test. Again, if we think this PR is acceptable, I will add a more comprehensive spec for the Event @timestamp initialization behaviour.

@colinsurprenant
Contributor Author

I found some history around this and the seconds-vs-milliseconds epoch question; see #2922

I think we could go one step further here and add logic to check whether the numeric value is >= 100000000000, which as a milliseconds epoch represents GMT: Saturday, March 3, 1973 9:46:40 AM. I think it would be a fairly safe assumption: very far-away dates using a seconds epoch would still be allowed (99999999999 is GMT: Wednesday, November 16, 5138 9:46:39 AM), and the lowest value interpreted as a milliseconds epoch, 100000000000 (GMT: Saturday, March 3, 1973 9:46:40 AM), would be close to the real epoch 0 at GMT: Thursday, January 1, 1970 12:00:00 AM.
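
A minimal Ruby sketch of that heuristic (the constant and method name are illustrative, not the actual patch):

# Values at or above the cutoff are assumed to be milliseconds since epoch.
MILLIS_CUTOFF = 100_000_000_000

def coerce_epoch(value)
  if value >= MILLIS_CUTOFF
    Time.at(Rational(value, 1000)).utc # milliseconds
  else
    Time.at(value).utc                 # seconds
  end
end

coerce_epoch(1483239600)    #=> 2017-01-01 03:00:00 UTC
coerce_epoch(1483239600000) #=> 2017-01-01 03:00:00 UTC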

@colinsurprenant colinsurprenant changed the title accept @timestamp numeric value as epoch in event initialization [discuss] accept @timestamp numeric value as epoch in event initialization Jan 31, 2019
@elasticsearch-bot elasticsearch-bot self-assigned this Jan 31, 2019
@yaauie
Member

yaauie commented Jan 31, 2019

Historically, @timestamp has been a reserved, if poorly-documented, field. It has accepted ISO8601-encoded strings, because that is a "normal" wire representation of the value it is designed to hold, but otherwise its behaviour has been largely undefined.

I agree that we should make a best effort to interpret an inbound field called @timestamp whose value "looks like" an epoch-timestamp of some kind. As you mention, magic values could be found and documented in such a way that we could handle seconds or millis (or even micros or nanos). These magic numbers would need to be clearly documented, and in an ideal world would also be tunable.

I would prefer a slightly more-conservative magic number of 2 x (10^10), which puts the cutoff-date in 2603, but keeps the lossy-date for millis closer to the epoch in 1970:

VALUE: 20000000000
  x(10^ 0): 2603-10-11T11:33:20+00:00
  x(10^-3): 1970-08-20T11:33:20.000+00:00
  x(10^-6): 1970-01-01T05:33:20.000000+00:00
  x(10^-9): 1970-01-01T00:00:20.000000000+00:00

SEE: timestamp-magic-number.rb
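
The linked script isn't reproduced here, but presumably it does something along these lines (an assumption on my part, not the gist's actual contents):

require "time"

value = 20_000_000_000
puts "VALUE: #{value}"
[0, 3, 6, 9].each do |digits|
  # interpret the cutoff as seconds, millis, micros, nanos in turn
  t = Time.at(Rational(value, 10**digits)).utc
  puts format("  x(10^-%d): %s", digits, t.iso8601(digits))
end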


There are a couple of things we should also consider if we are opening up the acceptable input values for the @timestamp field:

  • Should we handle string-encoded integer values as well? Many JSON libraries and implementations end up encoding large numbers as strings (JavaScript has no official support for 64-bit integers, since all of its numbers are implemented on top of IEEE 754 double-precision floating point; while it can reliably handle integers whose absolute value is less than ~(2^53)-1, many libraries encode numbers that overflow some arbitrary cutoff, e.g. INT32MAX, as strings).
  • Should we handle epoch-seconds with a decimal/float representation (and if so, can we assume that these are fractional seconds and skip the magic-number switching to fractional millis, etc.)? A sketch of both cases follows this list.
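
Purely as an illustration of those two questions (none of this is in the PR; the method name and behaviour are hypothetical), both cases might fold into one coercion:

EPOCH_MILLIS_CUTOFF = 20_000_000_000 # the proposed 2 x 10^10 cutoff

def coerce_timestamp_candidate(value)
  case value
  when Float
    Time.at(value).utc # assume fractional epoch-seconds, no unit guessing
  when Integer
    unit = value.abs >= EPOCH_MILLIS_CUTOFF ? 1000 : 1
    Time.at(Rational(value, unit)).utc
  when /\A-?\d+(\.\d+)?\z/
    # string-encoded numbers, e.g. from JSON encoders avoiding 64-bit ints
    coerce_timestamp_candidate(value.include?(".") ? Float(value) : Integer(value, 10))
  end # anything else (unparseable) falls through to nil
end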

@colinsurprenant
Contributor Author

@yaauie The string case is a bit trickier to handle since we first need to assess that the string in question is a numeric representation, possibly using a regex. I am not against the idea, but I am not sure we need to explore that case if we think handling the simple numeric case is a step forward. Same for float representations. I came up with this because we had a recent concrete use case. If this proposal, which seems to me like the more common case, makes sense, we can move forward with it and create new issues for the string-handling proposals. WDYT?

@colinsurprenant
Contributor Author

@yaauie

  • +1 on the 20000000000 threshold magic number.
  • I do not think making it configurable is worthwhile, given the (IMO) very limited use cases where it would be valuable.
  • The other proposals, handling numeric-string and decimal/float representations, are worth considering, but not necessarily in this PR.
  • About the history of Timestamp and the lack of API docs: this is indeed a generalized problem in logstash. I will look, but I am not sure where this magic number could be documented today so that it is actually findable for anyone wondering about it. Unless I am mistaken, a new docs section would be required, in which case we could look into making progress in that direction with @karenzone's help.

I added a commit for the magic number change and some specs. LMKWYT.
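
For reference, the kind of spec that commit adds might look like this (a sketch; the exact assertions may differ from the commit):

require "logstash/event"

describe LogStash::Event do
  it "treats a numeric @timestamp below the cutoff as epoch seconds" do
    event = LogStash::Event.new("@timestamp" => 1_483_239_600)
    expect(event.get("@timestamp").to_iso8601).to eq("2017-01-01T03:00:00.000Z")
  end

  it "treats a numeric @timestamp at or above the cutoff as epoch milliseconds" do
    event = LogStash::Event.new("@timestamp" => 1_483_239_600_000)
    expect(event.get("@timestamp").to_iso8601).to eq("2017-01-01T03:00:00.000Z")
  end
end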

@colinsurprenant
Contributor Author

Circling back: any objections to moving forward with this? /cc @jsvd

@jsvd
Member

jsvd commented Mar 26, 2019

@colinsurprenant since @timestamp is mostly a field created by logstash and typically in ISO8601, can you talk about the scenarios where this is needed? Overall I'm not against it and we can schedule it for the next minor, but I'm still curious.

@colinsurprenant
Contributor Author

colinsurprenant commented Mar 26, 2019

Found some more history in #2929 and #2998 - seems at that time we dropped that idea.

In any case, this PR is the result of investigating an issue where the input data had an epoch @timestamp. I understand that @timestamp is an internal LS field, but we can't prevent other systems from deciding to use a @timestamp field in their data, and other than ISO8601, an epoch numeric representation is the most common way of representing a timestamp.

Since we already try, to some extent, to coerce a source @timestamp field at Event creation, it seems a no-brainer to also support the epoch format.

@wescran

wescran commented Nov 12, 2019

@jsvd We have a use case where this PR would be helpful. We are currently using Elastic Rally to benchmark dev environments and storing the results in an elasticsearch instance in those environments. We then use logstash to process and ship those indices to our production ES instance. Rally stores the timestamp field as an epoch time, and when using the Elasticsearch input plugin, Logstash is unable to parse the timestamp, forcing us to use the following workaround.

input {
    elasticsearch {
      hosts => #hostname
      index => "rally-*"
      query => '{ "query": { "match_all": { } } }'
    }
}
filter {
  if "_timestampparsefailure" in [tags] {
    date {
      match => [ "_@timestamp", "UNIX_MS" ]
      target => "timestamp"
      remove_field => "_@timestamp"
      remove_tag => "_timestampparsefailure"
    }
  }
}

It took me a while to find this discussion, which helped explain why my config was throwing this timestamp parse failure. This PR would be a great solution and would allow us to stop using this workaround. Is there anything else blocking this PR from being applied? Thanks

@colinsurprenant
Contributor Author

Jenkins test this

@colinsurprenant
Contributor Author

@wescran nothing is blocking this other than reaching consensus on it. Will circle back. @yaauie @jsvd any objections with moving forward?

@colinsurprenant
Contributor Author

The build errors look unrelated.
