Heatmap charts in Netdata #265

Open
8 of 10 tasks
hugovalente-pm opened this issue Feb 9, 2022 · 25 comments

Comments

@hugovalente-pm
Contributor

hugovalente-pm commented Feb 9, 2022

Goal

The need comes from the representation of latency charts, which are currently rendered as stacked charts like this:
image

In the above charts, there are buckets of latencies as dimensions. These buckets are hard-coded in the data collector. This is a real example of a data query:

json payload for latency filesystem.ext4_write_latency chart
{
   "api": 1,
   "id": "filesystem.ext4_write_latency",
   "name": "filesystem.ext4_write_latency",
   "view_update_every": 5,
   "update_every": 5,
   "first_entry": 1644602900,
   "last_entry": 1644602950,
   "before": 1644602950,
   "after": 1644602905,
   "dimension_names": ["0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
   "dimension_ids": ["0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
   "latest_values": [0.0739187, 0.2739483, 0.2000296, 0.569623, 1.7698006, 0.6522811, 0, 0],
   "view_latest_values": [0.0739187, 0.2739483, 0.2000296, 0.569623, 1.769801, 0.6522811, 0, 0],
   "dimensions": 8,
   "points": 10,
   "format": "json",
   "result": {
 "labels": ["time", "0us->1us", "1us->2us", "2us->4us", "4us->8us", "8us->16us", "16us->32us", "32us->64us", "256us->512us"],
    "data":
 [
      [ 1644602905000, 2385.544, 521.8245, 78.06823, 20.11922, 16.152047, 1.983585, 0.2833692, 0.1416846],
      [ 1644602910000, 1005.6769, 219.691, 32.25781, 8.280783, 7.026062, 0.8164153, 0.1166308, 0.0583154],
      [ 1644602915000, 108.25282, 43.08954, 2.594651, 1.764481, 2.742578, 0.5041375, 0, 0],
      [ 1644602920000, 55.32629, 23.59504, 1.605293, 1.4134601, 1.731273, 0.4218429, 0.1259804, 0],
      [ 1644602925000, 4.663792, 2.39492, 2.216843, 0.9783493, 1.0303782, 0.4521649, 0.0740196, 0.1260484],
      [ 1644602930000, 2.736208, 1.5311441, 1.2571766, 0.5697738, 0.7697898, 0.2218547, 0, 0.0739516],
      [ 1644602935000, 0.1259911, 0.5778999, 0.1259911, 0.4519088, 0.9038177, 0, 0, 0],
      [ 1644602940000, 0.0740089, 0.2960356, 0.2000805, 0.3480983, 0.4440535, 0.1260716, 0.1260716, 0],
      [ 1644602945000, 0.1260813, 0.2521627, 0.2000097, 0.8304164, 1.512976, 0.326091, 0.0739284, 0],
      [ 1644602950000, 0.0739187, 0.2739483, 0.2000296, 0.569623, 1.769801, 0.6522811, 0, 0]
  ]
},
 "min": 0,
 "max": 2385.544
}
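
A minimal sketch of how the `result` object above could be pivoted into heatmap rows, one row per dimension and one column per timestamp. The function name `to_heatmap_grid` is hypothetical and only for illustration; the sample uses two points and three buckets copied from the payload.

```python
from typing import Dict, List, Tuple


def to_heatmap_grid(result: Dict) -> Tuple[List[int], List[str], List[List[float]]]:
    """Split a `result` object into timestamps, row names and a row-major grid."""
    row_names = result["labels"][1:]  # drop the leading "time" label
    timestamps = [row[0] for row in result["data"]]
    # grid[r][c] is the value of dimension r at timestamp c
    grid = [
        [row[r + 1] for row in result["data"]]
        for r in range(len(row_names))
    ]
    return timestamps, row_names, grid


# Two points and three buckets copied from the payload above
sample = {
    "labels": ["time", "0us->1us", "1us->2us", "2us->4us"],
    "data": [
        [1644602905000, 2385.544, 521.8245, 78.06823],
        [1644602910000, 1005.6769, 219.691, 32.25781],
    ],
}
timestamps, row_names, grid = to_heatmap_grid(sample)
```

Each entry of `grid` is then one y-axis row of the heatmap, in the same order as the dimensions arrive from the agent.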

We need to render this data differently, so that users can see the latencies as a heatmap. For example, this is what Grafana does:

image

In the above chart:

  • Every dimension (time bucket) now has its own row on the y-axis. So the y-axis is no longer used for the value of each point; it just allocates space for each dimension.
  • The colour intensity of each point represents its value, and the exact colour used is based on the colour scale below the chart.
  • Each point now has a height, so instead of being rendered as a "point" it is a small vertical line.
  • The colour scale is fixed, but the values assigned to each colour depend on the min and max value available for all dimensions in the given timeframe (the dataset we received). In the Grafana example above, min = 0, while max = 0.38.
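
The value-to-colour mapping described above could be sketched roughly as follows. This assumes a linear scale and a fixed palette of `steps` colours; neither is specified in this ticket, and `color_index` and `colorize` are hypothetical names.

```python
from typing import List


def color_index(value: float, vmin: float, vmax: float, steps: int) -> int:
    """Map a value onto one of `steps` colour slots, linearly between vmin and vmax."""
    if vmax <= vmin:
        return 0  # flat dataset: everything gets the lowest colour
    ratio = (value - vmin) / (vmax - vmin)
    ratio = min(max(ratio, 0.0), 1.0)  # clamp values outside the range
    return min(int(ratio * steps), steps - 1)


def colorize(grid: List[List[float]], steps: int = 5) -> List[List[int]]:
    """Compute min/max over the whole dataset, then assign a colour slot to every point."""
    flat = [v for row in grid for v in row]
    vmin, vmax = min(flat), max(flat)
    return [[color_index(v, vmin, vmax, steps) for v in row] for row in grid]


# With the Grafana example range (min = 0, max = 0.38) and 5 colours
lowest = color_index(0.0, 0.0, 0.38, 5)
middle = color_index(0.19, 0.0, 0.38, 5)
highest = color_index(0.38, 0.0, 0.38, 5)
```

Because min and max are taken from the received dataset, the same value can land on a different colour when the user zooms or pans to another timeframe.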

The tooltip of the chart on Grafana has a small histogram, showing the values of all dimensions for a given time (x-axis). In our case, we could keep the tooltip we already have:

  • don't sort the dimensions and
  • add a histogram (like Grafana does) above a list of dimensions.

This new chart type should be available for any chart in Netdata Cloud, and the above logic should apply as well:

  • dimensions are converted to y-axis rows
  • values are distributed on a colour scale
  • the colour intensity of each point represents its value

THIS IS FOR NETDATA CLOUD
The Agent dashboard will fall back to stacked charts.


Tasks to complete

FE

Cloud BE

  • change the Cloud BE API that is consumed by Cloud FE to support a new chart type (this new ticket on this repo)
  • update the swagger specs on Cloud BE so that the API consumed by the Cloud FE exposes the new heatmap chart type option (this new ticket on this repo)

Netdata Agent

Documentation

@ktsaou
Member

ktsaou commented Feb 9, 2022

This is great!

You have to think of 2 additional issues:

  1. The new charts need to allow turning any chart into a heatmap. So, charts 2.0 on cloud need to have this option.
  2. Chart types are defined (per chart) at the data collector and are propagated to the UI. So, a new chart type called heatmap should be created in netdata (now it only has line, area, stacked) and allowed to reach the dashboard. For the old agent dashboard this should fall back to area.

@hugovalente-pm
Contributor Author

@ktsaou thanks for the inputs. please see my comments below:

  1. The new charts need to allow turning any chart into a heatmap. So, charts 2.0 on cloud need to have this option.

will specify this on the requirements above

  2. Chart types are defined (per chart) at the data collector and are propagated to the UI. So, a new chart type called heatmap should be created in netdata (now it only has line, area, stacked) and allowed to reach the dashboard. For the old agent dashboard this should fall back to area.

For the old agent dashboard I will open an issue on netdata/dashboard and link it here.

For the data collectors we probably need inputs from @ilyam8 and/or @thiagoftsm
Guys, what are the changes we need to do on collectors or other things on the Agent?

@hugovalente-pm
Contributor Author

@ktsaou I mentioned "will add it" not that I had already added it 😅
the ticket is now updated 👍

@thiagoftsm

Guys, what are the changes we need to do on collectors or other things on the Agent?

@hugovalente-pm when we create charts, we pass the chart type as an argument, so we will need to change the collectors to send heatmap, area or any other type. In netdata core we will need to add the new chart type so that the agent understands what we are sending.

I suggest we sync the cloud and agent dashboards so that we release the new feature together. I think we risk a bad user experience if users see a heatmap on cloud and a stacked chart on the agent.

@ktsaou
Member

ktsaou commented Feb 9, 2022

@thiagoftsm unfortunately the agent shipped dashboard will stay behind. The agent will show stacked charts, until we update its entire visualization library to the one used by the cloud. It seems quite a lot of work to do currently. It will eventually be missing a lot of features: the Anomaly Advisor, metrics correlations, etc.

We are working on a feature to mark the agent dashboard as old and the cloud one as new, so all agent dashboard users will be advised to use the new one and only fall back to the old one when the agent is not connected to cloud.

We are also investigating a way of bringing the cloud dashboard on the agent, without actually refactoring it. We will see if that works somehow.

For the moment, let the agent shipped dashboard stay behind. We will deal with this later.

@thiagoftsm

Thanks @ktsaou, I did not know all these details.

@hugovalente-pm I suggest we schedule a meeting to understand better how we can address these different behaviors. Is the cloud storing in its own database the user preferences for charts? If it is doing this, we can give to you a list of charts that will use heatmaps by default.

@ktsaou
Member

ktsaou commented Feb 10, 2022

Is the cloud storing in its own database the user preferences for charts

A user can change the type of a chart, but the default is still controlled by the data collector. So, until we move this setting away from the collector, it is the only place to do it. This means that the plugins.d protocol and netdata's internal RRDSET structures have to support a new chart type: heatmap. And of course all the documentation has to be updated.
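
To illustrate where the chart type travels in the plugins.d protocol, here is a rough sketch of a collector printing a `CHART` line with `heatmap` in the chart type position. The field order follows my understanding of the `CHART` line (id, name, title, units, family, context, charttype, priority, update_every) and should be checked against the external plugins documentation. The title, units, family and priority values are made up for the example, `chart_line` is a hypothetical helper, and `heatmap` is the value proposed in this ticket, not one the agent is confirmed to accept.

```python
def chart_line(chart_id: str, title: str, units: str, family: str,
               context: str, chart_type: str, priority: int,
               update_every: int) -> str:
    """Build a plugins.d CHART line; the empty second field is the optional name."""
    text_fields = [chart_id, "", title, units, family, context]
    quoted = " ".join(f"'{field}'" for field in text_fields)
    return f"CHART {quoted} {chart_type} {priority} {update_every}"


# Hypothetical values, except the chart id and update_every taken from the payload
line = chart_line(
    chart_id="filesystem.ext4_write_latency",
    title="ext4 write latency",          # assumption
    units="calls/s",                     # assumption
    family="ext4",                       # assumption
    context="filesystem.ext4_write_latency",
    chart_type="heatmap",                # the new type proposed in this ticket
    priority=2000,                       # assumption
    update_every=5,
)
print(line)
```

The point is only that the chart type is a single token on this line, so the collector-side change is small once the agent recognises the new value.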

we can give to you a list of charts that will use heatmaps

Everything that uses time buckets should be turned by default to heatmap, by changing its chart type to heatmap at the data collector. Yes, please come up with this list, so that we can update the ticket and know what needs to be done.

There are a few collectors, like the web_log that present latency as max and avg, statsd timers that present latency in similar ways, fping (this needs a PR to the fping repo) and possibly more. We should modify these collectors to use time buckets and create heatmap charts too.

Generally we should use heatmap charts when:

  • The collector collects time buckets already
  • The collector collects individual events (web log responses each with a duration, statsd timer events each with its own duration, fping pings each with its own duration), which can be allocated into fixed time buckets, instead of just doing a min, max, average, etc on them.
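
As a sketch of the second case, individual event durations could be allocated into fixed buckets like this. The bucket bounds are an assumption modelled on the eBPF dimension names (powers of two, in microseconds), the `+Inf` overflow label is made up, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List

# Upper bounds in microseconds; assumption modelled on the eBPF bucket labels
BOUNDS_US = [1, 2, 4, 8, 16, 32, 64]


def bucket_labels(bounds: List[float]) -> List[str]:
    """Build labels such as "0us->1us", plus one overflow bucket at the end."""
    labels = []
    lower = 0
    for upper in bounds:
        labels.append(f"{lower}us->{upper}us")
        lower = upper
    labels.append(f"{lower}us->+Inf")
    return labels


def count_into_buckets(durations_us: List[float], bounds: List[float]) -> Dict[str, int]:
    """Count events per bucket; each bucket is the half-open range [lower, upper)."""
    counts = [0] * (len(bounds) + 1)
    for duration in durations_us:
        counts[bisect_right(bounds, duration)] += 1
    return dict(zip(bucket_labels(bounds), counts))


counts = count_into_buckets([0.5, 1, 3, 3.9, 100], BOUNDS_US)
```

Each key of `counts` would become one dimension of the chart, which is what lets the UI draw it as a heatmap row instead of collapsing the events into min, max and average.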

how we can address these different behaviors

What do you mean? which behaviors?

@ktsaou ktsaou changed the title Heatmap - chart visualization options Heatmap charts in Netdata Feb 10, 2022
@hugovalente-pm
Contributor Author

hugovalente-pm commented Feb 10, 2022

thanks @thiagoftsm for all the details, as @ktsaou said, we will have for now the Agent dashboard falling back to stacked charts when a collector has defined that a given chart should be a heatmap

to summarise behaviour:

  • Agent collector has an eBPF chart that is a latency chart and defines the chart type heatmap
  • Agent Dashboard doesn't support the heatmap so it will show the chart as a stacked chart
  • Netdata Cloud UI supports the heatmap and, with the metadata for the chart, it sees this is to be shown as a heatmap, so it displays the new chart type

at Netdata Cloud, for any chart, a user can override the "suggested" chart type. on Overview and Single Node view this is currently only stored for the present user session, not stored as user preferences. on Custom Dashboards users can define the chart type and this will be saved as part of the definition of the Custom Dashboard
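
The behaviour summarised above could be sketched as a small resolution function: a user override wins over the collector's suggested type, and a UI that does not support the resulting type falls back to stacked. All names here are hypothetical and do not come from the Cloud or Agent code.

```python
from typing import Optional, Set

AGENT_DASHBOARD_TYPES: Set[str] = {"line", "area", "stacked"}
CLOUD_TYPES: Set[str] = AGENT_DASHBOARD_TYPES | {"heatmap"}


def effective_chart_type(collector_type: str,
                         supported: Set[str],
                         user_override: Optional[str] = None,
                         fallback: str = "stacked") -> str:
    """Pick the chart type a given UI should render."""
    wanted = user_override or collector_type
    return wanted if wanted in supported else fallback


on_agent = effective_chart_type("heatmap", AGENT_DASHBOARD_TYPES)
on_cloud = effective_chart_type("heatmap", CLOUD_TYPES)
overridden = effective_chart_type("line", CLOUD_TYPES, user_override="heatmap")
```

Keeping the fallback in the UI rather than in the agent means the agent can always report the collector's real chart type.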

to try to summarize what is needed for this feature, and in order to open respective tickets on other repos, we have:

  • make available new heatmap chart type on our charts library (this repo/ticket)
  • add the heatmap chart type as an option on Netdata Cloud chart options (this repo/ticket)
  • update the swagger specs on the Agent so that the API consumed by the BE exposes the new heatmap chart type option (this should be on netdata/netdata repo? will probably ask your help to do this @thiagoftsm )
  • change the collectors that currently support these buckets so that charts with time buckets define their chart type as heatmap (this should be on netdata/netdata repo? will probably ask your help to do this @thiagoftsm ; issue: [Feat]: Change charttype of the eBPF plugin charts to heatmap netdata#12925)
  • change the Cloud BE API that is consumed by Cloud FE to support a new chart type (this new ticket on this repo)
  • update the swagger specs on Cloud BE so that the API consumed by the Cloud FE exposes the new heatmap chart type option (this new ticket on this repo)
  • update our documentation (Documentation: Charts can be displayed as Heatmaps learn#937) for:
    • collectors that will be changed, noting that charts with time buckets will be shown as heatmaps
    • any other relevant documentation where we specify chart types

@thiagoftsm

What do you mean? which behaviors?

My concern is that we store heatmap and the agent can only show stacked, while the cloud will show heatmap.

@ktsaou last time I talked with our designer I was informed that we could not change color scheme like we were expecting. For this specific feature we will need either to change the JS files to understand what we will show when it receives a heatmap, or we will need to change our internal API to change heatmap to stack. Considering the previous experience, I understand that the internal API can be the simplest road, but I will have to confirm with our agent team the possible impact on ACLK. On the other hand, considering that we will classify our current dashboard as old, we could only add a JS object that has this simple association. I would need to talk with visualization team to verify that we won't have blockers here.

@ktsaou
Member

ktsaou commented Feb 10, 2022

last time I talked with our designer I was informed that we could not change color scheme like we were expecting.

The colors can be changed even via dashboard_info.js for any context or any chart. I am not sure I understand your statement. It seems like a wrong statement to me. Probably the designer stated that he does not want to change the colors, to have some uniformity. We can change them if we need to.

For this specific feature we will need either to change the JS files to understand what we will show when it receives a heatmap,

In JS there is a color array per dashboard theme (white, dark) for all chart types. Just one for all of them.
I understand that front-end engineers will have to add a second color array for heatmaps. So heatmaps will have their own coloring.

or we will need to change our internal API to change heatmap to stack.

I don't get this. What is "internal" in this statement? The agent?
The only change in the agent is to support another chart type. It already supports 3, it will now become 4. The agent does not really care what happens with them. It is just a flag, a label that follows every chart.

Considering the previous experience, I understand that the internal API can be the simplest road, but I will have to confirm with our agent team the possible impact on ACLK.

I still don't get it. How can a simple flag impact ACLK? Today charts define themselves as line, area, or stacked. Now there will be another one: heatmap. Why would this affect ACLK?

On the other hand, considering that we will classify our current dashboard as old, we could only add a JS object that has this simple association.

This is totally irrelevant. Even the old dashboard has to map the new chart type to an existing one (heatmap is rendered as stacked). This mapping should not happen at the agent. It is the responsibility of the front-end to deal with it.

I would need to talk with visualization team to verify that we won't have blockers here.

They are here. Ask what you need to know. Avoid meetings please.

@ktsaou
Member

ktsaou commented Feb 10, 2022

I made many updates to my comment above. So please refresh.

@thiagoftsm

The colors can be changed even via dashboard_info.js for any context or any chart.

I agree, but the old designer said we have a color scheme for dark theme and another completely different for the white theme, so we could not just change the color. This was the reason I opened that issue, which we later closed to create this one.

I don't get this. What is "internal" in this statement? The agent?

The ACLK has changed a lot; I probably have in mind the first scheme that we made. Unless I am wrong, cloud was using api/v1/charts to get the data. If this is no longer the case, please ignore what I wrote about changing our API.

They are here. Ask what you need to know. Avoid meetings please.

All right. We will do. 🤝

@hugovalente-pm are we already moving ahead with the two first bullets from this comment? Are we going to convert this issue to epic and create an issue for each one of the bullets?

@ktsaou
Member

ktsaou commented Feb 10, 2022

I agree, but the old designer said we have a color scheme for dark theme and another completely different for the white theme, so we could not just change the color. This was the reason I opened that issue, which we later closed to create this one.

Anyway, he was wrong. We can do whatever we want with colors.

@hugovalente-pm
Contributor Author

@hugovalente-pm are we already moving ahead with the two first bullets from #265 (comment) comment? Are we going to convert this issue to epic and create an issue for each one of the bullets?

@thiagoftsm this is the issue for Cloud FE (just noticed now that I had forgotten to put the label cloud-frontend) from where we drove the discussion and identified the needed changes and from where we need to link to the other tickets - we were doing this as mentions

image

you mention about "convert this issue to epic", what would be the difference?

Btw, there is a main umbrella issue on netdata/product#1795

@thiagoftsm

you mention about "convert this issue to epic", what would be the difference?

This is the way I work with @cpipilas: when we have different bullets we convert the issue into an epic, and after this we create a new issue for each one of the bullets. This way we can monitor the progress step-by-step. If you are not working like this, no problem, I can adapt and write the requirements for front end here.

@hugovalente-pm
Contributor Author

got it @thiagoftsm , we can convert this to an EPIC if it makes tracking easier

@hugovalente-pm
Contributor Author

@novykh please also add this on your list for review to see if details are ok to push to FE backlog

@ktsaou
Member

ktsaou commented Mar 16, 2022

Charts that could be heatmaps:

  1. idlejitter charts
  2. fping
  3. statsd metrics (TBD which ones)

@hugovalente-pm
Contributor Author

hugovalente-pm commented May 16, 2022

we agreed to try to start this task this week, taking this comment as the summary of the bullets that will need some work

@novykh @jjtsou for Cloud FE

  • make available new heatmap chart type on our charts library (this repo/ticket)
  • add the heatmap chart type as an option on Netdata Cloud chart options (this repo/ticket)

@TonyPath for Cloud BE there are these two bullets that will need some work

  • change the Cloud BE API that is consumed by Cloud FE to support a new chart type (this new ticket on this repo)
  • update the swagger specs on Cloud BE so that the API consumed by the Cloud FE exposes the new heatmap chart type option (this new ticket on this repo)

@thiagoftsm for Agent we have the tickets that you created already but @ktsaou had also mentioned these below, do we need a ticket for those?

  • idlejitter charts
  • fping
  • statsd metrics (TBD which ones)

@DShreve2 this is the ticket that Tina had created in the past netdata/learn#937

@thiagoftsm

Hello @hugovalente-pm ,

I think it will be better for the product team to monitor the tasks if we have tickets for each one of the plugins you wrote. This will also help us to split work between developers.

Best regards!

@hugovalente-pm
Contributor Author

hugovalente-pm commented May 16, 2022

thanks @thiagoftsm I hadn't found those tickets hence my question :)

I've moved the previous ones you created, which were under netdata/netdata-cloud, to netdata/netdata and created the following ones (using yours on eBPF and Python as a template)

to manage expectations, the FE implementation of the heatmap chart type is aimed at Cloud; on the agent dashboard these charts will fall back to a stacked chart. We still need the collectors to flag these charts as heatmaps so that they are shown properly on Cloud.

@hugovalente-pm
Contributor Author

@novykh, on a discussion with @amalkov it seemed to make sense for this one to be @MichaelGamel's next task
before pushing this forward it would probably make sense to re-assess the estimate on this one

@novykh
Member

novykh commented Aug 1, 2022

There is an ongoing integration with the uPlot library on the charts repo. This library has a heatmap chart type.

@hugovalente-pm
Contributor Author

ok, cool @novykh please share once you have some more insights into that
curious to understand if it will be simpler to implement what we have on this user story

@hugovalente-pm
Contributor Author

@ktsaou from my understanding the work on the agent will only be done once we have the Cloud UI as the Agent dashboard, right?

example:

@hugovalente-pm hugovalente-pm removed this from the [2023] Summer milestone Oct 6, 2023