-
Notifications
You must be signed in to change notification settings - Fork 20
MungerData
The Munger::Data component is for dealing with tables of data – specifically, arrays of hashes or collections of hash like objects (ActiveRecords, for example). You feed all the raw data into it and transform and amend and “munge” it using the helpful methods so you have all the data points that you will eventually need for your report, completely disregarding layout or anything else – this tool is about getting the data you need.
The first thing that you want to do is initialize your Munger::Data object and load your data into it. For these examples, I will use data for an online advertising system that displays an ad for blocks of time on their site.
result = AdAirings.find(:all) data = Munger::Data.load_data(result)
You can also call it thusly:
data = Munger::Data.new(:data => result)
Now you have a Munger::Data object. Let’s see what we can do with it.
You can easily add new columns of data or transform existing columns of data. For example, let’s say that this was the raw data:
|airtime | airdate | clicks | advert | ------------------------------------------ |15 | 2008-01-01 | 301 | spot 1 | |30 | 2008-01-02 | 199 | spot 1 | |30 | 2008-01-03 | 234 | spot 1 | |15 | 2008-01-04 | 342 | spot 1 | |30 | 2008-01-01 | 172 | spot 2 | |15 | 2008-01-02 | 217 | spot 2 | |90 | 2008-01-03 | 1023 | spot 2 | |30 | 2008-01-04 | 321 | spot 2 | |60 | 2008-01-01 | 512 | spot 3 | |30 | 2008-01-02 | 813 | spot 3 | |15 | 2008-01-03 | 333 | spot 3 |
and we wanted to know what the click per second rate was. We can run this to add that column:
data.add_column('rate') { |row| (row.clicks / row.airtime) }
|airdate | rate | clicks | airtime | advert | ------------------------------------------------- |2008-01-01 | 20 | 301 | 15 | spot 1 | |2008-01-02 | 6 | 199 | 30 | spot 1 | |2008-01-03 | 7 | 234 | 30 | spot 1 | |2008-01-04 | 22 | 342 | 15 | spot 1 | |2008-01-01 | 5 | 172 | 30 | spot 2 | |2008-01-02 | 14 | 217 | 15 | spot 2 | |2008-01-03 | 11 | 1023 | 90 | spot 2 | |2008-01-04 | 10 | 321 | 30 | spot 2 | |2008-01-01 | 8 | 512 | 60 | spot 3 | |2008-01-02 | 27 | 813 | 30 | spot 3 | |2008-01-03 | 22 | 333 | 15 | spot 3 |
You can also do this to overwrite column data, and you can do it on more than one column at a time. Lets say we want to capitalize the names of the adverts and calculate the rate:
data.add_columns([:advert, :rate]) do |row| rate = (row.clicks / row.airtime) [row.advert.capitalize, rate] end
Now you have this :
|airdate | advert | rate | clicks | airtime |
-———————————————————————
|2008-01-01 | Spot 1 | 20 | 301 | 15 |
|2008-01-02 | Spot 1 | 6 | 199 | 30 |
|2008-01-03 | Spot 1 | 7 | 234 | 30 |
|2008-01-04 | Spot 1 | 22 | 342 | 15 |
|2008-01-01 | Spot 2 | 5 | 172 | 30 |
|2008-01-02 | Spot 2 | 14 | 217 | 15 |
|2008-01-03 | Spot 2 | 11 | 1023 | 90 |
|2008-01-04 | Spot 2 | 10 | 321 | 30 |
|2008-01-01 | Spot 3 | 8 | 512 | 60 |
|2008-01-02 | Spot 3 | 27 | 813 | 30 |
|2008-01-03 | Spot 3 | 22 | 333 | 15 |
Notice that the ‘advert’ field was overwritten with a new value.
Let’s say we only want to report on data where the click rate is more than 10/second. We can filter our dataset down easily. (Many times this is easier to do in SQL, if that’s where the data is coming from, but sometimes the calculations are complex…)
data.filter_rows { |row| row.rate > 10 }
So now we have this:
|advert | rate | airdate | clicks | airtime | ------------------------------------------------- |Spot 1 | 20 | 2008-01-01 | 301 | 15 | |Spot 1 | 22 | 2008-01-04 | 342 | 15 | |Spot 2 | 14 | 2008-01-02 | 217 | 15 | |Spot 2 | 11 | 2008-01-03 | 1023 | 90 | |Spot 3 | 27 | 2008-01-02 | 813 | 30 | |Spot 3 | 22 | 2008-01-03 | 333 | 15 |
Let’s say we want to see a table of spot clicks by airdate in a pivot table. If we take the original dataset and run this:
new_columns = data.pivot('airdate', 'advert', 'clicks')
we get ‘data’ modified to be the following (and new_columns contains the names of the columns that were added):
|advert | 2008-01-01 | 2008-01-02 | 2008-01-03 | 2008-01-04 | -------------------------------------------------------------- |Spot 1 | 301 | 199 | 234 | 342 | |Spot 2 | 172 | 217 | 1023 | 321 | |Spot 3 | 512 | 813 | 333 | |
Or, you can see an average or a count in the data field instead:
new_columns = data.pivot('airtime', 'advert', 'rate', :average)
|advert | 60 | 90 | 30 | 15 | ------------------------------ |Spot 1 | | | 6 | 21 | |Spot 2 | | 11 | 7 | 14 | |Spot 3 | 8 | | 27 | 22 |
So I can see that the average click rate for Spot 3 was highest on the 30 second spot, where for Spot 1 and 2 it was the 15 second one.
There are also some convenience functions you can use to inspect your Data object at any time. ‘size’ will give you the number of rows you have, and ‘columns’ will give you all the active column names.
See MungerReport for how to turn this munged Data object into an actual report.