////Ease the reader in with something like, "Our goal here will be…" Amy////
Let’s start exploring the dataset. Andy Baio
link:code/serverlogs/old/logline-02-histograms-mapper.rb[role=include]
We want to group on `date_hr`, so just add a 'virtual accessor': a method that behaves like an attribute but derives its value from another field:
link:code/serverlogs/old/logline-00-model-date_hr.rb[role=include]
This is the advantage of having a model and not just a passive sack of data.
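To make the idea concrete, here is a minimal sketch of a virtual accessor. It is not the book's actual model: the `Logline` struct below, its field names (`ip`, `visit_time`, `path`), and the `YYYYMMDDHH` format are all assumptions chosen for illustration.

```ruby
# Hypothetical stand-in for the Logline model; the field names are assumptions.
Logline = Struct.new(:ip, :visit_time, :path) do
  # Virtual accessor: callers use it like any other attribute, but the value
  # is derived from visit_time on each call rather than stored.
  # Truncates the timestamp to the hour, formatted as YYYYMMDDHH.
  def date_hr
    visit_time.strftime('%Y%m%d%H')
  end
end

# Made-up sample record, for illustration only.
line = Logline.new('10.0.0.1', Time.utc(2003, 4, 30, 17, 42, 5), '/')

# A mapper could emit the derived key alongside the path, tab-separated.
puts [line.date_hr, line.path].join("\t")
```

Because `date_hr` lives on the model, the mapper can group on it without knowing how the timestamp is stored.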
Run it in map mode:
link:code/serverlogs/old/logline-02-histograms-02-mapper-wu-lign-sort.log[role=include]
TODO: digression about `wu-lign`.
Sort and save the map output; then write and debug your reducer.
link:code/serverlogs/old/logline-02-histograms-full.rb[role=include]
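If it helps while debugging, the core of a counting reducer can be sketched in plain Ruby, independent of any framework. This is only an illustration under stated assumptions, not the script included above: the `count_sorted` helper and the sample records are hypothetical. It relies on the input being sorted, so all lines for one key arrive together.

```ruby
# Hypothetical helper: counts runs of identical lines in an already-sorted
# stream. Because equal keys are adjacent, only one running counter is held
# in memory, however large the input is.
def count_sorted(lines)
  current = nil
  count = 0
  lines.each do |line|
    key = line.chomp
    if key != current
      # Key changed: emit the finished group, then start a new one.
      yield current, count unless current.nil?
      current = key
      count = 0
    end
    count += 1
  end
  # Emit the final group, if there was any input at all.
  yield current, count unless current.nil?
end

# Made-up, pre-sorted map output (date_hr, then path), for illustration only.
sorted_map_output = [
  "2003043017\t/a.html",
  "2003043017\t/a.html",
  "2003043017\t/b.html",
  "2003043018\t/a.html"
]

count_sorted(sorted_map_output) { |key, n| puts "#{key}\t#{n}" }
```

To try it on the saved map output instead of the sample array, pass `$stdin` (or an open file) as `lines`; anything that responds to `each` and yields lines will do.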
When things are working, this is what you’ll see. Notice that the `…/Star_Wars_Kid.wmv` file already has five times as many pageviews as the site root (`/`).
link:code/serverlogs/old/logline-02-histograms-03-reduce.log[role=include]
You’re ready to run the script in the cloud! Fire it off and you’ll see dozens of workers start processing the data.
link:code/serverlogs/old/logline-02-histograms-04-freals.log[role=include]