Pageview Histograms

////Ease the reader in with something like, "Our goal here will be…​" Amy////

Let’s start exploring the dataset. Andy Baio

link:code/serverlogs/old/logline-02-histograms-mapper.rb[role=include]

We want to group on date_hr, so just add a 'virtual accessor' — a method that behaves like an attribute but derives its value from another field:

link:code/serverlogs/old/logline-00-model-date_hr.rb[role=include]
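For readers without the included file at hand, here is a minimal sketch of what such a virtual accessor might look like. The `Logline` struct, its fields, and the `YYYYMMDDHH` format are assumptions made for illustration; the book's actual model may differ.

```ruby
# Hypothetical stand-in for the chapter's model: only the fields needed here.
Logline = Struct.new(:ip, :visit_time, :path) do
  # Virtual accessor: called like an attribute, but computed from visit_time.
  # Truncates the timestamp to the hour, formatted as YYYYMMDDHH (assumed format).
  def date_hr
    visit_time.strftime('%Y%m%d%H')
  end
end

# Made-up sample record, not taken from the dataset.
line = Logline.new('10.0.0.1', Time.utc(2003, 4, 30, 20, 15, 9), '/')
puts line.date_hr
```

Because `date_hr` is derived on demand, nothing extra is stored on the record, and any code that groups by hour can treat it like any other field.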

This is the advantage of having a model and not just a passive sack of data.

Run it in map mode:

link:code/serverlogs/old/logline-02-histograms-02-mapper-wu-lign-sort.log[role=include]

TODO: digression about wu-lign.

Sort and save the map output; then write and debug your reducer.

link:code/serverlogs/old/logline-02-histograms-full.rb[role=include]
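The reason for sorting first is that identical keys end up on adjacent lines, so a reducer only has to count runs. The sketch below shows that idea in plain Ruby, without the framework used in the included script. The method name `each_key_count` and the tab-separated `date_hr`/path key layout are assumptions for illustration.

```ruby
# Walks sorted lines and yields [key, count] once per run of identical keys.
# Only one key and one counter are held in memory at a time.
def each_key_count(lines)
  return enum_for(:each_key_count, lines) unless block_given?

  current = nil
  count = 0
  lines.each do |line|
    key = line.chomp
    if key == current
      count += 1
    else
      # Key changed: emit the finished run, then start a new one.
      yield current, count unless current.nil?
      current = key
      count = 1
    end
  end
  # Emit the final run, if there was any input at all.
  yield current, count unless current.nil?
end

# Made-up, already-sorted map output (assumed layout: date_hr<TAB>path).
sample = [
  "2003043020\t/\n",
  "2003043020\t/\n",
  "2003043020\t/x.wmv\n"
]
each_key_count(sample) { |key, n| puts [key, n].join("\t") }
```

If the input is not sorted, the same key can be emitted more than once with partial counts, which is a quick way to spot a missing sort step while debugging.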

When things are working, this is what you’ll see. Notice that the …​/Star_Wars_Kid.wmv file already has five times as many pageviews as the site root (/).

link:code/serverlogs/old/logline-02-histograms-03-reduce.log[role=include]

You’re ready to run the script in the cloud! Fire it off and you’ll see dozens of workers start processing the data.

link:code/serverlogs/old/logline-02-histograms-04-freals.log[role=include]