02c-end_of_simple_transform.asciidoc

Exercises

Exercise 1.1: Running time

It’s important to build your intuition about what makes a program fast or slow.

Write the following scripts:

  • null.rb  — emits nothing.

  • identity.rb  — emits every line exactly as it was read in.
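If you are unsure where to start, here is one possible shape for the two scripts. It is a minimal sketch in plain Ruby, not necessarily the way this chapter's scripts are built. The method names `null_transform` and `identity_transform` are made up for illustration, and each takes an input and an output stream so you can try it on a small string before pointing it at real data.

```ruby
require 'stringio'

# Hypothetical helper (name is an assumption): read every line, emit nothing.
def null_transform(input, output)
  input.each_line { |_line| }   # consume and discard each line
  output
end

# Hypothetical helper (name is an assumption): emit each line exactly as read.
def identity_transform(input, output)
  input.each_line { |line| output.write(line) }
  output
end

# Try both on a tiny in-memory sample instead of a real file.
sample = "first line\nsecond line\n"
puts identity_transform(StringIO.new(sample), StringIO.new).string
puts null_transform(StringIO.new(sample), StringIO.new).string.empty?
```

To turn either one into a standalone script, pass the standard streams instead, for example `identity_transform($stdin, $stdout)`.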

Now run the reverse.rb and piglatin.rb scripts from this chapter, along with the null.rb and identity.rb scripts you just wrote, against the 30 Million Wikipedia Abstracts dataset.

First, though, write down an educated guess for how much longer each script will take than the null.rb script (use the table below). So, if you think the reverse.rb script will be 10% slower, write '10%'; if you think it will be 10% faster, write '-10%'.

Next, run each script three times, mixing up the order. For each script, write down:

  • the total time of each run

  • the average of those times

  • the actual percentage difference in run time between each script and the null.rb script

    script     | est % incr | run 1 | run 2 | run 3 | avg run time | actual % incr |
    null:      |            |       |       |       |              |               |
    identity:  |            |       |       |       |              |               |
    reverse:   |            |       |       |       |              |               |
    piglatin:  |            |       |       |       |              |               |
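The last two columns are simple arithmetic, and it can help to see it spelled out. The sketch below assumes you have already timed each run in seconds. The helper names `average` and `percent_increase` are made up for illustration, and the numbers are placeholders, not real measurements.

```ruby
# Hypothetical helper (name is an assumption): mean of a list of run times.
def average(times)
  times.sum / times.size.to_f
end

# Hypothetical helper (name is an assumption): how much slower (positive) or
# faster (negative) a script is than the baseline, as a percentage.
def percent_increase(baseline, other)
  (other - baseline) * 100.0 / baseline
end

# Placeholder timings in seconds -- substitute your own three runs.
null_runs    = [10.0, 12.0, 11.0]
reverse_runs = [11.0, 13.0, 12.5]

null_avg    = average(null_runs)
reverse_avg = average(reverse_runs)

puts format('null avg:    %.2f s', null_avg)
puts format('reverse avg: %.2f s', reverse_avg)
puts format('actual incr: %+.1f%%', percent_increase(null_avg, reverse_avg))
```

Comparing averages rather than single runs smooths out noise from whatever else the machine was doing at the time.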

Most people are surprised by the result.

Exercise 1.2: A Petabyte-scalable wc command

Create a script, wc.rb, that emits, for each line, its length in characters, the number of bytes it occupies, and the number of words it contains.

Notes:

  • The String methods chomp, length, bytesize, and split are useful here.

  • Do not include the end-of-line characters (\n or \r) in your count.

  • As a reminder: for English text the byte count and length are typically similar, but the non-ASCII characters in a string like "Iñtërnâtiônàlizætiøn" require more than one byte each in an encoding such as UTF-8. The character count says how many distinct 'letters' the string contains, regardless of how it’s stored in the computer. The byte count describes how much space a string occupies, and depends on the encoding used to store it.
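To make the character-versus-byte distinction concrete, here is a minimal sketch of the per-line counting using the four String methods mentioned above. The helper name `line_stats` and the tab-separated output are assumptions made for illustration. The sample string uses a `\u00E9` escape so that the accented letter is unambiguously a single precomposed character.

```ruby
# Hypothetical helper (name is an assumption): returns
# [characters, bytes, words] for one line of input.
def line_stats(line)
  text = line.chomp   # drop a trailing \n, \r\n, or \r before counting
  [text.length, text.bytesize, text.split.size]
end

# "café au lait" followed by a newline; \u00E9 is a precomposed e-acute,
# which takes two bytes in UTF-8.
sample = "caf\u00E9 au lait\n"

chars, bytes, words = line_stats(sample)
puts [chars, bytes, words].join("\t")   # 12 characters, 13 bytes, 3 words
```

Calling split with no arguments breaks the line on runs of whitespace, so repeated spaces do not produce empty words.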