Exercises

Exercise 1.1: Running time

It’s important to build your intuition about what makes a program fast or slow.

Write the following scripts:

null.rb — emits nothing.
identity.rb — emits every line exactly as it was read in.

Let’s run the reverse.rb and piglatin.rb scripts from this chapter, and the null.rb and identity.rb scripts from exercise 1.1, against the 30 Million Wikipedia Abstracts dataset.

First, though, write down an educated guess for how much longer each script will take than the null.rb script takes (use the table below). So, if you think the reverse.rb script will be 10% slower, write '10%'; if you think it will be 10% faster, write '- 10%'.

Next, run each script three times, mixing up the order. Write down

the total time of each run
the average of those times

the actual percentage difference in run time between each script and the null.rb script

script     | est % incr | run 1 | run 2 | run 3 | avg run time | actual % incr |
null:      |            |       |       |       |              |               |
identity:  |            |       |       |       |              |               |
reverse:   |            |       |       |       |              |               |
pig_latin: |            |       |       |       |              |               |

Most people are surprised by the result.

Exercise 1.2: A Petabyte-scalable `wc` command

Create a script, wc.rb, that emit the length of each line, the count of bytes it occupies, and the number of words it contains.

Notes:

The String methods chomp, length, bytesize, split are useful here.
Do not include the end-of-line characters (\n or \r) in your count.
As a reminder — for English text the byte count and length are typically similar, but the funny characters in a string like "Iñtërnâtiônàlizætiøn" require more than one byte each. The character count says how many distinct 'letters' the string contains, regardless of how it’s stored in the computer. The byte count describes how much space a string occupies, and depends on arcane details of how strings are stored.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

02c-end_of_simple_transform.asciidoc

02c-end_of_simple_transform.asciidoc

Exercises

Exercise 1.1: Running time

Exercise 1.2: A Petabyte-scalable `wc` command

Files

02c-end_of_simple_transform.asciidoc

Latest commit

History

02c-end_of_simple_transform.asciidoc

File metadata and controls

Exercises

Exercise 1.1: Running time

Exercise 1.2: A Petabyte-scalable wc command

Exercise 1.2: A Petabyte-scalable `wc` command