Example 4-12 - PerKeyAvg for Python Incorrect #24

funseiki · 2016-08-01T21:24:42Z

In the example, the lambda passed to map takes two parameters (key and xy), but the Python version of Spark appears to have only a map method that expects a function of a single parameter, which receives each (key, value) pair as one tuple.

So instead of the following:

r = sumCount.map(lambda key, xy: (key, xy[0]/xy[1])).collectAsMap()

we should use:

r = sumCount.map(lambda kvp: (kvp[0], kvp[1][0] / kvp[1][1])).collectAsMap()
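To show the difference without a Spark cluster, here is a minimal sketch in plain Python. It assumes the built-in map is a fair stand-in for RDD.map, in that both call the function with each element as a single argument. The sample data and the helper names (sum_count, per_key_avg, two_param_fails) are made up for illustration and are not from the book or its repository.

```python
# Plain-Python stand-in for the (key, (sum, count)) pairs held in sumCount.
# The values below are invented sample data.
sum_count = [("panda", (3, 2)), ("pink", (7, 2))]


def per_key_avg(pairs):
    """Average per key using a one-parameter lambda, as in the proposed fix."""
    # Each element arrives as ONE argument: a (key, (sum, count)) tuple.
    # float() avoids integer division if this is run under Python 2.
    return dict(map(lambda kvp: (kvp[0], float(kvp[1][0]) / kvp[1][1]), pairs))


def two_param_fails(pairs):
    """Return True if the two-parameter lambda raises TypeError."""
    try:
        # map() passes a single tuple, so the lambda is missing its 'xy' argument.
        list(map(lambda key, xy: (key, xy[0] / xy[1]), pairs))
    except TypeError:
        return True
    return False


averages = per_key_avg(sum_count)
```

With the sample data above, averages maps each key to sum / count, and two_param_fails(sum_count) reports that the two-parameter form cannot be called with a single tuple.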


funseiki · 2016-08-01T21:27:41Z

Note: It appears the GitHub source forgoes the mapping step and returns a dictionary of {key: (sum, count)} instead of {key: avg}.
