Example 4-12 - PerKeyAvg for Python Incorrect #24

Open
funseiki opened this issue Aug 1, 2016 · 1 comment

Comments

funseiki commented Aug 1, 2016

In the example, map is shown taking a lambda with two parameters (key and xy), but the Python version of Spark passes each element to map as a single argument, so the lambda must accept a single parameter (the whole key/value tuple).

So instead of the following:

r = sumCount.map(lambda key, xy: (key, xy[0]/xy[1])).collectAsMap()

we should use:

r = sumCount.map(lambda kvp: (kvp[0], kvp[1][0] / kvp[1][1])).collectAsMap()
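
The difference can be reproduced without a Spark cluster. The following is a minimal sketch in plain Python, assuming the built-in map over a list is a fair stand-in for the RDD's map (both call the function with one argument per element). The names sum_count, per_key_avg, and two_param_fails, as well as the sample data, are hypothetical and not taken from the book.

```python
# Plain-Python stand-in for the RDD: a list of (key, (sum, count)) pairs.
# The sample values are made up for illustration.
sum_count = [("panda", (3, 2)), ("pink", (7, 2)), ("pirate", (3, 1))]


def per_key_avg(pairs):
    """Turn (key, (sum, count)) pairs into a {key: average} dict."""
    # Each element arrives as ONE argument, so index into the tuple.
    # float() guards against integer division on Python 2.
    return dict(map(lambda kvp: (kvp[0], kvp[1][0] / float(kvp[1][1])), pairs))


def two_param_fails(pairs):
    """Return True if the two-parameter lambda raises TypeError."""
    try:
        # map supplies a single argument, so 'xy' is never filled in.
        list(map(lambda key, xy: (key, xy[0] / xy[1]), pairs))
    except TypeError:
        return True
    return False


averages = per_key_avg(sum_count)
print(averages)
print(two_param_fails(sum_count))
```

In PySpark itself, sumCount.mapValues(lambda xy: xy[0] / xy[1]) may be a tidier alternative, since it leaves the key untouched and only transforms the (sum, count) value.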

funseiki commented Aug 1, 2016

Note: The GitHub source appears to forgo the mapping step and returns {key: (sum, count)} instead of {key: avg}.
