Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcpc_query: Parse two (sub)activities correctly #349

Merged
merged 4 commits into from
Feb 15, 2022
Merged

Conversation

jranke
Copy link
Contributor

@jranke jranke commented Jan 31, 2022

Closes #348

PR task list:

  • Update NEWS
  • Add tests (if appropriate)
  • Update documentation with devtools::document()
  • Check package passed

@Aariq
Copy link
Collaborator

Aariq commented Jan 31, 2022

Can you give an example of what the new output looks like? (i.e. instead of the results shown in #348)

@jranke
Copy link
Contributor Author

jranke commented Jan 31, 2022

Sure:

webchem::bcpc_query("2,4-D")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "herbicides"              "plant growth regulators"
#> 
#> $subactivity
#> [1] "phenoxyacetic herbicides" "auxins"

Created on 2022-01-31 by the reprex package (v2.0.1)

@Aariq Aariq requested a review from stitam January 31, 2022 15:45
@Aariq
Copy link
Collaborator

Aariq commented Jan 31, 2022

Is there always one sub-activity for every activity? If so, then this seems potentially reasonable. If some activities have multiple sub-activities, then it will become difficult to determine which sub-activity is nested under which activity with this format. Don't get me wrong, this is an improvement over the previous behavior (which was obviously a bug)! I'm just wondering if maybe the output needs a re-work as a data frame or further nested list. I'll let @stitam or others decide whether to accept this PR as-is or try to figure out an even better fix before merging.

@jranke jranke marked this pull request as draft January 31, 2022 16:31
This commit makes bcpc_query support more than two activities and empty subactivities
@jranke jranke marked this pull request as ready for review January 31, 2022 16:35
This gave errors on my local system when testing with `devtools::test()`
@Aariq
Copy link
Collaborator

Aariq commented Feb 15, 2022

Related? #295

@stitam
Copy link
Contributor

stitam commented Feb 15, 2022

Thanks @jranke for opening this PR and apologies for my delay in responding to it. I agree with Eric that this is an improvement over the previous version, but I think we should think a bit more about some special cases. For example:

webchem::bcpc_query("thiodicarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "insecticidesmolluscicides"
#> 
#> $subactivity
#> [1] "oxime carbamate insecticidesmolluscicides"

Created on 2022-02-15 by the reprex package (v2.0.1)

This one is different from the website output which is 1. insecticides (oxime carbamate insecticides), 2. molluscicides

Here is another one:

webchem::bcpc_query("methiocarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "acaricides (carbamate acaricides)bird repellentsinsecticidesmolluscicides"
#> 
#> $subactivity
#> [1] "phenyl methylcarbamate insecticidesmolluscicides"

Created on 2022-02-15 by the reprex package (v2.0.1)

Regarding @Aariq's concern, I think returning a data frame with columns "activity" and "subactivity" would be great, it would remove any ambiguity around interpreting the output. I personally don't like nested lists but that's also a solution. Yet another alternative, activity and subactivity could contain vectors of equal length. This is not explicit but maybe enough for now to merge the PR and then we can think elsewhere about output structure for the function in general. What do you think?

Copy link
Contributor

@stitam stitam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jranke for opening this PR! I've added my comments within the conversation.

@jranke
Copy link
Contributor Author

jranke commented Feb 15, 2022

... but I think we should think a bit more about some special cases. For example:

webchem::bcpc_query("thiodicarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "insecticidesmolluscicides"
#> 
#> $subactivity
#> [1] "oxime carbamate insecticidesmolluscicides"

Created on 2022-02-15 by the reprex package (v2.0.1)

With the changes in this PR I get

webchem::bcpc_query("thiodicarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "insecticides"  "molluscicides"
#> 
#> $subactivity
#> [1] "oxime carbamate insecticides" NA

Created on 2022-02-15 by the reprex package (v2.0.1)

This one is different from the website output which is 1. insecticides (oxime carbamate insecticides), 2. molluscicides

As you can see my PR returns NA for the empty subactivity.

Here is another one:

webchem::bcpc_query("methiocarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "acaricides (carbamate acaricides)bird repellentsinsecticidesmolluscicides"
#> 
#> $subactivity
#> [1] "phenyl methylcarbamate insecticidesmolluscicides"

Created on 2022-02-15 by the reprex package (v2.0.1)

Here I get

webchem::bcpc_query("methiocarb")[[1]][c("activity", "subactivity")]
#> $activity
#> [1] "acaricides"      "bird repellents" "insecticides"    "molluscicides"  
#> 
#> $subactivity
#> [1] "carbamate acaricides"                NA                                   
#> [3] "phenyl methylcarbamate insecticides" NA

Created on 2022-02-15 by the reprex package (v2.0.1)

Regarding @Aariq's concern, I think returning a data frame with columns "activity" and "subactivity" would be great, it would remove any ambiguity around interpreting the output. I personally don't like nested lists but that's also a solution. Yet another alternative, activity and subactivity could contain vectors of equal length.

This is what this PR does.

This is not explicit but maybe enough for now to merge the PR and then we can think elsewhere about output structure for the function in general. What do you think?

Yes, I would propose to take it as it is for the moment.

@Aariq
Copy link
Collaborator

Aariq commented Feb 15, 2022

Ok, let's merge for now, then take a look for edge cases where activities might have more than one sub-activity as a separate issue (which may be fixed by changing output to a tibble)

@Aariq Aariq merged commit f2fe526 into ropensci:master Feb 15, 2022
@jranke
Copy link
Contributor Author

jranke commented Feb 15, 2022

OK, thanks for reviewing!

@stitam
Copy link
Contributor

stitam commented Feb 16, 2022

For some reason the function didn't update properly on my end when I pulled this branch for reviewing. After merging I tested it again and I could reproduce your output. Thanks @jranke for the clarification.

@jranke
Copy link
Contributor Author

jranke commented Feb 16, 2022

For some reason the function didn't update properly on my end when I pulled this branch for reviewing. After merging I tested it again and I could reproduce your output. Thanks @jranke for the clarification.

No problem - maybe you pulled the first version before I updated the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In bcpc_query, the case of more than one activity is not properly handled
3 participants