get_wdid() searches all of wikidata, not just chemicals #238
Comments
Related to #82.
Indeed, I also saw the comment about SPARQL a while ago and started working on functions to improve the Wikidata query. I am almost done and will push a PR next week.
Wonderful! I'm concurrently working on a PR to standardize input and output of all the *"git" was a typo in the branch name. It's supposed to be "get-consistency".
Yes, go ahead. Once your PR is merged, I'll change the code within the function and leave the standardized structure intact.
PR #242 is now merged.
Great! I will file a PR this week or next, as suggested above.
Hi @andschar, how's the work on this coming along? As a Wikidata editor, I think I could help out a bit with this one. Mostly I wanted to chime in to say that searching by item name with "standard" SPARQL is not particularly efficient and would probably time out a lot; see this for reference. That said, there is a workaround that combines SPARQL with the MediaWiki API, for example:

```sparql
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "pyridine";
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
  ?item wdt:P31 wd:Q11173  # Restricts results to items that are 'instances of' a chemical compound
}
```

The query above searches all item names and aliases for the string "pyridine". It also excludes results that are not "instances of" (P31) "chemical compound" (Q11173), which should filter out unwanted results.
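To reuse that query for arbitrary search terms, the term must be escaped before it is spliced into the SPARQL string literal. Otherwise a name containing a double quote or a backslash would break the query. The following is a minimal, language-agnostic sketch in Python. The function names are hypothetical, and sending a GET request to the public endpoint https://query.wikidata.org/sparql with `format=json` is an assumption about how the request would be issued. It is not existing package code.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed transport: HTTP GET).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def escape_sparql_string(text: str) -> str:
    """Escape characters that may not appear unescaped inside a "..." SPARQL literal."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_chem_search_query(term: str, language: str = "en") -> str:
    """Hypothetical helper: the EntitySearch + P31/Q11173 query for any search term."""
    term_lit = escape_sparql_string(term)
    lang_lit = escape_sparql_string(language)
    # Doubled braces are literal { } in an f-string.
    return f"""SELECT ?item ?itemLabel WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "{term_lit}";
                    mwapi:language "{lang_lit}".
    ?item wikibase:apiOutputItem mwapi:item.
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang_lit}". }}
  ?item wdt:P31 wd:Q11173
}}"""


def wdqs_request_url(query: str) -> str:
    """Hypothetical helper: GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(wdqs_request_url(build_chem_search_query("DDT")))
```

Keeping query construction separate from the HTTP call makes the escaping and the P31 filter easy to check without touching the network.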
Currently get_wdid() searches more than just chemicals. This is a problem for names that refer to both a chemical and something else, especially acronyms: DDT, for example, returns WDIDs for "Duffy's Tavern Airport" and "Dark Dance Treffen".
However, there is a note in the code that suggests it may be possible to narrow the search:
SPARQL is used in wd_ident(), and that's all I know about it!
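If get_wdid() were switched to a query like the one suggested in the comment above, the Wikidata Query Service would return the matches in the W3C SPARQL 1.1 JSON results format. In that format, each ?item is bound to an entity URI whose last path segment is the WDID. Below is a hedged sketch of turning such a response into (WDID, label) pairs. `extract_wdids` is a hypothetical helper, and it assumes the query's variables are named ?item and ?itemLabel as in that comment.

```python
import json
from typing import List, Tuple


def extract_wdids(raw_json: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: map SPARQL JSON results (?item, ?itemLabel) to (WDID, label)."""
    data = json.loads(raw_json)
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for row in data["results"]["bindings"]:
        item = row.get("item")
        # Only URI bindings can be Wikidata entities; skip anything else.
        if item is None or item.get("type") != "uri":
            continue
        # Entity URIs look like .../entity/Q<number>; keep the last path segment.
        qid = item["value"].rsplit("/", 1)[-1]
        if not qid.startswith("Q") or qid in seen:
            continue  # not an item ID, or already collected (defensive de-duplication)
        seen.add(qid)
        # Fall back to the WDID itself if no itemLabel binding is present.
        label = row.get("itemLabel", {}).get("value", qid)
        pairs.append((qid, label))
    return pairs
```

Returning the label next to each WDID makes it easy to spot false positives like the DDT matches described above.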