-
Notifications
You must be signed in to change notification settings - Fork 500
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IQSS/11242 ExternalIdentifier fix #11246
base: develop
Are you sure you want to change the base?
IQSS/11242 ExternalIdentifier fix #11246
Conversation
@@ -4,7 +4,7 @@ | |||
import java.util.regex.Pattern; | |||
|
|||
public enum ExternalIdentifier { | |||
ORCID("ORCID", "https://orcid.org/%s", "^\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$"), | |||
ORCID("ORCID", "https://orcid.org/%s", "^(https:\\/\\/orcid.org\\/)?\\d{4}-\\d{4}-\\d{4}-(\\d{4}|\\d{3}X)$"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of b9d0146 we say the following:
When entering author identifiers, select the type from the dropdown (e.g. "ORCID") and under "Identifier" enter just the unique identifier (e.g. "0000-0002-1825-0097") rather than the full URL (e.g. "https://orcid.org/0000-0002-1825-0097").
Should this be re-written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code will support both forms, but if you aren't using the external vocab scripts, the short form may still be preferable? That's why I didn't change it, but happy to change it if we want to allow a choice for manual entry. (AFAIK, there is no validation for input so people can still enter whatever they want.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey all. @pdurbin pinged me about this.
The short form, or the non-URL form, is preferable for ORCIDs that users type in or paste in the Author Identifier field, because that's what makes the link clickable, right? So the same will hold true for RORs that users type in or paste in the Author Identifier field, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The earlier fix changed ROR so the short form would be a link. This PR makes it so that either form of ORCID or ROR should result in a valid link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay. Also, when you wrote "happy to change it if we want to allow a choice for manual entry", I thought we can already allow a choice for manual entry of identifiers. That's how it's set up on Demo Dataverse and planned for Harvard Dataverse.
Or did you mean when installations configure an external vocab script to allow for manual entry, that is that users can enter the identifier in the Identifier field that appears when they enter their own name instead of selecting one from the suggestions?
About the user guide text being added in b9d0146, it's written to apply to all identifiers, so I'm testing on Demo Dataverse to review how the other identifiers that users enter in the Author Identifier field are displayed, especially when the identifier has a URL form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that when you write that "the other identifier types do not currently recognize the URL forms", you mean for cases where the cvoc script is applied to the field. Is that right? Or am I misinterpreting what you mean by "recognize". I've only been thinking of how the IDs are displayed as links when users enter IDs in the Author Identifier field, whether or not the cvoc script is applied to the field.
On Harvard Dataverse, where the cvoc script isn't being used yet, almost all identifiers that have a URL form are displayed as links on the dataset page when the non-URL form is entered in the Author Identifier field (and maybe as long as the correct Author Identifier Type is chosen; I'm just realizing that I didn't test this). And the IDs are not displayed as links when the URL form is entered instead. The exception is the ISNI identifier type; it's never displayed as a link.
To be honest, I'd consider it a regression that when the cvoc script is applied and identifiers that aren't an ORCID or ROR are entered in the Author Identifier fields, they won't be displayed as links regardless of how users enter the IDs (URL form or short form).
And I agree it would be an expansion of scope if it's addressed in this PR. I'll open an issue in the external vocab repo.
@pdurbin, the user guide text you pointed out in b9d0146 doesn't mention how the IDs may or may not be displayed as links, depending on whether or not the cvoc script is used. I think I saw that text when you suggested it and I was okay with it because I didn't think the behavior would change when the cvoc script was used. For now, I'd recommend not changing the text because:
- Most users will still be using a Dataverse installation where the cvoc script hasn't been applied to their Author fields
- For a while now we've been recommending that users enter IDs this way (using the short form), so it's behavior that frequent users are already familiar with (and hopefully they'll be glad to not have to do it or to have to do it less often because of the cvoc script!)
We've known that when users, especially folks who aren't curators, have to copy and paste or type in certain types of IDs, it's easier for them to use the URL form, and in some cases, like ORCIDs and DOIs, it's a common recommendation to display the URL forms of IDs in user-facing pages instead of displaying things like the IDs' short forms. So I'd assume that it would be easier for users if the URL form was always displayed as a text link or as a clickable logo like the cvoc scripts do now for ORCIDs and RORs. And this might even do away with the need to write about this in the user guides, which I've assumed we've done because expecting that many users, especially folks who aren't curators, enter the short forms of most of these IDs isn't as intuitive as expecting that they'll enter the URL forms, which they do often anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the code so that all identifiers will handle both URL and short form (when no scripts are running).
I expanded the tests so that all identifier types are actually checked and verified that both the URL and short forms get format()ed to the URL form (which is specifically relevant to this issue)
I also fixed the ISNI matcher which didn't support ISNIs ending in X instead of a digit.
I've left the text alone.
I'll look for the issue in the ext-vocab repo - guessing we'll need to have similar regex detection to decide when to turn the values into links.
With that, I think this is ready to continue forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! So "The exception is the ISNI identifier type; it's never displayed as a link" isn't true. It is displayed as a link for ISNIs that end in digits, but not for ISNIs that end in X. Thanks for looking into that and for fixing it!
I'm not sure what you mean by "all identifiers will handle both URL and short form". Does that mean that for an identifier that has a URL form and a short form, when a user enters either form into the Author Identifier field, it's displayed as a link on the dataset page and it's included in DataCite exports and sent to DataCite (for repositories that register DOIs)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created that GitHub issue in the ext-vocab repo. It's at gdcc/dataverse-external-vocab-support#33. There are some details I couldn't be sure of because:
- I'm testing on Demo Dataverse and the change won't happen on Demo Dataverse's Author fields until v6.6 is released and applied to Demo Dataverse, like how RORs are displayed on the dataset page
- the Author fields on Demo Dataverse are behaving in ways I didn't expect, like how the dataset page displays an ORCID logo even when the user enters the ORCID by using the Author Identifier Type and Author Identifier fields
- I'm not sure about what changes were made so that the dataset page displays IDs as links when users enter IDs in Author fields where the cvoc script isn't applied. This is what I'm trying to clarify in my last comment.
I can help update that GitHub issue as I understand more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the identifier types have both a short and URI form (FWIW DAI was not handled before - I've added it). Entering either as the author identifier will cause it to display as a link in the metadata display (except DAI which is not a URL). The DataCite XML generator adds the identifier, in URI form, regardless of the identifier type, so yes they are all included regardless of the short or URI input form (including DAI).
What this PR does / why we need it: This PR fixes a bug in the ExternalIdentifier class that caused ORCIDs in URL form to not be recognized.
Which issue(s) this PR closes:
Special notes for your reviewer: As the fix is generic, I reverted changes in the ROR recognition to avoid having two separate versions. Also added tests for formatting or ORCID and ROR.
Also - I noted that it looks like some tests in DatasetFieldValueValidatorTest duplicate those in ExternalIdentifier. If a reviewer confirms that, I'd be happy to remove them (and the isValidAuthorIdentifier() method that is only used in those tests).
Suggestions on how to test this: Verify that publishing a dataset with an ORCID in numeric form or URL form as an author ID, the ORCID appears in the DataCite XML. Verify that either form of ORCID/ROR show up as valid links in the page display (without external vocab scripts turned on).
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?: included.
Additional documentation: