Fix FR formatting #68

Merged
skyman503 merged 5 commits into main from fr-2
Oct 14, 2024

Conversation

skyman503
Collaborator

@skyman503 skyman503 commented Oct 10, 2024

This CL changes the formatting expression for address_overflow so that it comes second, after the building location.

@skyman503 skyman503 requested a review from norgevz October 10, 2024 15:49
model/countries/FR/FR-parsing-rules.yaml
@@ -16,6 +16,12 @@ regex_definitions:
  # Regular expression to match the prefixes that indicate a house number or name.
  kHouseNumberOrNameOptionalPrefixRe:
    regex_fragment: '(?:(?:no|nr|°|º|numéro)[-.\s]*)?'

  kCommaOrNewlineSeparator:
Collaborator

Isn't this similar to kCommaOrWhitespaceSeparator? I'm concerned about introducing several different separators across multiple country files. Ideally, we use shared ones as much as possible.

Collaborator Author

French street names can contain spaces, so kCommaOrWhitespaceSeparator ends up matching the first space, which leads to the wrong parsing of names such as Place Charles de Gaulle.
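
To illustrate the failure mode described in this comment, here is a minimal Python sketch. Both patterns are hypothetical stand-ins: the real definitions of kCommaOrWhitespaceSeparator and kCommaOrNewlineSeparator are not shown in this conversation, and the `first_field` helper is introduced only for the demonstration.

```python
import re

# Hypothetical stand-ins for the two separators; the project's actual
# definitions are not visible in this thread.
COMMA_OR_WHITESPACE = r"[,\s]+"
COMMA_OR_NEWLINE = r"(?:,\s*|[\r\n]+)"

address = "Place Charles de Gaulle, Appt 1"


def first_field(separator: str, text: str) -> str:
    """Return the text that precedes the first separator match."""
    return re.split(separator, text, maxsplit=1)[0]


# A whitespace-aware separator cuts the street name at its first space.
print(first_field(COMMA_OR_WHITESPACE, address))  # Place

# A comma-or-newline separator leaves the multi-word name intact.
print(first_field(COMMA_OR_NEWLINE, address))     # Place Charles de Gaulle
```

Under these assumptions, only the second separator keeps the full street name as a single field.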

Collaborator

I see. Then you probably want something like (?:, |\n|\r|,)+?
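
A rough way to check how the suggested separator behaves is the Python sketch below. It reads the pattern with a greedy `+` and treats the trailing `?` as the end of the question, which is an assumption. The `split_fields` helper and the line-break sample are hypothetical and not taken from the project's test data.

```python
import re

# Suggested separator, read with a greedy `+` (assumption: the trailing `?`
# in the comment is punctuation, not a lazy quantifier).
SEPARATOR = re.compile(r"(?:, |\n|\r|,)+")

samples = [
    "58 rue du Gue Jacquet, Appt 1",     # comma followed by a space
    "58 rue du Gue Jacquet,Appt 1",      # comma with no space
    "58 rue du Gue Jacquet\r\nAppt 1",   # Windows-style line break
]


def split_fields(text: str) -> list[str]:
    """Split an address string on the separator."""
    return SEPARATOR.split(text)


for sample in samples:
    # Each sample yields ['58 rue du Gue Jacquet', 'Appt 1'].
    print(split_fields(sample))
```

Because the pattern never matches a bare space, the spaces inside the street name do not act as separators.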

- capture:
    output: address-overflow
    parts: [ {regex_fragment: '(?:[^,\r\n]+)'} ]
    prefix: {regex_reference: kSpaceOptionalPrefixRe}
Collaborator

How about just inlining {regex_fragment: '\s*' }?

Collaborator Author

Sounds good.
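
The effect of the inlined `\s*` prefix in front of the capture body can be sketched in Python as follows. How the rule engine actually composes prefix, parts, and separators is not shown in this thread, so the concatenation below, the group name `address_overflow`, and the helper function are all assumptions made for illustration.

```python
import re
from typing import Optional

# Assumed composition (not the project's actual rule engine): a comma or line
# break, the inlined optional-whitespace prefix, then the capture body taken
# from the rule's `parts` entry.
OVERFLOW_AFTER_SEPARATOR = re.compile(
    r"(?:,|\r|\n)+"                      # separator
    r"\s*"                               # inlined prefix
    r"(?P<address_overflow>[^,\r\n]+)"   # capture body
)


def address_overflow(text: str) -> Optional[str]:
    """Return the captured overflow, or None when no separator is present."""
    match = OVERFLOW_AFTER_SEPARATOR.search(text)
    return match.group("address_overflow") if match else None


print(address_overflow("58 rue du Gue Jacquet, Appt 1"))  # Appt 1
print(address_overflow("58 rue du Gue Jacquet,Appt 1"))   # Appt 1
print(address_overflow("58 rue du Gue Jacquet"))          # None
```

In this sketch the prefix absorbs any space after the comma, so the captured value is the same with or without it.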

@skyman503 skyman503 requested a review from norgevz October 11, 2024 15:33
Collaborator

@norgevz norgevz left a comment

LGTM modulo a couple of updates.

output:
-  street-address-alternative-1: "Appt 1, 58 rue du Gue Jacquet"
+  street-address-alternative-1: "58 rue du Gue Jacquet, Appt 1"
Collaborator

Can you drop the space after the comma? I would like to validate that case as well.

Collaborator Author

Done.

@skyman503 skyman503 merged commit 29766e0 into main Oct 14, 2024
2 checks passed
@skyman503 skyman503 deleted the fr-2 branch October 14, 2024 11:39