[de] German Wiktionary's conjugation page issues #974
Comments
Maybe the "Deutsch Verb schwach trennbar reflexiv" template is not handled properly in flexion.py. I could take a look tomorrow.
The "untrennbar" and "unregelmäßig" tags in the level 2 node are added in #980. "Zustandsreflexiv" and "reflexiv" also appear in the table header, so I didn't add them.
Thank you, xxyzz.
It seems the section name is still not captured. I just checked abachen with the data "dewiktionary dump dated 2025-02-21 using wiktextract (9e2b7d3 and f2e72e5)".
The tags should include something indicating that the forms are in the section (subsection) "Zustandsreflexiv".
I checked more words.
Section titles are added in #1047. Some are not translated and are added to the "raw_tags" list instead.
Thank you, xxyzz. Thanks for your quick update. While testing it, I also found that it might be better to include 'hilfsverb-haben', 'hilfsverb-sein', and 'trennbar' in the tags as well. One example is the word abbiegen, whose conjugation page has separate sections for 'haben' and 'sein'. I was trying to make the change myself, but I'm sure you would have a better version of it.
Tags are added in #1049. "Hilfsverb haben" and "Hilfsverb sein" are added to the "raw_tags" list; "trennbar" is translated to "separable" and added to the "tags" list.
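For anyone consuming the data, here is a minimal sketch of how the section labels could be used to tell the 'haben' and 'sein' forms apart. It assumes each extracted entry is a dict with a "forms" list whose items carry "form", "tags", and "raw_tags" keys. The sample entry, the "perfect" and "present" tag values, and the helper name group_forms_by_section are made up for illustration and are not taken from the real output.

```python
from collections import defaultdict

# Hypothetical sample entry; the field layout is an assumption about the
# extracted JSON, and the forms/tags below are invented for illustration.
entry = {
    "word": "abbiegen",
    "forms": [
        {"form": "habe abgebogen", "tags": ["separable", "perfect"],
         "raw_tags": ["Hilfsverb haben"]},
        {"form": "bin abgebogen", "tags": ["separable", "perfect"],
         "raw_tags": ["Hilfsverb sein"]},
        {"form": "biege ab", "tags": ["separable", "present"],
         "raw_tags": ["Hilfsverb haben"]},
    ],
}

# Section labels mentioned in this thread as landing in "raw_tags".
SECTION_LABELS = {"Hilfsverb haben", "Hilfsverb sein"}


def group_forms_by_section(entry, section_labels=SECTION_LABELS):
    """Group form strings by the first section label found in raw_tags."""
    groups = defaultdict(list)
    for form in entry.get("forms", []):
        labels = [t for t in form.get("raw_tags", []) if t in section_labels]
        key = labels[0] if labels else "(no section)"
        groups[key].append(form["form"])
    return dict(groups)


grouped = group_forms_by_section(entry)
for section, forms in grouped.items():
    print(section, forms)
```

Forms without any known label fall into the "(no section)" bucket, which makes it easy to spot sections that are still missing from the tags.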
Thank you! That's really helpful.
Upon looking through the JSON data for each German word, I have found the following issues:
Some section names are not taken into account when compiling the tags, so a single entry ends up with an excessive number of tags (ranging from 400 to 1400) and the sections cannot be told apart from the tags alone. The section names that are overlooked usually include 'Zustandsreflexiv', 'reflexiv', 'unregelmäßig', 'haben/sein', and 'untrennbar', among others.
Some example words:
abachen - Zustandsreflexiv
anpampfen - reflexiv
abweichen - unregelmäßig
untrennbar (Deutsch)
When a word has multiple entries, each linking to a different section of the same conjugation page, the entries in the JSON "forms" list are duplicated (though not entirely identical: one or two forms vary between the lists). It would be beneficial to eliminate this duplication. However, I suspect that fixing this depends on first resolving the section-name issue described above.
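Until that is resolved, a consumer-side workaround could look roughly like the sketch below. It assumes the same entry layout as the extracted JSON ("forms" items with "form", "tags", and "raw_tags"). The sample entries and the helper names form_key and merge_forms are hypothetical, and this is not how wiktextract itself handles the forms.

```python
def form_key(form):
    """Build a hashable identity for a form from its text and both tag lists."""
    return (
        form.get("form"),
        tuple(sorted(form.get("tags", []))),
        tuple(sorted(form.get("raw_tags", []))),
    )


def merge_forms(entries):
    """Collect forms from several entries, keeping the first copy of each."""
    seen = set()
    merged = []
    for entry in entries:
        for form in entry.get("forms", []):
            key = form_key(form)
            if key not in seen:
                seen.add(key)
                merged.append(form)
    return merged


# Invented sample data: two entries of the same word whose form lists
# overlap except for one form.
entries = [
    {"word": "abbiegen", "forms": [
        {"form": "biege ab", "tags": ["separable"], "raw_tags": ["Hilfsverb haben"]},
        {"form": "habe abgebogen", "tags": ["separable"], "raw_tags": ["Hilfsverb haben"]},
    ]},
    {"word": "abbiegen", "forms": [
        {"form": "biege ab", "tags": ["separable"], "raw_tags": ["Hilfsverb haben"]},
        {"form": "bin abgebogen", "tags": ["separable"], "raw_tags": ["Hilfsverb sein"]},
    ]},
]

merged = merge_forms(entries)
print([f["form"] for f in merged])
```

Because the key includes the tag lists, two forms with the same text but different section labels are kept as distinct, which is only useful once the section names actually appear in the tags.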
I can provide additional details if that would be beneficial. Thank you for your work in maintaining this library.