Add more information about chinese character sets & combining marks #627
✅ Deploy Preview for clreq ready!
@xfq I'd appreciate a quick turnaround on this review, if possible, so that I can prep the doc for publication. Thanks.
resources/index.html
Outdated
@@ -174,11 +174,11 @@ <h2>Chinese Script Overview</h2>

 <p>Words are not separated by spaces or any other character. There is no case distinction. The visual forms of characters don't interact.</p>

-<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and the Unicode Standard includes approaching 100,000 Han characters, many of which are archaic or esoteric.</p>
+<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and number of characters in the Unicode Standard approaches 100,000 Han code points, many of which are archaic or esoteric. In fact, various regions define their own character sets, such as the 3,500 characters in the Tier I Table of <span lang="zh">通用规范汉字表</span> (General Standard Chinese Characters) in Mainland China, the 4,808 characters in the Taiwanese <span lang="zh">常用“国字”标准字体表</span> (Chart of Standard Forms of Common National Characters), the 4,759 characters in <span lang="zh">常用字字形表</span> (Common Chinese Characters) in Hong Kong SAR, or the sets of <span lang="zh">欢乐伙伴</span> ("Happy Buddy") characters for Singaporean primary schools.</p>
The task force prefers to remove the CLDR numbers unless they have reliable sources.
See discussions in https://www.w3.org/2024/05/08-clreq-minutes.html#t01
Ok. I commented that text out because I wanted the information to remain available to me for future discussions about CLDR. The character sets now mentioned are all fairly small, relatively speaking, so I added the figure of 10,000 to give an impression of the size of the repertoire needed for things such as text editors. (I know that the Mainland China basic repertoire is substantially smaller than the Taiwanese one, and this figure is only an indicator.)
As a first version, I think it is OK, but we need to continue discussing some details with the task force after publication.
Addresses comments in #619