Add more information about chinese character sets & combining marks #627

Merged 2 commits on Jul 12, 2024
Changes from 1 commit
4 changes: 2 additions & 2 deletions resources/index.html
@@ -174,11 +174,11 @@ <h2>Chinese Script Overview</h2>

<p>Words are not separated by spaces or any other character. There is no case distinction. The visual forms of characters don't interact.</p>

-<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and the Unicode Standard includes approaching 100,000 Han characters, many of which are archaic or esoteric.</p>
+<p>In its 'main' category, CLDR lists 2,210 characters for the Simplified Chinese orthography, and 2,180 for Traditional Chinese. Combined, this includes 3,026 unique characters, and an overlap of 1,064 characters. A working set of characters for modern Chinese may include 3 times this number, and the number of Han characters in the Unicode Standard approaches 100,000, many of which are archaic or esoteric. In fact, various regions define their own character sets, such as the 3,500 characters in the Tier I Table of <span lang="zh">通用规范汉字表</span> (General Standard Chinese Characters) in Mainland China, the 4,808 characters in the Taiwanese <span lang="zh">常用“国字”标准字体表</span> (Chart of Standard Forms of Common National Characters), the 4,759 characters in <span lang="zh">常用字字形表</span> (Common Chinese Characters) in Hong Kong SAR, or the sets of <span lang="zh">欢乐伙伴</span> (&quot;Happy Buddy&quot;) characters for Singaporean primary schools.</p>
Member

The task force prefers to remove the CLDR numbers unless they have reliable sources.

See discussions in https://www.w3.org/2024/05/08-clreq-minutes.html#t01

Contributor Author

OK. I commented that text out because I wanted the information to remain available to me for future discussions about CLDR. The character sets now mentioned are all fairly small, relatively speaking, so I added the figure of 10,000 to give an impression of the size of the repertoire needed for things such as text editors. (I know that Mainland China's basic repertoire is substantially smaller than Taiwan's; the figure is only an indicator.)
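
A quick sanity check that may be useful in future discussions of the CLDR figures quoted in the diff: if "unique characters" means the union of the two exemplar sets and "overlap" means their intersection (an assumption about how the text uses those words), the four numbers should satisfy the inclusion-exclusion identity. The sketch below only does arithmetic on the numbers as quoted; it does not read CLDR data, and the variable names are illustrative.

```python
def union_size(size_a: int, size_b: int, overlap: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return size_a + size_b - overlap


# Figures exactly as quoted in the paragraph under review.
simplified = 2210
traditional = 2180
quoted_unique = 3026
quoted_overlap = 1064

# Union implied by the quoted overlap, and overlap implied by the quoted union.
implied_unique = union_size(simplified, traditional, quoted_overlap)
implied_overlap = simplified + traditional - quoted_unique

# True only if the four quoted figures agree with each other.
figures_consistent = implied_unique == quoted_unique

print(f"union implied by quoted overlap: {implied_unique}")
print(f"overlap implied by quoted union: {implied_overlap}")
print(f"quoted figures consistent: {figures_consistent}")
```

Under that reading, the quoted union and overlap do not agree with each other, so at least one of the two would need rechecking against the CLDR data before the numbers are reused.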


<p>The language is tonal, but the tones are not written explicitly.</p>

-<p>Chinese has no combining marks, but has many punctuation marks. It also has a relatively complex set of typographic rules.</p>
+<p>As a general rule, Chinese has no combining marks, but ideographic tone marks may be used in contexts such as university literature courses and Chinese opera. On the other hand, Chinese has many punctuation marks. It also has a relatively complex set of typographic rules.</p>
xfq marked this conversation as resolved.
</section>
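
For readers who want to see what the revised combining-marks sentence refers to, here is a minimal sketch using Python's standard `unicodedata` module. It assumes that "ideographic tone marks" means the four characters U+302A to U+302D; the diff does not name code points, so that mapping is an assumption, and the helper names are hypothetical.

```python
import unicodedata

# Assumption: the "ideographic tone marks" are U+302A..U+302D.
TONE_MARKS = [chr(cp) for cp in range(0x302A, 0x302E)]


def combining_marks(text: str) -> list[str]:
    """Return the characters in text whose general category is a Mark (Mn, Mc, Me)."""
    return [ch for ch in text if unicodedata.category(ch).startswith("M")]


def describe(ch: str) -> str:
    """Format a character's code point, name, general category and combining class."""
    return (
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"(gc={unicodedata.category(ch)}, ccc={unicodedata.combining(ch)})"
    )


for mark in TONE_MARKS:
    print(describe(mark))

plain = "中文"                  # ordinary text: no combining marks expected
annotated = "中\u302A文"        # same text with a tone mark attached to the first character

print(combining_marks(plain))
print([describe(ch) for ch in combining_marks(annotated)])
```

Ordinary Chinese text yields an empty list, while text annotated with one of these marks yields that mark, which matches the "as a general rule" wording in the added line.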

