Win 7 HKSCS Surrogate pair characters to Big5

(@rdbrown_au_4_hk)
New Member
Joined: 11 years ago
Posts: 1
Topic starter  

A client may be interested in ~100 user license contingent on this being fixed.

Prior to Windows Vista, ~1600 characters of the [url= http://en.wikipedia.org/wiki/HKSCS#Microsoft_Windows ]Hong Kong Supplementary Character Set[/url] were mapped into the Private Use Area of the Unicode Basic Multilingual Plane (BMP). I believe this means each of them could be represented as a single UTF-16 code unit. From Vista on, Windows supports these characters as Unicode 4.1 (ISO/IEC 10646:2003) characters outside the BMP, each needing a [url= http://msdn.microsoft.com/en-us/library/dd374069%28VS.85%29.aspx ]surrogate pair[/url] of UTF-16 code units to represent it.
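
To make the surrogate pair arithmetic concrete, here is a minimal Python sketch of how a code point outside the BMP splits into two UTF-16 code units. The function name `to_surrogate_pair` is made up for illustration; the arithmetic is the standard UTF-16 encoding rule, and U+28B2B is one of the sample characters listed further down.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point; no surrogate pair needed")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x28B2B)
print(f"U+28B2B -> {high:04X} {low:04X}")
```

Neither half of the pair is a character on its own, which is why code that handles one 16-bit unit at a time cannot convert these characters.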

What this means is that entering these characters with an IME, or pasting them from a Unicode application like Excel, doesn't work in the emulators we've tested (Absolute Telnet included), possibly because the emulators assume they can convert a single UTF-16 code unit to the DBCS ANSI string, maybe with [url= http://msdn.microsoft.com/en-us/library/dd374130%28v=vs.85%29.aspx ]WideCharToMultiByte[/url].
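
The suspected failure mode can be sketched in Python. This only illustrates the guess above and is not how any particular emulator is known to work. The helper names are hypothetical, and Python's `big5hkscs` codec stands in for whatever DBCS conversion the emulator really performs.

```python
def decode_utf16_units(units):
    """Combine UTF-16 code units into code points, pairing up surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


def naive_convert(units, codec="big5hkscs"):
    """Convert one code unit at a time; each surrogate half fails on its own."""
    return b"".join(chr(u).encode(codec, errors="replace") for u in units)


def paired_convert(units, codec="big5hkscs"):
    """Pair surrogates first, then convert whole code points."""
    return b"".join(
        chr(cp).encode(codec, errors="replace") for cp in decode_utf16_units(units)
    )


units = [0xD862, 0xDF2B]  # UTF-16 code units for U+28B2B
print(naive_convert(units), paired_convert(units))
```

If the emulator does something like `naive_convert`, the unit-at-a-time conversion can never find a mapping for either half, whatever the code page contains.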

From a quick look at [url= http://www.unicode.org/versions/Unicode6.2.0/ch02.pdf ]Ch. 2 p. 40[/url] of the current Unicode standard: "Plane 2, the Supplementary Ideographic Plane (SIP), consists primarily of one big area, starting from the first code point in the plane, that is dedicated to encoding additional unified CJK characters." So CJK characters, and maybe even just HKSCS => Big5, may be the only surrogate pair characters that need a conversion to DBCS and so trigger this.

Some example character values. Big5 is the two-byte value in hex; Unicode is the code point in hex.

Big5 Unicode HKSCS standard version
8745 27267 HKSCS-2004
8748 27CB1 HKSCS-2004
874A 27CC5 HKSCS-2004
...

8845 2010C HKSCS-1999
8847 200D1 HKSCS-1999
8848 200CD HKSCS-1999
884B 200CB HKSCS-1999
884C 21FE8 HKSCS-1999
884E 200CA HKSCS-1999
8853 2010E HKSCS-1999
...

8940 2A3A9 HKSCS-1999
8941 21145 HKSCS-1999
894C 27735 HKSCS-1999
89B2 209E7 HKSCS-1999
89BB 29DF6 HKSCS-1999
89BC 2700E HKSCS-1999
...

FEED 28B2B HKSCS-1999
FEEE 26083 HKSCS-1999
FEEF 2261C HKSCS-1999
FEF4 25857 HKSCS-1999
FEF6 27B39 HKSCS-1999
FEFA 27126 HKSCS-1999
FEFD 2910D HKSCS-1999

I can provide the full list if needed.
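
For anyone who wants to cross-check rows of this list without a Windows machine, here is a rough Python sketch. It assumes Python's built-in `big5hkscs` codec is an acceptable reference, which may not match the Windows code page tables exactly. The `SAMPLES` list copies four rows from the table above, and `big5hkscs_hex` is a hypothetical helper.

```python
# (Big5 hex from the post, Unicode code point) for a few rows of the table
SAMPLES = [
    ("8745", 0x27267),
    ("8845", 0x2010C),
    ("8940", 0x2A3A9),
    ("FEED", 0x28B2B),
]


def big5hkscs_hex(code_point):
    """Return the Big5-HKSCS bytes as uppercase hex, or None if unencodable."""
    try:
        return chr(code_point).encode("big5hkscs").hex().upper()
    except UnicodeEncodeError:
        return None


for big5, cp in SAMPLES:
    print(f"U+{cp:05X}: post says {big5}, codec gives {big5hkscs_hex(cp)}")
```

Any row where the two values differ would be worth a closer look, since the difference could come from either the list or the codec's mapping table.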

For someone like me who cannot read Han characters, comparing the glyph in the terminal window
to the value from the [url= http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=28B2B ]Unihan Data[/url] (i.e. Big5 FEED, Unicode 28B2B) helps.


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

If you're still looking for a solution, I should be able to help.

Can you post a screen snapshot of what it looks like when you try to paste the character? This is an important first step. If the character shows up as a square box, it may be a font mapping issue. If it shows up as a question mark, it can be a character set translation issue.
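
The question-mark case can be reproduced outside any emulator. The Python sketch below is only an analogy, with Python's `big5` and `big5hkscs` codecs standing in for the terminal's character set translation: when the target encoding has no mapping for a character, the encoder substitutes `?`. A square box, by contrast, usually means the conversion succeeded but the font has no glyph, which no encoding call can show.

```python
ch = chr(0x2010C)  # one of the HKSCS sample characters from the list above

# Plain Big5 has no mapping for this character, so with errors="replace"
# the encoder substitutes a question mark.
plain = ch.encode("big5", errors="replace")

# The HKSCS-aware codec is expected to produce a real two-byte sequence.
hkscs = ch.encode("big5hkscs", errors="replace")

print(plain, hkscs.hex().upper())
```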

Also send a snapshot of options->properties->Appearance so I can see what your settings are.

Also, could you give me some code points for characters that *do* work so I can trace through both working and non-working characters to see where the problem is?

Absolute recognizes characters that require surrogate pairs, so it should work with characters both inside and outside the BMP.

Thanks,

Brian


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

I've been working with Raymond and Rodney on a solution for supporting HKSCS in the SSH terminal using BIG5 encoding, which is not well supported by default in Windows Vista and above. Everyone seems to be abandoning BIG5 in favor of Unicode and UTF-8, but some of us still have to support legacy applications.

Solution is forthcoming.

Brian Pence
Celestial Software


   