"Upper-128" Punctuation Not Displaying Correctly

(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

I'm not sure whether this is a configuration issue, a bug, or something else. It has definitely shown up with Absolute Telnet v3.85. I don't remember whether it was present with 3.80.

I have A.T. 3.85 installed on a Windows 98 machine. I use MS Dial-Up Networking to establish a PPP connection with my ISP, which provides Unix (NetBSD) shell accounts. My shell is tcsh. I use Pine v4.63 to read mail and news. The A.T. configuration (I use a .tnt file) is xterm emulation. On my shell account I have the $TERM environment variable set to "xterm-color". (Incidentally, I tried setting both values to vt100 and got the same result.)

A number of the email messages in mailing lists I follow have shown an anomaly. The character set encoding I have A.T. set to is usually ISO-8859-1, because not all the messages are in English. However it happens, some emails are using ISO Latin-1 punctuation marks in the "upper register," i.e., with the high bit set.

In particular, some of the punctuation marks are closing single quote (0x92) in place of apostrophe, opening double quotes (0x93), and closing double quotes (0x94). When I am reading messages in Pine, these are displaying as ~R, ~S, and ~T respectively, i.e., they are not displaying correctly. I have verified the hex values by downloading (zmodem) to my own computer and looking at them in a hex editor. In addition, when I look at the downloaded messages in an ISO Latin-1 aware editor, the characters display normally. There have been some other instances when other high-bit-set punctuation marks are showing up as ~something, although I haven't tried (yet) to track down just what the specific punctuation marks are. I should mention that "letters" with diacritical marks in the upper register (as in French) are displaying normally. It is only the punctuation marks that are not displaying correctly.

Back on the shell account on the server, I pulled a message into the joe editor, which normally can handle high-bit-set characters. For the punctuation marks affected, I got a little square instead of the correct mark.

Bug? Misconfiguration? Something else? Please ask if more data is needed.

[ August 24, 2005, 12:26 AM: Message edited by: Brian T. Pence ]


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

What is the value of the LANG environment variable?

Can you forward me one of these emails?

(bpence at celestialsoftware.net)

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

Are you referring to an environment variable in the Unix shell? I am not familiar with it. It is presently undefined. As for an email, they do not show up all the time, and I may have to edit one for privacy when I get one. "character-set" in the Pine configuration is set to "ISO-8859-1".


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

Ok,

I got the sample email you sent me and I've got your answer. The short answer is this: the character set in your email is NOT ISO-8859-1, even if it claims to be.

The long answer (and solution):

ISO-8859-1 reserves the range 0x80-0x9f for control characters. This range is called the C1 control set and contains NO printable characters. Here is an official code chart from [url= http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT ]unicode.org[/url]

So, you see your characters (0x92, 0x93, 0x94) are not valid characters in the ISO-8859-1 set.

Apparently, though, Microsoft thought it was silly to leave those 32 character positions for control characters that are rarely used. So, they grabbed a few for some special purpose characters (smart quotes and some others). Code chart [url= http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT ]here[/url]. They renamed the character set Win1252. It can be thought of as a superset of ISO-8859-1, but it is not EQUIVALENT to ISO-8859-1 because it puts printable characters in the C1 range.
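For anyone who wants to see the difference for themselves, here is a small Python sketch (not part of AbsoluteTelnet or pine, just an illustration). It decodes the three bytes from this thread both ways, using Python's standard "latin-1" and "cp1252" codecs. The variable and function names are made up for the example.

```python
import unicodedata

# The three bytes reported in this thread.
raw = bytes([0x92, 0x93, 0x94])

# ISO-8859-1 maps each byte to the code point with the same number,
# so these land in the C1 control range U+0080..U+009F.
as_latin1 = raw.decode("latin-1")

# Windows-1252 maps the very same bytes to printable punctuation.
as_cp1252 = raw.decode("cp1252")


def describe(text):
    """Return (code point, Unicode general category) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


latin1_info = describe(as_latin1)   # every category is "Cc" (control)
cp1252_info = describe(as_cp1252)   # quotation-mark categories "Pf" / "Pi"

print(latin1_info)
print(cp1252_info)
```

Under ISO-8859-1 all three come out as control characters, while under Win1252 they come out as the closing single quote and the opening and closing double quotes.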

It is error-prone to use Win1252 to send mail to someone using a terminal to read the mail because the smart quotes in the C1 control code range look like... well... control codes to pine and are converted to something it considers printable (~R ~S ~T). It is even more error-prone to send Win1252 text and *claim* it to be ISO-8859-1. Most text displays properly, with the exceptions you noted. Magically, though, the emails will look fine in most Microsoft products because they'll automatically use Win1252 when 8859-1 is specified. Imagine that!
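The ~R, ~S, ~T output reported above is consistent with a simple rule: clear the high bit of the C1 byte and show the result as a letter with a "~" in front. The Python sketch below mimics that rule. It is an assumption inferred from the symptoms in this thread, not taken from pine's source, and the function names are hypothetical.

```python
def is_c1(byte: int) -> bool:
    """True if the byte falls in the C1 control range 0x80-0x9F."""
    return 0x80 <= byte <= 0x9F


def tilde_notation(byte: int) -> str:
    """Assumed display rule: drop the high bit, add 0x40, prefix with '~'.

    For example 0x92 -> 0x12 -> 0x52 ('R') -> '~R'.
    """
    return "~" + chr((byte & 0x7F) + 0x40)


def render(data: bytes) -> str:
    """Show C1 bytes in tilde notation; treat all other bytes as ISO-8859-1."""
    return "".join(tilde_notation(b) if is_c1(b) else chr(b) for b in data)


# A made-up message using Win1252 smart quotes plus one French accent.
sample = b"It\x92s \x93quoted\x94 caf\xe9"
rendered = render(sample)
print(rendered)  # It~Rs ~Squoted~T café
```

The accented letter passes through untouched because 0xE9 is outside the C1 range, which matches what was seen in pine.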

The French accents you mentioned are in the legal range for both 8859-1 and Win1252, so there is no problem there.
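A quick way to check this, again as a Python sketch with made-up variable names: decode every byte from 0xA0 to 0xFF with both of Python's standard codecs and compare the results.

```python
# Bytes 0xA0-0xFF: the printable upper range that both character sets define.
shared = bytes(range(0xA0, 0x100))

# The two decodings should agree for every byte in this range.
same_upper_range = shared.decode("latin-1") == shared.decode("cp1252")

# Example: e-acute is byte 0xE9 in both sets.
e_acute_latin1 = b"\xe9".decode("latin-1")
e_acute_cp1252 = b"\xe9".decode("cp1252")

print(same_upper_range)
```

Only the 0x80-0x9f block differs between the two, which is why accented letters were fine and the smart quotes were not.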

What we need is for pine to pass these characters through as-is. To do that, go into the pine config and, under "Viewer Preferences", enable this option:

pass-c1-control-characters-as-is

Depending on your pine version, this option may *not* be available. If it isn't, enable this one instead:

pass-control-characters-as-is

Now, back in AbsoluteTelnet, you must change the character set translation to Win1252 so the proper mapping is done. (Options->Properties->Appearance->Translation)

Things should be looking better now in pine.

Whew.....

You don't get this kind of support everywhere!!!

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

Brian, thank you very much!!! I think that Absolute Telnet is an excellent product (I would recommend it to others), and I think you give excellent support.

Leave it to Microsoft to thumb their nose at (or give the finger to) the rest of the world. They take the attitude that IBM did many years ago. "Standards? What do you mean, standards? We are IBM." MS seem to think that they do not have to adhere to standards.

I have gone into my Pine configuration (v4.63) and made the change that you recommended. I will also try the translation option for A.T. My Pine is set for a default character set of ISO-8859-1, as that is what I most commonly need for non-English posts. (Yes, there are people in this world who do not use English all the time.) The sample I sent you claimed to be ISO-8859-1, so that Pine did not make any fuss, but as you helpfully pointed out, it was not.

Thanks again, Brian.


   