UTF-8 Support for Non-English Messages

(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

I run Absolute Telnet 3.85 under Windows 98. I establish a dial-up PPP connection to a Unix (NetBSD) server and use A.T. to telnet into a shell account running tcsh. My emulation is set to 'xterm', and on the host the TERM environment variable is set to 'xterm-color'. For reading mail and newsgroups, I use Pine (currently 4.63).

Believe it or not, not all the world speaks English. Several of the mailing lists and newsgroups I subscribe to are conducted in languages other than English or contain non-English texts. More and more people are using Unicode, specifically UTF-8. Unfortunately, A.T. does not seem to be handling these messages well. I can set the Options/Properties/Translation to UTF-8, but a lot of non-ASCII characters either do not show up at all or only show up as gibberish.

I know that one needs a font with all the Unicode glyphs that one wants to display. However, I cannot find an easy way to change the font. All I get is a choice of ten fonts, none of which seem to be Unicode fonts. I do have a Lucida Sans Unicode TTF installed on my system, although that is not one of the choices. Unfortunately, the Help screens are not much help. A.T. is supposed to support UTF-8, but I am missing a lot, and more and more people are looking down their noses at us primitive Americans who just can't seem to get with the program.

Reactions? Help? Thanks.

[size=1][ December 12, 2005, 08:59 AM: Message edited by: Brian T. Pence ][/size]


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

[quote]Believe it or not, not all the world speaks English. Several of the mailing lists and newsgroups I subscribe to are conducted in languages other than English or contain non-English texts. More and more people are using Unicode, specifically UTF-8. Unfortunately, A.T. does not seem to be handling these messages well. I can set the Options/Properties/Translation to UTF-8, but a lot of non-ASCII characters either do not show up at all or only show up as gibberish.[/quote]

Paul, I use Pine as well with UTF8 encodings, and it works for just about everything I do. Admittedly, I don't use it much for newsgroups, so there may be some issue there. However, there's more to it than just setting up Absolute to use UTF8. There are server-side settings that must be set as well. First, what's the value of your 'LANG' variable? Also, do you have the Pine UTF8 patch? Pine on its own doesn't support UTF8 well. The Pine UTF8 patch uses the iconv library (installed separately) to convert other encodings to UTF8, so it can present you with UTF8 whenever possible.

[quote]I do not find an easy way to change the font. All I get is a choice of ten fonts, none of which seem to be Unicode fonts. I do have a Lucida Sans Unicode TTF installed on my system, although that is not one of the choices.[/quote]

Absolute can only display text using fixed-width fonts, which is why the font list is shorter than in, say, MS Word. So there may not be any font in your list that covers every language. That's less an Absolute problem than a font problem. You can easily switch between fonts using the font selection dropdown on the toolbar, though.

What languages, specifically, are you commonly viewing in pine? The new beta has an option to specify different fonts for East-Asian and Latin characters. This might help, because Absolute allows *ANY* font to be chosen for the Asian characters.
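To make the fixed-width point concrete: a terminal has to decide how many character cells each glyph occupies, and East-Asian ideographs conventionally take two. The Python sketch below is only a rough illustration, not Absolute's actual logic. It assumes the Unicode East Asian Width property is a reasonable proxy, the `font_class` routing rule is hypothetical, and it ignores combining marks and "ambiguous"-width characters, which real terminals handle more carefully.

```python
import unicodedata

def cell_width(ch: str) -> int:
    """Approximate number of fixed-width terminal cells a character occupies."""
    # 'W' (wide) and 'F' (fullwidth) characters conventionally take two cells;
    # everything else is treated as one cell in this simplified sketch.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def font_class(ch: str) -> str:
    """Hypothetical rule: route double-width characters to a separate East-Asian font."""
    return "east-asian" if cell_width(ch) == 2 else "latin"

for ch in "aé日本":
    print(ch, cell_width(ch), font_class(ch))
```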

If you could point me to a newsgroup post you're having trouble viewing, I could take a look at it myself. I use Pine 4.58, but it should be similar.

I'm glad I can help you out with this issue. Let me know how it's going!

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

I appreciate the prompt response (although I usually go online once a day in the evening US east coast). Some of your questions:

1. I have no idea whether any UTF-8 patch has been applied to Pine, although I doubt it. The ISP supplies the software. I can ask on their in-house newsgroup. Can you provide more information about this patch so I can ask intelligently? Thanks.

2. The $LANG variable in my shell (tcsh) is undefined.

3. Most of the fonts I have installed are admittedly variable-width (although for a lot of mailing list and newsgroup postings fixed-width actually works better).

4. The posts involved come from both (email) mailing lists *and* newsgroups.

5. Some of the languages involved could themselves be represented by ISO-8859-x character sets (and some people are using ISO-8859-x), but some of the people are using Unicode (UTF-8) instead as more "modern" and up to date, apparently thinking that Unicode solves all representation problems and that anyone who cannot handle Unicode (UTF-8 in this instance) is so primitive as to be beneath notice. Also, because the groups themselves deal with language issues, it is not unusual for people to write posts using the Unicode glyphs for the International Phonetic Alphabet (IPA).
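A quick way to see the difference described in point 5 is to ask whether a given piece of text can be encoded in a particular charset at all. In this Python sketch (the sample strings are chosen only for illustration), French fits ISO-8859-1 and Polish fits ISO-8859-2, but IPA symbols such as ʃ, ŋ and ə fit neither, which leaves UTF-8 as the only option among these three.

```python
def fits_charset(text: str, charset: str) -> bool:
    """True if every character in text can be encoded in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

samples = {
    "French": "café crème",
    "Polish": "Łódź",
    "IPA": "ʃ ŋ ə",
}
for name, text in samples.items():
    results = {cs: fits_charset(text, cs) for cs in ("iso-8859-1", "iso-8859-2", "utf-8")}
    print(name, results)
```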

6. I do not have a specific newsgroup post to point you to right now. As it turns out, the bulk of the posts involved come through email in mailing lists, and I get so many that I save only a subset of those that interest me, so at the moment I don't have any samples.

I appreciate the assistance.


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

[quote]Can you provide more information about this patch so I can ask intelligently? Thanks.[/quote]

Send them this link:

[url=http://www.suse.de/~bk/pine/FAQ.html]http://www.suse.de/~bk/pine/FAQ.html[/url]

[quote]5. Some of the languages involved could themselves be represented by ISO-8859-x character sets (and some people are using ISO-8859-x), but some of the people are using Unicode (UTF-8) instead[/quote]

This is exactly what the UTF8 patch is supposed to help with. Turn on UTF8 in Absolute, then set your LANG variable to something like 'en_US.utf8'. With the Pine/UTF8 patch, UTF8 emails will then come through correctly, and anything that is not UTF8 will be converted to UTF8 by iconv. Each mail or news post should contain a tag that declares the character set used to compose it, and Pine uses that tag to perform the proper conversion. If the tag is missing or wrong, you're not likely to get a proper conversion.
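Roughly, the conversion works like this: read the charset declared in the message's Content-Type header, decode the body with it, and re-encode the result as UTF-8 for the terminal. The sketch below uses Python's standard `email` package rather than iconv, handles only single-part messages, and the `to_utf8` helper and sample message are hypothetical; it shows the idea, not the Pine patch's actual code. It also shows why a missing or wrong tag leads to a bad conversion: the decoder can only use what the header says.

```python
from email import message_from_bytes

# Hypothetical single-part message whose body is tagged as ISO-8859-15.
# In ISO-8859-15 the byte 0xA4 is the euro sign.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-15\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Prix: 10 \xa4\r\n"
)

def to_utf8(raw: bytes) -> bytes:
    """Decode a single-part message using its declared charset, re-encode as UTF-8."""
    msg = message_from_bytes(raw)
    # MIME: a text part with no charset parameter defaults to US-ASCII.
    charset = msg.get_content_charset() or "us-ascii"
    body = msg.get_payload(decode=True)  # bytes, with any transfer encoding undone
    try:
        text = body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name in the header: fall back, marking non-ASCII bytes.
        text = body.decode("us-ascii", errors="replace")
    return text.encode("utf-8")

print(to_utf8(RAW).decode("utf-8"))
```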

If you have a particular mail or news post you're having trouble with, let me know where it is or send me a sample and I'll help you diagnose whatever issues you might have.

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

If I come up with a sample newsgroup post I will let you know. If I receive a (non-private) email I will forward a copy. In actual practice, the great majority of the messages I am referring to are either US-ASCII or ISO-8859-x, where 'x' is most commonly 1 or 15 with occasional 2 or 3. The UTF-8 messages at present are a minority. I usually keep A.T. (and Pine) set to ISO-8859-1, which handles most cases. Then if Pine says that a message is in another ISO charset, I can switch A.T. for just that message. What I am not clear on is whether this patch to Pine (assuming the ISP would even install it, which I haven't asked about yet) would convert *everything* to UTF-8, requiring a font with the appropriate glyphs, and whether that could cause problems with messages that now come through correctly in ISO-8859-x. In other words, can I just leave ISO-8859-x alone and only deal with the UTF-8 stuff on a message-by-message basis?


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

[quote]8859-x, where 'x' is most commonly 1 or 15 with occasional 2 or 3. The UTF-8 messages at present are a minority. I usually keep A.T. (and Pine) set to ISO-8859-1, which handles most cases. Then if Pine says that it is in another ISO charset, I can switch A.T. for just that message.[/quote]

This is what the patch does for you. If it gets a message in 8859-15, it converts it to UTF8. If it's 8859-2 or 8859-3, it converts that to UTF8 too. So AbsoluteTelnet stays at UTF8. The only gotcha you might encounter is the font: you need a font that covers all of the Latin character sets. Any characters in the email that the font does not cover will show up as little empty boxes. If this happens, you have to switch fonts.

Luckily for you, most fonts on Win2K and WinXP, and even on older versions of Windows, cover most if not all of the 8859-X characters. If you were viewing Arabic, Hebrew, Japanese, or Chinese, I might worry about the font. However, if you can get the Pine UTF8 setup worked out, I don't think you'll have to worry much about the font.
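To illustrate why keeping the terminal at UTF-8 works: the same byte can mean different things in different ISO-8859 variants, but once it is decoded with the right charset it has exactly one UTF-8 form. In this Python sketch, the `normalize` helper is hypothetical and stands in for the iconv conversion. Byte 0xA4 is '¤' in ISO-8859-1 but '€' in ISO-8859-15; after conversion, the terminal never needs to know which charset the sender used.

```python
def normalize(raw: bytes, declared_charset: str) -> bytes:
    """Convert bytes in the sender's declared charset to UTF-8 for a UTF-8 terminal."""
    return raw.decode(declared_charset).encode("utf-8")

for cs in ("iso-8859-1", "iso-8859-15"):
    utf8 = normalize(b"\xa4", cs)
    print(cs, utf8, utf8.decode("utf-8"))
```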

Let me know if you get one of these mails.

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

If I get a post or email of the sort I am inquiring about I will forward it to you. However, based on what I am understanding so far, I am skeptical that my ISP would install a patch to Pine, as that would affect *every* user of Pine on their system (and it's a pretty fair sized ISP), not just me. I suspect that what these UTF-8 posters are assuming is that all mail processing takes place on one's own machine, where one has total control over the environment and software, rather than my situation, where my machine with A.T. is almost acting as a dumb terminal and Pine is doing the mail processing.


   
(@bpence)
Member Admin
Joined: 1 year ago
Posts: 1375
 

I've been doing some testing on this using pine with and without the utf8 patch. Basically, the answer is this....

Without the patch, the charset settings in Pine, in Absolute, and in each email must all match; otherwise some things won't display correctly until you swap options around. What a hassle!
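The "gibberish" from the first post is what such a mismatch looks like. A short Python illustration using an arbitrary sample string: UTF-8 bytes shown by a terminal set to ISO-8859-1 turn each accented letter into two odd characters, while ISO-8859-1 bytes read as UTF-8 are invalid and typically become replacement marks or vanish, depending on the program.

```python
def show_as(wire: bytes, terminal_charset: str) -> str:
    """Render raw bytes the way a terminal set to terminal_charset would interpret them."""
    return wire.decode(terminal_charset, errors="replace")

text = "café crème"
utf8_wire = text.encode("utf-8")         # sender used UTF-8
latin1_wire = text.encode("iso-8859-1")  # sender used ISO-8859-1

print(show_as(utf8_wire, "iso-8859-1"))  # terminal at ISO-8859-1, mail in UTF-8
print(show_as(latin1_wire, "utf-8"))     # terminal at UTF-8, mail in ISO-8859-1
print(show_as(utf8_wire, "utf-8"))       # settings match
```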

With the patch, it's almost like magic. Everything displays properly: not only the 8859-X variants, but also East-Asian texts. Set your LANG variable to en_US.utf8 and Absolute to UTF8, and Pine will figure out the rest. Of course, each email must have the correct headers declaring its character set for the conversion to be done, but this generally isn't a problem.

If your ISP will not install the patch, will they at least let you compile your own version of pine?

AbsoluteTelnet is doing everything it can to display what you want, but there are other pieces of the puzzle that need to be put in place.

Hope this helps.

Brian


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

Unfortunately, since I opened this thread there has been a drought of messages that we could test. Some messages have had charset utf-8 in their headers, but they contained nothing but plain ASCII, so I can't forward anything to you.
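One way to screen for usable test samples: a message tagged `charset=utf-8` only exercises the UTF-8 path if its body contains bytes above 0x7F, since pure ASCII text is byte-for-byte identical in ASCII, ISO-8859-x and UTF-8. A minimal Python check; the `exercises_utf8` name is hypothetical.

```python
def exercises_utf8(body: bytes) -> bool:
    """True if the body contains non-ASCII bytes that decode as valid UTF-8."""
    if body.isascii():
        return False  # identical in ASCII, ISO-8859-x and UTF-8: not a useful test
    try:
        body.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False  # non-ASCII but not valid UTF-8: the charset tag is probably wrong

print(exercises_utf8(b"plain text only"), exercises_utf8("café".encode("utf-8")))
```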

As for compiling a "private" version of Pine, I suppose that might be possible, as my disk allotment with this ISP is fairly generous compared to other ISPs I have been with. (For what I am paying, it ought to be.) I can check and let you know. (But I have been running behind the last couple of days, working more hours than I am accustomed to.)


   
(@bartlett22183)
Estimable Member
Joined: 21 years ago
Posts: 92
Topic starter  

It has been several months since I started this thread, but I thought I would give an update. I got busy with other things, there was a drought of messages involved, and I more or less let it go.

I never asked the ISP support staff directly about installing a UTF-8 patch to Pine, but I rather doubt that they would do so with the "global" copy of Pine they provide, as that would affect everybody. However, I have had some good results with this: I set the LANG variable in my shell (tcsh) to 'en_US.UTF-8' and the LC_CTYPE variable to the same value. Telling Pine, in its configuration file (.pinerc), that the character set encoding is UTF-8 and setting A.T. to UTF-8 now allows me to see UTF-8 characters to the extent of the glyphs provided in the font I am using (COURIER EXT in 10 point).
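For reference, POSIX gives these variables a fixed precedence when a program looks up its character-handling locale: `LC_ALL` overrides `LC_CTYPE`, which overrides `LANG`, and unset or empty variables are skipped. In tcsh they are set with `setenv`, for example `setenv LANG en_US.UTF-8`. The Python sketch below models that lookup; the helper names and the simple codeset parsing are assumptions for illustration, and real C libraries do more validation.

```python
import os

def effective_ctype(env) -> str:
    """Locale name that governs character handling, following POSIX precedence."""
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset or empty variables do not count
            return value
    # POSIX leaves the fallback implementation-defined; "C" is the usual choice.
    return "C"

def codeset(locale_name: str) -> str:
    """Extract the codeset from a name like 'en_US.UTF-8@euro' (simplified parsing)."""
    if "." not in locale_name:
        return ""
    return locale_name.split(".", 1)[1].split("@", 1)[0]

name = effective_ctype(os.environ)
print(name, codeset(name) or "(no codeset given)")
```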

Unfortunately, this font does not cover all possible permutations of Unicode, and such a font would be horrendously huge, I suppose. However, if anyone knows of a decent-looking monospaced TrueType font that has a lot of the Unicode glyphs, I would be interested.

The shortcoming in all of this, as Brian pointed out, is that I need a combination of several settings at once: the A.T. setting and the Pine setting, at a minimum. Over time I see posts in ISO-8859-1, ISO-8859-2, UTF-8 (now), Windows-1252, and US-ASCII, sometimes all in a single session! In some instances I can just change the encoding in A.T. and get by, but it is admittedly something of a hassle. (For example, when I compose messages myself, I never use UTF-8, but I may use ISO-8859-1, in which case I need Pine set accordingly so it will put out the correct header.) Then again, Pine is not the most flexible mail and news agent on the block, but in my particular circumstances it suits me well, which is why I started this thread in the first place.
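On the composing side, what matters is that the outgoing message declares the charset it actually uses. As a sketch of what a correct header looks like, Python's standard `email.mime.text.MIMEText` produces one when given an explicit charset. This only illustrates the resulting headers, not how Pine builds its messages, and the sample text is arbitrary.

```python
from email.mime.text import MIMEText

# Build a plain-text message whose body is encoded as ISO-8859-1.
msg = MIMEText("Voilà, ça marche.", "plain", "iso-8859-1")

print(msg["Content-Type"])               # declares charset="iso-8859-1"
print(msg["Content-Transfer-Encoding"])  # Python's email package picks quoted-printable here
print(msg.get_payload())                 # the 7-bit-safe body as it would be sent
```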


   