Arabic Script Bidirectionality and Letter Shaping

(@digiflex)
Active Member
Topic starter  

Hi, we are looking for a Telnet terminal emulator that supports bidirectional text and letter shaping, as we have opportunities to sell our software in countries that use the Arabic script. I downloaded your demo and selected the Win1256 character set (in Options > Properties > Appearance > Translation; I also tried ISO-8859-6).

When I set my Windows (XP Professional 5.1 SP2) language to Arabic, I then see Arabic characters when I type in the VMS host I've Telneted to. However, the letters come out left to right, and all of them are in their standalone forms.

Is there a way to configure Absolute Telnet to display the text right to left, and also to automatically change each letter's shape according to its position in the word (initial, medial, final, standalone), the way Windows does? I suspect that if the answer is "no", it would have to be done by the host in the Telnet session.
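To make the positional-shaping rule concrete, here is a toy Python sketch. It handles only the letter BEH (U+0628), which joins on both sides, and maps each occurrence to one of its four presentation forms from Unicode's Arabic Presentation Forms-B block. The function name `shape_beh_word` is hypothetical and this is not how any particular terminal implements shaping; a real shaper covers every letter, including letters that join only to the right, plus ligatures such as lam-alef.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, a dual-joining letter

# Presentation forms of BEH (Arabic Presentation Forms-B block).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def shape_beh_word(word: str) -> str:
    """Toy shaper: pick the positional form of each BEH in an all-BEH word."""
    if not word or any(ch != BEH for ch in word):
        raise ValueError("toy example: only non-empty runs of BEH are supported")
    shaped = []
    last = len(word) - 1
    for i in range(len(word)):
        # Every neighbour is BEH, which joins on both sides, so a letter joins
        # its predecessor unless it is first, and its successor unless it is last.
        joins_prev = i > 0
        joins_next = i < last
        if joins_prev and joins_next:
            form = "medial"
        elif joins_next:
            form = "initial"
        elif joins_prev:
            form = "final"
        else:
            form = "isolated"
        shaped.append(BEH_FORMS[form])
    return "".join(shaped)


if __name__ == "__main__":
    # Show the code point and Unicode name of each shaped glyph.
    for ch in shape_beh_word(BEH * 3):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For a three-letter run the first letter takes its initial form, the middle one its medial form, and the last its final form; a single BEH stays isolated.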

Thanks!

d.s.

[May 06, 2008, 08:29 AM: Message edited by: Brian T. Pence]


   
(@bpence)
Member Admin

Actually, you're in luck! The answer is YES. Absolute can handle RTL text and Arabic shaping, using the FriBidi library. There is a button on the toolbar labeled RTL. Press it and you should see the text reordered and shaped.

The 'Translation' setting you choose should match the character set of the host. Common choices for you would be ISO-8859-6, Win1256, or UTF8 depending on what character set the host supports.
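To illustrate why the Translation setting must match the host, the following Python sketch decodes the same Win1256 bytes once with a matching table and once with a mismatched one. It uses only Python's standard `cp1256` and `iso8859_6` codecs as stand-ins for the terminal's translation step; it is not AbsoluteTelnet code.

```python
# The Arabic word "salaam" as a host storing Win1256 data would send it.
host_bytes = "سلام".encode("cp1256")

# Translation setting matches the host: the same four letters come back.
matched = host_bytes.decode("cp1256")

# Translation set to ISO-8859-6 instead: depending on the byte, a letter may
# silently change into a different letter or become U+FFFD (errors="replace").
mismatched = host_bytes.decode("iso8859_6", errors="replace")

print(matched, mismatched, matched == mismatched)
```

The point of the sketch is that a wrong setting usually produces plausible-looking but wrong Arabic text rather than an obvious error, so it is worth confirming the host's character set first.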

DISPLAYING Arabic text in Absolute works well when the RTL option is enabled. EDITING text in an editor on the host may be a different matter, depending on what editor you use. If editing is not important, then skip this part.

Scenario one:
Using AbsoluteTelnet RTL with an editor that is not Arabic-aware. The editor assumes that the text is displayed left to right, but since it is not, the cursor position becomes confused and you'll edit data you did not intend to edit.

Scenario two:
Using AbsoluteTelnet RTL with an editor that IS Arabic-aware. The editor may reverse the text itself, assuming that the terminal will not. Absolute then reverses it again, so the text ends up in the wrong order and editing again becomes confused.

Scenario three (the solution):
Use an Arabic-aware editor and DISABLE Absolute's RTL. This lets the editor control the text order, cursor position, etc., so editing should work as expected.
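The three scenarios can be modelled with a deliberately simplified Python sketch. It assumes a run containing only right-to-left letters, so reordering is plain reversal; real terminals apply the full Unicode Bidirectional Algorithm (for example via FriBidi). The helper names are hypothetical and none of this is AbsoluteTelnet or FriBidi code.

```python
def rtl_reorder(text: str) -> str:
    """Toy stand-in for a terminal's RTL pass.

    For a run made only of right-to-left letters, visual order on a
    left-to-right screen is the reverse of logical (stored) order.
    """
    return text[::-1]


def visual_column(logical_index: int, run_length: int) -> int:
    """Screen column where an RTL-reordering terminal draws logical_index."""
    return run_length - 1 - logical_index


logical = "سلام"  # the order in which the host stores and sends the word

# Scenario two: an Arabic-aware editor reverses the text itself and the
# terminal's RTL option reverses it again, cancelling the reordering.
double_reversed = rtl_reorder(rtl_reorder(logical))

# Scenario three: exactly one party (the editor) reorders.
single_reversed = rtl_reorder(logical)

# Scenario one: a non-aware editor believes logical[0] is in column 0,
# but the reordering terminal draws it in the last column of the run.
cursor_column = visual_column(0, len(logical))
```

In scenario two the screen ends up showing the text exactly as stored, which for Arabic means backwards; in scenario one the editor's idea of the cursor column and the column actually drawn disagree.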

In any case, I believe there is a workable scenario in there somewhere. I've put a lot of work into Absolute's international text capabilities and the Arabic ordering/shaping was added within the last year or so. Depending on what editor you use, you may have better or worse luck editing text. If you have issues, I'd love to work with you on trying to handle them in the most appropriate way. I don't know Arabic myself, so the capabilities that ARE there have come via requests from users such as yourself.

I look forward to your feedback on the current capabilities.

Regards,

Brian


   
(@bpence)
Member Admin

I've gone back to test the RTL and shaping functions in 6.28 and I've come to realize that the shaping is not working.

Stick with 6.12 until I can track this down:

[url= http://www.celestialsoftware.net/telnet/AbsoluteTelnet6.12.exe ]http://www.celestialsoftware.net/telnet/AbsoluteTelnet6.12.exe[/url]


   
(@digiflex)
Active Member
Topic starter

Thx Brian. We tried it and yes, 6.12 does the shaping!
d.s.


   
(@bpence)
Member Admin

Great!

Does it otherwise fit your needs as an Arabic-enabled terminal package?

What will you be using it for?

Brian


   
(@digiflex)
Active Member
Topic starter

We are still testing. We have successfully done some basic I/O tests, but now need to look at the specific I/O routines our software uses: terminal entry and display, and printing of Arabic* characters and of mixed Arabic/Latin text.

*Our first market opportunities actually require Farsi/Persian script, which is 90+% Arabic with a few extra letters and symbols. Have you considered adding support for Farsi?

Our software is geared towards the grocery/food warehousing/distribution and production industries. You can find more at:
[url= http://www.digiflex.ca/ ]http://www.digiflex.ca/[/url]
Our main host platform runs the VMS operating system, and most of our clients would use Telnet terminal emulators on PCs or Windows terminal sessions.

Dave Sills


   
(@bpence)
Member Admin

[quote]Have you considered adding support for Farsi?[/quote]

Dave,

I've never tested with Farsi data, but AbsoluteTelnet is written to handle most Unicode data and displays most scripts pretty well. Of course, some scripts present additional challenges (such as ordering and shaping) that require additional processing.

There are no character set translations in Absolute that specifically handle legacy Farsi data, but if your data is in UTF8 format, it will at least try.

I'll do some testing and let you know what I find out.


   
(@bpence)
Member Admin

Dave,

I've been doing some testing with Farsi data and things look pretty good. It's close enough to Arabic that it works pretty much the same, with additional characters as you said. In my sample data, only one character didn't display, but that's mostly a font issue. I would assume that the PCs used by a Farsi-speaking user would have a selection of Farsi fonts that I don't have, including fixed-width ones required by the terminal client.

Brian


   
(@digiflex)
Active Member
Topic starter

Thx Brian,

Actually, we are trying to avoid Unicode for various reasons. One is that our application databases use fixed-length text fields, and storing Unicode would require 2 bytes per printed character.
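The storage concern can be quantified with Python's standard codecs. This sketch (the helper names are hypothetical and the 4-byte field width is purely illustrative) compares how many bytes the same four-letter Arabic word needs in Win1256, UTF-8 and UTF-16, and shows that cutting UTF-8 data to a fixed byte width can split a character in half. Latin letters stay at one byte each in UTF-8, so the real overhead depends on the mix of scripts in a field.

```python
def field_bytes(text: str, encoding: str) -> int:
    """Number of bytes text occupies when stored in the given encoding."""
    return len(text.encode(encoding))


def fits_field(text: str, width: int, encoding: str) -> bool:
    """True if text fits a fixed-length field of `width` bytes."""
    return field_bytes(text, encoding) <= width


word = "سلام"  # four Arabic letters
for enc in ("cp1256", "utf-8", "utf-16-le"):
    print(enc, field_bytes(word, enc))

# Truncating UTF-8 to a byte width can leave half a character at the end.
truncated = word.encode("utf-8")[:5]
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    print("truncated field is no longer valid UTF-8:", exc.reason)
```

If the data did move to Unicode, fixed-length fields would need to be sized in bytes for the worst case, or truncated only on character boundaries.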

So far we're satisfied with the Arabic capabilities of Absolute, and are starting to look for Farsi fonts, e.g. at
[url= http://instruct1.cit.cornell.edu/Courses/nes115/farsiabroad.htm ]http://instruct1.cit.cornell.edu/Courses/nes115/farsiabroad.htm[/url]
We're not sure yet whether there are non-Unicode ones available, but it would make sense that some of the non-Unicode Arabic fonts could easily be modified for Farsi.

d.s.


   
(@bpence)
Member Admin

You can avoid UTF8 if you wish. Win1256 seems to cover both Arabic and Farsi characters. Just set Win1256 in the AbsoluteTelnet options panel and you should be good.

As for fonts, Unicode fonts would be preferred. Internally, AbsoluteTelnet works only with Unicode data. The font selection has no effect on the data received from or sent to the host; AbsoluteTelnet makes the proper conversions. The tricky part about fonts is finding a fixed-width one that the terminal can use.
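The conversion described above can be sketched in Python, assuming the standard `cp1256` codec as a stand-in for the terminal's internal translation tables (the function names `from_host`, `to_host` and `can_encode` are hypothetical). Host bytes are decoded to Unicode on receipt and encoded back on send, and no font is involved at any step. The sketch also checks that four Farsi-only letters exist in Win1256 but not in ISO-8859-6.

```python
PERSIAN_LETTERS = "\u067e\u0686\u0698\u06af"  # peh, tcheh, jeh, gaf


def from_host(data: bytes, translation: str = "cp1256") -> str:
    """Host bytes -> Unicode text, the form the terminal works with internally."""
    return data.decode(translation)


def to_host(text: str, translation: str = "cp1256") -> bytes:
    """Unicode text (e.g. typed keys) -> bytes in the host's character set."""
    return text.encode(translation)


def can_encode(text: str, translation: str) -> bool:
    """True if every character of text exists in the given character set."""
    try:
        text.encode(translation)
    except UnicodeEncodeError:
        return False
    return True


host_data = to_host(PERSIAN_LETTERS)
print(list(host_data))                           # Win1256 byte values
print(from_host(host_data) == PERSIAN_LETTERS)   # the round trip is lossless
print(can_encode(PERSIAN_LETTERS, "iso8859_6"))  # ISO-8859-6 lacks these letters
```

The byte values produced for these letters are the same Win1256 positions (129, 141, 142, 144) that come up later in this thread.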

Searching.....

Brian


   
(@bpence)
Member Admin

I installed the Persian language interface pack to try to get some additional fonts (fixed-width Persian fonts are not easy to find). To my surprise, Windows rebooted and the entire interface was in Persian! It took me a minute to uninstall it, but I'm back to normal now.

Still looking for fonts...

Brian


   
(@bpence)
Member Admin

Dave,

I found and fixed the problem in 6.28. I had incorrectly linked to FriBidi 1, which does not include shaping. 6.28 corrects this by linking to FriBidi 2.

Beta Testing


   
(@digiflex)
Active Member
Topic starter

Thanks Brian. I've updated to 6.28 and continued testing. I'm having a few Windows (not Absolute Telnet) issues with Arabic/Farsi script showing up in some interesting places, which is inconvenient since I am neither an Arabic nor a Farsi reader, although I do know the basics of the alphabets.

We can see/generate the Farsi characters using Win1256; however, these appear to be an afterthought, as they are not in the proper Farsi alphabetical sequence, and this would affect sorting. So we would be interested if/when you can find an appropriate fixed-width true Farsi font that could be used within Absolute Telnet.

In case you're interested, the following link shows the alphabet. Comparing its sequence to Win1256, the Farsi characters at these (decimal) Win1256 code values should be sequenced as follows:
129 should be right after 200
141 should be right after 204
142 should be right after 210
144 should be right after 223
152 should be right after (or possibly replace) 223
192 appears to be the same as 229, and this/these should come after, not before, 230.

[url= http://www.geocities.com/athens/academy/9594/farsi.html ]http://www.geocities.com/athens/academy/9594/farsi.html[/url]
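The reordering rules above translate naturally into a custom sort key that the host application (not the terminal) could use. This Python sketch is hypothetical and built only from the listed rules, plus one assumption of ours: that keheh (152) sorts between kaf (223) and gaf (144), following the Persian alphabet. A production collation would need more rules (for example yeh variants and diacritics) or a Unicode Collation Algorithm implementation.

```python
# Hypothetical Farsi-oriented sort key for Win1256 (cp1256) byte strings.
# Ordinary bytes keep their binary position (b, 0); each Farsi letter is
# slotted just after a base byte according to the rules listed above.
FARSI_RANK = {
    129: (200, 1),  # peh after beh
    141: (204, 1),  # tcheh after jeem
    142: (210, 1),  # jeh after zain
    152: (223, 1),  # keheh right after kaf (could instead share kaf's rank)
    144: (223, 2),  # gaf after kaf/keheh (assumed Persian alphabet order)
    229: (230, 1),  # heh after waw
    192: (230, 1),  # heh goal treated the same as heh
}


def farsi_key(data: bytes) -> list:
    """Sort key: one (base, offset) rank per byte, compared lexicographically."""
    return [FARSI_RANK.get(b, (b, 0)) for b in data]


words = ["تا", "پا", "با"]  # teh-alef, peh-alef, beh-alef
encoded = [w.encode("cp1256") for w in words]

binary_order = [b.decode("cp1256") for b in sorted(encoded)]
farsi_order = [b.decode("cp1256") for b in sorted(encoded, key=farsi_key)]
print(binary_order)  # peh-alef first, because byte 129 is below 200
print(farsi_order)   # beh-alef, peh-alef, teh-alef
```

Because unmapped bytes keep their own value as the rank, Latin text and the remaining Arabic letters still sort in plain binary order; only the listed Farsi letters move.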

One other thing missing from Win1256 is the set of numeric digits used with this script (I think the Arabs call them "Indian numerals"; they aren't our digits, which for some reason we call "Arabic numerals").

One other question: is there a programmatic way within Absolute Telnet to select character sets, e.g. having our software use escape sequences to switch between different fonts?

Thx!

d.s.


   
(@bpence)
Member Admin

Dave,

Sorting is not something that is handled within AbsoluteTelnet. AbsoluteTelnet is just the display. An application that handles Farsi (your app written on the host) would have to know the proper way to sort items, then tell AbsoluteTelnet what to display where. Win1256 is just a way to tell AbsoluteTelnet which characters to display. You could use ISO-8859-6 instead, but it would not solve the sorting problem (and it lacks the Farsi-specific letters altogether).

Of course, as with most English encodings, it is convenient when the character encoding places characters in the proper sort order, but this does not always happen. There are internationalization libraries that can help.

As for the numeric digits: if they don't exist in Win1256, then you're out of luck. You could look for a character encoding that encodes all of the Arabic/Farsi characters as well as the digits, or you could consider UTF8, which encodes ALL characters.

Another thing: don't confuse a "font" with a "character set". The "character set" defines which characters can be encoded. For example, as you noticed, the Win1256 character set does not include the digits. You'll have to pick a different character set if you want those.
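A simple way to check whether a candidate character set covers the characters you need is to try encoding them. This Python sketch (the helper `can_encode` is hypothetical) tests the ASCII digits and the Persian digits U+06F0 to U+06F9 against Win1256 (`cp1256`) and UTF-8, using only the standard library.

```python
import unicodedata


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given character set."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


ascii_digits = "0123456789"
# EXTENDED ARABIC-INDIC DIGIT ZERO..NINE, the digits used in Persian text.
persian_digits = "".join(chr(0x06F0 + d) for d in range(10))

for name, digits in (("ASCII", ascii_digits), ("Persian", persian_digits)):
    print(name, "cp1256:", can_encode(digits, "cp1256"),
          "utf-8:", can_encode(digits, "utf-8"))

# Unicode-aware code still recognises the Persian digits as digits 0..9.
print([unicodedata.digit(ch) for ch in persian_digits])
```

The same check works for any list of required characters and any codec Python knows, which makes it a quick way to compare character-set candidates before committing to one.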

A font is a collection of character glyphs and can span many encodings and languages. For example, the font I've found to display Arabic/Farsi best on my machine is "Courier New". However, it is not only suitable for Arabic; it has glyphs for Latin, Cyrillic, Hebrew, etc.

What you're probably looking for is a character encoding that encodes Farsi characters in their native ordering. I'll help you look.

Brian


   
(@bpence)
Member Admin

Dave,

I've been doing a lot of reading on this subject, and I've come to the conclusion that sorting is not going to be easy regardless of the character set you choose.

You may want to have a look at the [url= http://en.wikipedia.org/wiki/Iran_System_encoding_standard ]IRAN SYSTEM[/url] code page, as it appears to cover the characters you need (in binary order). AbsoluteTelnet does not currently support this character set, so it would have to be added.

Another way to handle this may be to take the plunge and convert to Unicode, then use the Unicode Collation Algorithm to sort. Not only will this help you with Arabic and Persian, but with all other languages as well, since you will be able to reuse the same code with multiple languages and character sets. You may at least find interesting reading here:

[url= http://www.unicode.org/reports/tr10/ ]http://www.unicode.org/reports/tr10/[/url]

Pay particular attention to section 1.8 (Common misperceptions)

Item 2:

Collation is not code point (binary) order. The simplest case of this is capital Z versus lowercase a. As noted above, beginners may complain about Unicode that a particular character is “not in the right place in the code chart”. That is a misunderstanding of the role of the character encoding in collation. While the Unicode Standard does not gratuitously place characters such that the binary ordering is odd, the only way to get the linguistically-correct order is to use a language-sensitive collation, not a binary ordering.

While this quote is specifically referencing Unicode, I think the same thing can be said of Win1256, as reflected in your comments about the placement of the Farsi characters.

Even in English sorting, we sometimes have to ignore binary order to get things to sort the way we want. For example, binary order sorts 'A' (x41) through 'Z' (x5A), then 'a' (x61) through 'z' (x7A). If you want a case-insensitive sort, it's trickier: A (x41), a (x61), B (x42), b (x62), C (x43), c (x63).
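The English example can be reproduced in a few lines of Python. The only point it makes is that the sort key, not the byte values, decides the final order; the tie-break on the original character is our own choice to keep 'A' just ahead of 'a'.

```python
letters = ["b", "C", "a", "B", "c", "A"]

# Binary (code point) order: all capitals (x41..x5A) before all lowercase (x61..x7A).
binary = sorted(letters)

# Case-insensitive order: compare case-folded letters first, then break
# ties by the original character so 'A' (x41) stays just ahead of 'a' (x61).
case_insensitive = sorted(letters, key=lambda s: (s.casefold(), s))

print([f"{c}(x{ord(c):02X})" for c in binary])
print([f"{c}(x{ord(c):02X})" for c in case_insensitive])
```

Sorting Farsi correctly is the same idea with a more elaborate key, whichever character set the data is stored in.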

All of this may be confusing, I know, but here are a few things I know you need to decide:

1. You MUST pick a character set to store your data. It may be Win1256, Iran System, Unicode, or some other character set, but you must pick one and stick to it. It should adequately cover the characters you want to represent. If you're going to have to exchange data with other vendors, it may be a good idea to use the character set used by the majority of your vendors.

2. In order for AbsoluteTelnet to show your data properly, AbsoluteTelnet must support the character set. If you decide to use a character set that Absolute does not support, it will have to be added (usually not a difficult task).

3. Unless the character set selected in (1) presents the characters in the proper sorting order, sorting will be a chore.


   