Arabic Script Bidirectionality and Letter Shaping

(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

I found another web page that lists four possible character sets (Iran System, IRNA, ISIRI 2900, ISIRI 3342):

<broken link removed>

Many of the character set descriptions were written by Roozbeh Pournader. It may be worth trying to contact him for a recommendation on which to use. 

This post was modified 5 months ago by bpence

   
(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

I spoke with Behdad Esfahbod about this. He's the author of FriBidi and a contributor to FarsiWeb. He recommends ONLY UTF-8 these days. The increased storage requirements should not be a major concern, given the capacity and speed of modern storage devices.

He also recommended a good font, DejaVu Sans Mono:

http://dejavu.sourceforge.net/wiki/index.php/Download


   
(@digiflex)
Active Member
Joined: 55 years ago
Posts: 6
Topic starter  

Brian, we're back on this project again. Thank you for your research and advice so far.

Our contacts also recommend UTF-8. So, in an attempt to kill another bird with the same stone, we are asking if there is any way Absolute Telnet could do an internal translation from UTF-8 2-byte characters to 1-byte (0:255) values?

We still strongly prefer to be able to store a character in a byte. Although, as you've pointed out, storage space is cheap, we have a large and complex system with thousands of fixed-length text fields. For example, a product name is stored in a field limited to 36 characters. Having that field hold only 18 characters wouldn't be acceptable, and the database redesign work to double the size of all our text fields would be massive. Hence, if the screen I/O were in UTF-8 but we were able to store a 1-byte translation of it, that would be ideal.

2nd question -- we notice on our trial copy there is a greyed-out SFTP button. Is there full support for standard FTP?

Thx!

d.s.


   
(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

"...is there any way Absolute Telnet could do an internal translation from UTF-8 2-byte characters to 1-byte (0:255) values? ... We still strongly prefer to be able to store a character in a byte."

I believe you may have a misunderstanding about the terminal's role in your application's design. Absolute doesn't care how you store things, or how you sort things, convert things, etc. It simply takes the data in any of the supported formats (your choice) and displays it. If your database only supports single-byte characters, then you're stuck with one of the single-byte legacy encodings as we discussed before. This approach will probably require you to create a custom sorting algorithm, as sorting by binary values will give the wrong order. So be it. That may just be work you can't avoid. The downside to this is that as you add additional languages, you'll have to support additional legacy encodings and special sorts, etc. This is the kind of work that Unicode was designed to help you avoid.
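
As a rough illustration of the sorting point (a Python sketch, not something AbsoluteTelnet does): assuming the data is stored as Windows-1256 (`cp1256` in Python), the Persian letter peh sits at byte 0x81, below alef at 0xC7, so a plain byte sort puts it first. The `PERSIAN_ORDER` table and the `collation_key` helper are hypothetical names, and the table covers only the four letters used here.

```python
# Alphabet order for the four letters used below: alef, beh, peh, teh.
# A real table would cover the whole alphabet; this subset is illustrative.
PERSIAN_ORDER = "\u0627\u0628\u067e\u062a"


def collation_key(raw: bytes, encoding: str = "cp1256") -> tuple:
    """Rank each stored byte by its letter's position in PERSIAN_ORDER."""
    return tuple(PERSIAN_ORDER.index(ch) for ch in raw.decode(encoding))


# Single-letter "fields" as they would sit in a one-byte-per-character database.
stored = [ch.encode("cp1256") for ch in "\u062a\u067e\u0628\u0627"]

binary_sorted = [raw.decode("cp1256") for raw in sorted(stored)]
custom_sorted = [raw.decode("cp1256") for raw in sorted(stored, key=collation_key)]

print([f"U+{ord(ch):04X}" for ch in binary_sorted])  # peh sorts before alef
print([f"U+{ord(ch):04X}" for ch in custom_sorted])  # alphabetical order
```

Each additional legacy encoding would need its own table of this kind, which is the maintenance cost described above.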

There isn't any algorithm that can take 36 Unicode characters and store them in 36 bytes unless you convert them to some single-byte legacy encoding. Then you're back to square one.

Typically, a Unicode application does not store data internally in UTF-8. UTF-8 is *not* a 2-byte encoding. It is a variable-length, multi-byte encoding that stores a character in 1, 2, 3, or even 4 bytes! This variability makes it a poor fit for fixed-length fields, because it is hard to tell how many bytes you need for a given number of characters, or how many characters will fit in your 36-byte field. Applications tend to store data in UCS-2, where every Unicode character in the Basic Multilingual Plane (which includes Arabic and Persian) takes exactly two bytes. Of course, this requires quite a bit of application modification and extra storage, as you said. However, once this work is done, adding new languages is trivial.
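
To make the byte counts concrete, here is a small Python sketch. The `encoded_len` and `fits` helpers are made-up names for illustration, and `utf-16-le` stands in for UCS-2 (the two give the same bytes for the Arabic letters used here).

```python
FIELD_BYTES = 36  # the fixed-length product-name field from this thread


def encoded_len(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


def fits(text: str, encoding: str, field_bytes: int = FIELD_BYTES) -> bool:
    """True if `text` fits in a fixed-length field of `field_bytes` bytes."""
    return encoded_len(text, encoding) <= field_bytes


# UTF-8 width depends on the character: Latin A, Arabic beh, euro sign, an emoji.
utf8_widths = [encoded_len(ch, "utf-8") for ch in ("A", "\u0628", "\u20ac", "\U0001F600")]
print(utf8_widths)

name = "\u0628" * 36  # 36 Arabic letters
for enc in ("cp1256", "utf-8", "utf-16-le"):
    print(enc, encoded_len(name, enc), fits(name, enc))
```

Only the single-byte legacy encoding fits 36 Arabic letters into the 36-byte field; in UTF-8 the same field holds 18 of them, and fewer once 3-byte or 4-byte characters appear.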

Regardless of how you store it, when you send the data to the terminal, it has to be in one of the encodings the terminal supports (Win1256, ISO8859-6, etc.).


   
 kim
(@kim)
New Member
Joined: 12 years ago
Posts: 3
 

I am having problems displaying bidirectional Arabic text in the window of your telnet client 9.53. I've tried with RTL on and with RTL off, but the bidirectional text does not display correctly. Each line of the text is written to telnet via WriteConsoleA (using UTF-8 encoding):

Line 1 - Correctly written RTL in telnet client:
الوظيفة الرئيسية (|

Line 2 - Incorrectly written - "FK)" is written left adjusted, followed by left adjusted Arabic:
FK) اختبار |

See FKtest-TelnetIncorrectlyDisplaysBidiLine2-leftAdjusted.png for incorrect display by telnet.
See FKtest-BrowserCorrectlyDisplaysBidiLine2-rightAdjusted.png for correct display by Browser (set to RTL).

How can I get the bidirectional text (line 2) to be right adjusted, with the Arabic written to the "left" of the English text? I've tried sending the Unicode RLO, LRO, and PDF formatting characters to the telnet client, but the telnet client then displays garbled text for these characters. See FKtest-TelnetIncorrectlyDisplaysBidiLine2withRTO-PDF-GarbledChars.png.

I've also tried programmatically positioning the cursor before sending the text, with RTL both ON and OFF. When RTL is OFF, telnet does not shape the Arabic characters correctly; when it is ON, telnet overrides the cursor positioning and still left adjusts the BIDI text. See FKtest-TelnetIncorrectlyDisplaysBidiLine2AfterRTLOFFmanualCursorPositioning-ArabicShaping.png.

Is there a way to programmatically turn the RTL feature ON/OFF, so that, for instance, I can position the cursor right adjusted to display the English, and then turn RTL on and re-position the cursor to display the Arabic? Thanks!

[file name=FKtest_TelnetIncorrectlyDisplaysBidiLine2_leftAdjusted.zip size=95488] http://www.celestialsoftware.net/images/fbfiles/files/FKtest_TelnetIncorrectlyDisplaysBidiLine2_leftAdjusted.zip [/file]


   
(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

Can you post a zip file with the actual text?

I'll need that to do some testing.

Brian


   
 kim
(@kim)
New Member
Joined: 12 years ago
Posts: 3
 

I'm attaching the Absolute log file, with the actual text at the top of the file.

Thanks!
[file name=AbsoluteTelnet.zip size=406] http://www.celestialsoftware.net/images/fbfiles/files/AbsoluteTelnet.zip [/file]


   
 kim
(@kim)
New Member
Joined: 12 years ago
Posts: 3
 

Hi Brian,

Any update on this issue for Arabic BiDi Support?

Thanks!


   
(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

The problem is that Absolute tries to make a context-sensitive decision about whether to apply full or partial RTL to the data. The second line starts with English text, so the rules decide that only the Arabic text on the line should be affected.

Some things to remember.

1. AbsoluteTelnet does *not* understand the Unicode LRO or RLO characters. At least not yet 🙂
2. Cursor positioning occurs within the 'logical' ordering of the data, not the visual ordering. I would think that trying to set the cursor position within a mixed string of RTL and LTR text would be problematic.
3. AbsoluteTelnet uses FriBIDI for the implementation of the BIDI algorithm. Some behavior is inherent in the tool and not under my control.
4. AbsoluteTelnet does *not* pretend to be a 100% accurate text rendering engine. It does the best it can with what it knows (to the best of *my* ability). It gets better with time, as long as I get good feedback and help from users like you!
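
For reference, the override characters from item 1 are ordinary Unicode code points that each take three bytes in UTF-8, which may be why they show up as garbled characters when the terminal does not interpret them. The Python sketch below lists them; `strip_bidi_controls` is a hypothetical helper an application could use to avoid sending them, not an AbsoluteTelnet feature.

```python
import unicodedata

# Explicit directional formatting characters from the Unicode bidi algorithm.
BIDI_CONTROLS = {
    "LRO": "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "RLO": "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "PDF": "\u202c",  # POP DIRECTIONAL FORMATTING
}

for label, ch in BIDI_CONTROLS.items():
    # Print the code point, its Unicode name, and its UTF-8 bytes in hex.
    print(label, f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(" "))


def strip_bidi_controls(text: str) -> str:
    """Drop LRO/RLO/PDF so they are never sent to the terminal (hypothetical helper)."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS.values())
```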

I'm working on an update that may help. Essentially, I'll scan the entire line and if the line has any RTL characters, I will treat the entire line as RTL. This overrides the context-sensitive behavior I mentioned before.
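
The difference between the two behaviors can be sketched in Python. This is only an illustration of the idea, not AbsoluteTelnet's or FriBidi's actual code: `first_strong_direction` assumes the context-sensitive rule resembles the "first strong character" heuristic of the Unicode bidi algorithm, and `any_rtl_direction` follows the "any RTL character makes the whole line RTL" description above. Both function names are made up.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes (Hebrew, Arabic)


def first_strong_direction(line: str) -> str:
    """Base direction taken from the first strongly directional character."""
    for ch in line:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # no strong character found: default to left-to-right


def any_rtl_direction(line: str) -> str:
    """Treat the whole line as RTL if it contains any strong RTL character."""
    has_rtl = any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in line)
    return "rtl" if has_rtl else "ltr"


# Line 2 from the report, plus a shortened version of line 1 (first word only).
line1 = "\u0627\u0644\u0648\u0638\u064a\u0641\u0629 (|"
line2 = "FK) \u0627\u062e\u062a\u0628\u0627\u0631 |"

for line in (line1, line2):
    print(first_strong_direction(line), any_rtl_direction(line))
```

Line 1 comes out RTL under both rules, while line 2 is LTR under the first-strong rule and RTL under the any-RTL rule, which matches the left-adjusted display reported for line 2.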

I'll let you know when I have something for you to look at.

Brian


   
(@bpence)
Member Admin
Joined: 12 months ago
Posts: 1375
 

Give this a try:

<old link removed.  Go to downloads for the latest version>

The behavior is a bit different than before: it applies full RTL if the line contains any RTL data.

Brian

This post was modified 5 months ago by bpence

   