hebrev() and friends
December 11th, 2006 | by admin | suraski.net
Recently Andrei approached me and asked what I think should be done with hebrev() and hebrevc() in the context of the Unicode-enabled PHP 6. For those of you who don’t know, hebrev() (which stands for Hebrew Reverse) and hebrevc() (which stands for Hebrew Reverse & Convert) are remnants from the early days of the Web, when browsers could not handle Right-to-Left languages properly. For those of you who don’t know what Right-to-Left languages are, they are, simply enough, languages that are written from right to left, such as Hebrew and Arabic.
In those dark ages, if you wanted your text to render properly on a browser that was completely ignorant of the fact that your text should run from right to left, then in addition to right alignment you also had to write your text in reverse. It is as if, in order to display “Hello World”, you had to type “dlroW olleH” (or rather, instead of “שלום עולם” one had to type “םלוע םולש”). This concept was named “Visual displaying”, or in short just plain “Visual”. Most systems other than browsers saved their data in the correct first-char-comes-first order (simply enough called “Logical”), and input coming from users through browsers also arrived in the correct ‘logical’ order, so there was a gap between the data Web apps were working with and the data they had to display. hebrev() and hebrevc() were created to bridge that gap, and reverse Visual Hebrew into Logical and vice versa. Doing that is probably much more complicated than you think, as you have to make sure not to reverse numbers or non-Hebrew text, break lines properly if wrapping was requested, etc.
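To make the "reverse, but not everything" problem concrete, here is a minimal sketch in Python. It is not how hebrev() is actually implemented; the function name `logical_to_visual` and the regular expression are made up for illustration, and it assumes a single line with no wrapping, no brackets to mirror, and only ASCII digits and Latin letters as the non-Hebrew content.

```python
import re

# Hypothetical helper pattern: a run of ASCII digits/letters, optionally joined
# by common separators (so "12.5" or "a/b" is treated as one unit).
_NON_HEBREW_RUN = re.compile(r"[0-9A-Za-z]+(?:[.,:/-][0-9A-Za-z]+)*")


def logical_to_visual(line: str) -> str:
    """Naive Logical-to-Visual conversion of one right-to-left line.

    Assumption: this only illustrates the idea described in the post;
    it is not the algorithm PHP uses.
    """
    # Step 1: reverse the whole line, so the first logical character
    # ends up as the rightmost character on screen.
    reversed_line = line[::-1]
    # Step 2: numbers and Latin words were reversed as well; flip each
    # such run back so it still reads left to right.
    return _NON_HEBREW_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


if __name__ == "__main__":
    print(logical_to_visual("שלום עולם"))
    print(logical_to_visual("שלום 123"))
```

For the pure-Hebrew line this is a plain reversal, as in the “שלום עולם” example above. For the second line the Hebrew word is reversed while “123” keeps its digit order and simply moves to the left edge. Line wrapping, punctuation and bracket mirroring are exactly the parts this sketch leaves out, and they are where most of the real complexity lives.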
As time passed, more and more browsers gained the ability to handle Logical text, automatically rendering it from right to left and displaying it properly. My belief is that most of the right-to-left web today is already using Logical, which brings me back to Andrei’s query, on which I’d like your feedback.
Do you think we need to update hebrev() to support Unicode, or should we just let it rest in peace? Should we maybe expose the more powerful ICU bidirectional conversion API under new names, for those who still wish to convert Unicode text from Visual to Logical and vice versa? I realize this is probably not very interesting to most of the crowd here, but for those of you who do have an opinion, your feedback is welcome!
