[NTG-context] Arabic-utf-8 (plus a sample)

Thomas A. Schmitz ntg-context@ntg.nl
Sat Jun 5 22:48:18 CEST 2004


Just a quick reply (it's bedtime over here): there may be two problems.
The first is that the mail program put an unwanted line break after the
=~ part; just remove it, since it should all be one line. The second is
that you'll need a fairly recent version of perl for it to work. Please
check what you get when you run

perl --version

I guess that for utf-8 to work, it should be at least 5.8.0. Your basic
idea of the usage is right (I'm not a Windows person, but I assume it
should be the same): save the script as utf2tex.pl, make it executable,
and call it as utf2tex.pl FILENAME.txt.
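
To make the usage concrete, here is a minimal sketch of what such a script could look like. It is not the script from my earlier mail: the table %latin_for, the sub transliterate, and the overall layout are assumptions made for illustration, and the seven mappings are simply the ones listed in the quoted message below. The "use 5.008;" line should make an older perl stop with a version message instead of the ':utf8' error.

```perl
#!/usr/bin/perl
# utf2tex.pl -- hypothetical sketch, not the original script.
# Needs perl 5.8.0 or later for the :utf8 I/O layer.
use 5.008;
use strict;
use warnings;

# Illustrative table (Unicode code point => Latin transcription),
# copied from the list in the quoted message.
my %latin_for = (
    "\x{0627}" => 'A',   # alif
    "\x{0628}" => 'b',   # ba
    "\x{062C}" => 'j',   # jim
    "\x{062F}" => 'd',   # dal
    "\x{0647}" => 'h',   # ha
    "\x{0648}" => 'w',   # waw
    "\x{0632}" => 'z',   # zay
);

# Replace every character found in the table; all others pass through.
sub transliterate {
    my ($text) = @_;
    $text =~ s/(.)/exists $latin_for{$1} ? $latin_for{$1} : $1/gse;
    return $text;
}

# Act as a filter only when run as a script: utf2tex.pl FILENAME.txt
unless (caller) {
    binmode STDOUT, ':utf8';
    for my $file (@ARGV) {
        # Decode the input as utf-8 while reading.
        open my $in, '<:utf8', $file or die "Cannot open $file: $!";
        print transliterate($_) while <$in>;
        close $in;
    }
}
```

On Windows it is probably simplest to call it as "perl utf2tex.pl FILENAME.txt > FILENAME.tex"; the output file name is only an example.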

I guess it would be easiest to convert the utf-8 to ascii directly; that
would mean you could later convert it back. I have a set of scripts that
do just that: they convert babel Greek into utf-8 and back.
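
As a rough illustration of the "convert it back" idea (this is not one of my Greek scripts), the same kind of table can be inverted with reverse. The names %to_latin, %to_arabic, and convert are made up for this sketch, and the table again contains only the seven letters from the quoted message. Inverting works only because each Latin letter occurs exactly once in the table.

```perl
use strict;
use warnings;

# Illustrative table: Unicode code point => Latin transcription.
my %to_latin = (
    "\x{0627}" => 'A', "\x{0628}" => 'b', "\x{062C}" => 'j', "\x{062F}" => 'd',
    "\x{0647}" => 'h', "\x{0648}" => 'w', "\x{0632}" => 'z',
);

# Swap keys and values to get the table for the way back.
my %to_arabic = reverse %to_latin;

# Apply a one-character-to-one-character table to a string.
sub convert {
    my ($text, $table) = @_;
    $text =~ s/(.)/exists $table->{$1} ? $table->{$1} : $1/gse;
    return $text;
}

my $arabic = "\x{0628}\x{0627}\x{0628}";      # ba, alif, ba
my $latin  = convert($arabic, \%to_latin);    # Latin transcription
my $back   = convert($latin,  \%to_arabic);   # and back again

print $back eq $arabic ? "round trip ok\n" : "round trip failed\n";
```

The way back is only safe if the transcribed file contains no other Latin text (TeX commands, for instance), because every A, b, j, d, h, w, and z would be turned into an Arabic letter.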

If you need more help, I'll look into it tomorrow!

Best

Thomas

On Sat, 2004-06-05 at 23:33, Idris Samawi Hamid wrote:
> On Sat, 05 Jun 2004 22:41:39 +0200, Thomas A. Schmitz 
> <thomas.schmitz@uni-bonn.de> wrote:
> 
> > Idris,
> >
> > I know a bit of perl and would love to help. However, I fear that
> > sending us your stuff via mail will be a bit difficult because the utf-8
> > characters get transformed into gibberish.
> 
> Thnx 4 such a speedy reply! I don't think you are getting gibberish 
> though; you should be getting the extended ascii representation. So the 
> letter alif (hex 0627) should look like this:
> 
> ا
> 
> Do you get a forward-slashed circle and a section symbol? If so, that's 
> the ascii representation I'm trying to convert to the letter `A'.
> 
> Here are the codes you want:
> 
> ا [0627] => A
> 
> ب [0628] => b
> 
> ج [062C] => j
> 
> د [062F] => d
> 
> Ù‡ [0647] => h
> 
> Ùˆ [0648] => w
> 
> ز [0632] => z
> 
> Let me explain my situation more clearly:-)
> 
> I have a unicode editor, Unitype Global Writer. I save a unicode document 
> as a utf *.txt file. When I open that saved file in my TeX editor 
> (WinEdt), it comes out as extended ascii (that's the "gibberish"). So what 
> I wanted to do was convert the ascii "gibberish" to my Latin 
> transcription. It seems that what you are suggesting is to use the hex 
> representation and convert the unicode txt file into a Latin transcription 
> file directly and bypass the gibberish.
> 
> On your perl file, can you give me an example of how to use it? I tried it
> (in Windows, with the script named utf2tex.pl and the unicode text in
> unicode-utf.txt) and got
> 
> =========================
> > perl utf2tex.pl unicode-utf.txt
> Unknown discipline class ':utf8' at C:/Perl/lib/open.pm line 18.
> BEGIN failed--compilation aborted at utf2tex.pl line 4.
> =========================
> 
> From your script I tried, e.g.,
> 
> ============================
> $_ =~
> s/\x{0627}/\x{0041}/esg;
> # from alif to `A'
> ============================
> 
> Your guidance will be greatly appreciated!
> 
> Thnx a million!
> Idris



