[NTG-context] UTF conversion via Lua

Philipp Gesang gesang at stud.uni-heidelberg.de
Fri Feb 10 12:30:23 CET 2012


On 2012-02-10 12:11, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> ... Well, my information was not correct.
> 
> There are characters > 127 in the file, like "ř", "š"...
> 
> Each char = 1 byte, and as I'm using Windows with CP 1250, the characters are displayed correctly.

So it wasn’t ASCII after all ;-) No problem, just use iconv:

     iconv -f CP1250 -t UTF-8 infile > outfile

I do this a lot with movie subtitles …
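
If the originals have to stay in CP1250 for the other program, the
same call can be wrapped so that it writes UTF-8 copies into a
separate directory. A minimal sketch, assuming a POSIX shell and an
iconv that knows CP1250; the function name "to_utf8" and the "utf8/"
directory are made up for illustration:

```shell
#!/bin/sh
# Sketch: write a UTF-8 copy of a CP1250 file, leaving the original untouched.
# "to_utf8" and the "utf8/" output directory are illustrative names only.

to_utf8() {
    # Create the output directory if it does not exist yet.
    mkdir -p utf8 || return 1
    # -f: source encoding, -t: target encoding
    iconv -f CP1250 -t UTF-8 "$1" > "utf8/$(basename "$1")"
}
```

Usage would be along the lines of: for f in *.txt; do to_utf8 "$f"; done
The CP1250 files stay as they are; ConTeXt would read the copies in utf8/.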

Hth, Philipp


PS: If you still want to do the conversion on the Lua side only,
    a starting point might be “regi-cp1250.lua” in the
    ConTeXt base/ dir.




> 
> But I have a problem loading them into ConTeXt.
> 
> I need to convert the bytes > 127 to UTF sequences that ConTeXt accepts.
> 
> @Thomas:
> 
> The table looks nice but there are no entries for CP 1250 to UTF conversion.
> 
> I prepared some tables: character conversion and removal of diacritics (see the attachment);
> maybe it would be useful to include them in ConTeXt somehow.
> 
> Best regards,
> 
> Lukas
> 
> 
> On Fri, 10 Feb 2012 11:57:32 +0100, Philipp Gesang <gesang at stud.uni-heidelberg.de> wrote:
> 
> >On 2012-02-10 11:22, Procházka Lukáš Ing. - Pontex s. r. o. wrote:
> >>Hello,
> >>
> >>I have many files with ASCII encoding; this encoding must be kept as these files are also processed by another program.
> >>
> >>When I work with them in ConTeXt, I need to convert them to UTF.
> >
> >Not needed, as every ASCII string is a valid UTF-8 string:
> >   “The UTF encoding has several good properties. By far the most
> >    important is that a byte in the ASCII range 0-127 represents
> >    itself in UTF. Thus UTF is backward compatible with ASCII.”
> >    http://doc.cat-v.org/plan_9/4th_edition/papers/utf
> >You can use them in Luatex without further conversion.
> >
> >Regards
> >Philipp
> >
> >
> >>
> >>Does Lua (in ConTeXt scope) offer a transformation function or a table of chars [ASCII-code] -> [UTF-code] or anything to provide the conversion?
> >>
> >>Something like:
> >>
> >>\startluacode
> >>  local str = loadFile("a.txt") -- ASCII coded
> >>
> >>  str = context.ASCII2UTF(str) -- Or something like this
> >>\stopluacode
> >>
> >>Best regards,
> >>
> >>Lukas
> >>
> >>
> >>--
> >>Ing. Lukáš Procházka [mailto:LPr at pontex.cz]
> >>Pontex s. r. o.      [mailto:pontex at pontex.cz] [http://www.pontex.cz]
> >>Bezová 1658
> >>147 14 Praha 4
> >>
> >>Tel: +420 244 062 238
> >>Fax: +420 244 461 038
> >>
> >>___________________________________________________________________________________
> >>If your question is of interest to others as well, please add an entry to the Wiki!
> >>
> >>maillist : ntg-context at ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> >>webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> >>archive  : http://foundry.supelec.fr/projects/contextrev/
> >>wiki     : http://contextgarden.net
> >>___________________________________________________________________________________
> >

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments