[NTG-context] <mtext> UTF further problems

James Cloos cloos at jhcloos.com
Tue Jun 21 14:55:09 CEST 2005

>>>>> "Duncan" == Duncan Hothersall <dh at capdm.com> writes:

>> Are you sure your
>> file is in utf-8 and not, eg, utf-16?

Duncan> I was, but I'm no longer sure of anything. :-) Is there a
Duncan> foolproof way of finding out?

(First, I cannot comment usefully wrt this topic and windows.)

Try this at a shell prompt:

    env LANG=C LC_ALL=C cat --show-all FileName

where FileName is the file in question.  Each non-ascii byte will be
output as a string that looks like M-? (or M-^?), where ? is a single
ascii character.  If you see a single such string in place of each
non-ascii character, you do not have utf-8.  If you see two to four
of them for each non-ascii character in the document, it is probably
utf-8.  (If you see ^@ pairs separating the ascii chars, you have utf-16.)
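
The same check can be scripted.  What follows is only a minimal sketch,
assuming a POSIX shell with iconv, tr and wc on the path.  The name
guess_encoding is made up for illustration, and the answer it prints is
a rough guess, not a proof:

```shell
# guess_encoding FILE -- hypothetical helper; prints a rough guess only.
guess_encoding() {
    # Total byte count, and the count once NUL bytes are stripped.
    total=$(( $(wc -c < "$1") ))
    no_nul=$(( $(tr -d '\000' < "$1" | wc -c) ))
    # Bytes left after deleting everything in the 7-bit ascii range.
    high=$(( $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) ))

    if [ "$total" -ne "$no_nul" ]; then
        echo "nul-bytes"    # interspersed NULs: likely utf-16
    elif [ "$high" -eq 0 ]; then
        echo "ascii"        # nothing outside 7-bit ascii
    elif iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1; then
        echo "utf-8"        # every multi-byte sequence is well formed
    else
        echo "8-bit"        # high bytes that are not valid utf-8
    fi
}
```

Run it as `guess_encoding foo.tex`.  An answer of "8-bit" does not say
which 8-bit encoding the file uses, only that it is not valid utf-8.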

Of course, context would not be able to deal with utf-16 on linux;
tex would just get confused by the interspersed NUL bytes (represented
as ^@ in the --show-all output described above) in the initial lines.
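
If the file does turn out to be utf-16, converting it before tex sees it
is one way around this.  A sketch, assuming iconv is installed and the
file starts with a byte-order mark (without one, iconv has to guess the
byte order).  The name utf16_to_utf8 is hypothetical:

```shell
# utf16_to_utf8 IN OUT -- hypothetical helper.
# Relies on the byte-order mark at the start of IN to pick the byte order.
utf16_to_utf8() {
    iconv -f UTF-16 -t UTF-8 < "$1" > "$2"
}
```

For example, `utf16_to_utf8 foo.tex foo-utf8.tex` and then run texexec
on foo-utf8.tex.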

So if it is an encoding problem, it is more likely that you are ending
up with a file in one of the iso8859 8-bit encodings.  

A (not-so-?)quick test is this.  Save it w/o the leading blanks
and run it, passing a filename as a single argument.

  # pass your filename (e.g. foo.tex) as the single argument
  for ij in $(seq 1 15); do
      # iconv fails for numbers with no such encoding; && then skips texexec
      iconv -f iso8859-${ij} -t utf8 <"$1" >"from-${ij}-$1" && \
          texexec "from-${ij}-$1"
  done

Then test all of the generated dvi files to see whether any worked.

Duncan> I tend to use emacs, which I thought was a pretty safe bet,
Duncan> but maybe I should try something else?

I also use emacs, but from cvs.  (Gentoo has an emacs-cvs ebuild that
makes that easy.)  I also run with LANG=en_US.UTF-8 and several
settings in emacs to prefer utf8.  The emacs-unicode-2 branch in CVS
(what will become emacs-23; CVS HEAD will become emacs-22) is even
better for this since it uses unicode as its internal representation.

Duncan> I'm testing on both Windows and (Redhat) linux, both with the
Duncan> current minimal ConTeXt installations (i.e. mswintex.zip and
Duncan> linuxtex.zip). They exhibit the same behaviour.

I've only tested on tetex-3.  That may make a difference....

You may want to give TeX-Live a test.

James H. Cloos, Jr. <cloos at jhcloos.com>

