[NTG-context] unicode and out-of-box usability
Hans Hagen
ntg-context@ntg.nl
Sat, 03 Jan 2004 23:38:02 +0100
At 18:59 02/01/2004, you wrote:
>I've been struggling through, trying to learn Unicode in ConTeXt. It's
>been instructive, at least. (Hope to make a MyWay about it...)
good
>There are a few weird things that made it difficult to learn, and I was
>wondering if someone could help explain why things are the way they are.
>
>In unic-ini:
>\chardef\utfunihashmode=0 % 1 = enabled
>
>Actually, if I understand things correctly, '1' means "disabled", which
>is what I preferred, having not yet created any unicode vectors. So the
>internal documentation there seems wrong, and I would argue the default
>case (0) makes it harder for beginners.
hm, did you look at the unic-001 etc files? the trick is in fast and efficient
expansion without the need to define lots of named glyphs
>More confusingly, in font-uni:
forget about that one; although it's called unicode, it's actually a
mechanism for the many vectors that are derived from or related to
unicode but are not entirely unicode, i.e. cjk fonts
>\def\enableunicodefont#1%
> {\definefontsynonym[\s!Unicode][\getvalue{\??uc#1\c!file}]%
> \def\unicodescale {\getvalue{\??uc#1\c!schaal}}%
> \def\unicodeheight {\getvalue{\??uc#1\c!hoogte}}%
> \def\unicodedepth {\getvalue{\??uc#1\c!diepte}}%
> \def\unicodedigits {\getvalue{\??uc#1\c!conversie}}%
> \def\handleunicodeglyph {\getvalue{\??uc#1\c!commando}}%
>%%%%%%%%%%% NEXT LINE
> \enableregime[unicode]% the following \relax's are realy needed
> \doifvalue{\??uc#1\c!interlinie}\v!ja\setupinterlinespace\relax
> \getvalue{\??uc#1\c!commandos}\relax}
>
>The \enableregime[unicode] runs in direct opposition with the
>\enableregime[utf] that normally goes at the start of (some of my)
>documents. As it stands, with the regime hard-coded, users have to put an
>\enableregime[utf] *after* the font declaration. That's awkward.
so, don't use that mechanism, stick to the utf mechanism
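a minimal sketch of what sticking to the utf mechanism looks like in a document, assuming a utf-8 encoded source file; the sample text is only an illustration, and whether each character shows up still depends on the font and the unicode vectors that are loaded:

```tex
% the regime is enabled once, before the text starts;
% no \enableunicodefont call is involved
\enableregime[utf]

\starttext
fünf, één, café  % sample utf-8 input, chosen only for illustration
\stoptext
```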
>The last proposed change/complaint is back in unic-ini, and came from my
>attempts to match the main body font with the unicode font.
>
>\def\utfunifontglyph#1%
> {\xdef\unidiv{\number\utfdiv{#1}}%
> \xdef\unimod{\number\utfmod{#1}}%
> \ifnum#1<\utf@i
>%%%% \unicodeasciicharacter\unimod
> \char\unimod % \unicodeascii\unimod
> \else\ifcsname\@@univector\unidiv\endcsname
> \csname\doutfunihash{\unidiv}{#1}\endcsname
> \else % so, these can be different fonts !
> \unicodeglyph\unidiv\unimod % no \uchar (yet)
> \fi\fi}
>
>Basically, I'd like to use the \unicodeasciicharacter hook with this
>definition:
>
>\def\unicodeasciicharacter{\uchar{0}}
>
>(I'm not certain the above is release-quality code, but I've been testing
>it with a stripped down \utfunifontglyph that should be functionally
>equivalent.)
play with it and we'll see
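for anyone following along: the \utfdiv / \utfmod pair in the quoted macro splits a character number into a vector number and a slot inside that vector. a rough plain tex sketch of that arithmetic, assuming 256 slots per vector; \splitcodepoint, \cpdiv and \cpmod are made-up names for illustration, not the definitions from unic-ini:

```tex
% plain TeX sketch, not the unic-ini code; all names here are hypothetical
\newcount\cpdiv % vector number (code point div 256)
\newcount\cpmod % slot inside the vector (code point mod 256)

\def\splitcodepoint#1{%
  \cpdiv=#1 \divide\cpdiv 256 % \divide truncates, so this is integer division
  \cpmod=\cpdiv \multiply\cpmod 256 % 256 * div
  \cpmod=-\cpmod \advance\cpmod #1 % remainder = code point - 256 * div
}

\splitcodepoint{"20AC} % U+20AC, the euro sign
\message{vector \the\cpdiv, slot \the\cpmod} % writes: vector 32, slot 172
\bye
```

so under that assumption a character like U+20AC would be looked up in vector 32 at slot 172, which is the kind of number pair that ends up in \unidiv and \unimod.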
>Working with the unicode code makes me appreciate that it's a really
>powerful part of ConTeXt. Thanks, Hans!
how about the following:
there are many font encodings around, but none is really complete enough to
deal with basic unicode (the 0/1/2 range)
why not define a new font encoding with characters only, so that we can have
as many chars as needed in a 0-255 vector? all those special characters
(registered, and so on) are (1) seldom used and (2) not related to
hyphenation and kerning; it is also a way to get rid of some 'ligatures'
like --- becoming an emdash (in context and xml we can comfortably call
symbols directly, and these may come from a different instance of the font)
Hans