[NTG-context] Character names (was: Context 2005.12.19 released)

Mojca Miklavec mojca.miklavec.lists at gmail.com
Fri Dec 23 02:00:12 CET 2005


Taco Hoekwater wrote:
>
> Here's what I can come up with. At least a few are acceptable, like the
> horizontal bar. \textnumero exists, but is only reachable in cyrillic
> encodings (fixable, I guess?), and the greek & vietnamese accents
> are also only usable in the correct encoding. I've used the \text...
> versions of the accents, but perhaps the actual commands are more
> correct (like \' and \~).
>
> Cheers, Taco
>
> \starttext
> \definecharacter texthorizontalbar {{--\kern 0pt--}}
> \definecharacter textdong          {\underbar{\dstroke}}

Thanks for those ...

> \NC 0300 COMBINING GRAVE ACCENT \NC \textgrave           \NC \NR
> \NC 0309 COMBINING HOOK ABOVE   \NC \texthookabove       \NC \NR
> \NC 0303 COMBINING TILDE        \NC \texttilde           \NC \NR
> \NC 0301 COMBINING ACUTE ACCENT \NC \textacute           \NC \NR
> \NC 0323 COMBINING DOT BELOW    \NC \textbottomdot       \NC \NR

I may be wrong, but aren't those used only in combination with other
characters? I don't know whether TeX (ConTeXt) can handle this (at
least not yet). When I wrote the list a couple of days ago I forgot
about that fact. If the accent came before the character, this could
be replaced by "\buildtextaccent...", but here there is perhaps no
solution without some additional macros. (And since the Vietnamese
seem to be satisfied with viscii and utf for now, supporting cp1258 is
not crucial.)

I double-checked the differences between the existing regimes and the
ones that were automatically produced by a script. The list of regimes
that are "ripe" for supporting is thus:

cp125[ 0 | *1 | *2 | 3 | 4 | 7 ]
iso-8859-[ *1 | *2 | 3 | 4 | *5 | *7 | 9 | 13 | *15 | 16 ]
*viscii (with glyph names instead of \"\u\...)

(The ones marked with a star are already supported, perhaps with some
inconsistencies. Not supported: Hebrew, Arabic and (?) Vietnamese for
cp125X; Arabic, Thai and Celtic for iso-8859-X.)

I'll send the files (full content is already on my page), but I need
to know how to split/group them (I guess it would be a bad idea to
have one file for each encoding). Should there be one file for
iso-8859 and one for windows encodings? What about those regimes that
are already supported? I would like to move at least "regi-win"
(which has 8 wrong definitions anyway) to a "less discriminating"
place; I don't know what to do with Greek and Cyrillic.

And another set of questions:
1. Can someone check for (in)consistencies for
greekupsilondiaeresis vs. greekupsilondialytika?
Looks like the same glyph named differently at different places
(functionality may break).

2. What to do with
{\cyrillicGJE}       {\'\cyrillicG} % 0403 CYRILLIC CAPITAL LETTER GJE
{\cyrillicgje}       {\'\cyrillicg} % 0453 CYRILLIC SMALL LETTER GJE
{\cyrillicKJE}       {\'\cyrillicK} % 040C CYRILLIC CAPITAL LETTER KJE
{\cyrillickje}       {\'\cyrillick} % 045C CYRILLIC SMALL LETTER KJE
{\cyrillicgheupturn} {\cyrillicgup} % 0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
Which variant is better?

Would it make sense to define
\definecharacter cyrillicGJE {\buildtextaccent\textacute\cyrillicG}
\defineaccent ' \cyrillicG {\cyrillicGJE}
and then use \cyrillicGJE consistently?

3.
PLEASE FIX:
in enco-def.tex replace \cdots by something (\dots, I suppose, but I'm not sure)
\definecharacter textellipsis     {\mathematics\cdots}
(I guess this "bug" was the reason for changing some definitions in
regimes/encodings elsewhere.)

Should \textellipsis be used for "2026 HORIZONTAL ELLIPSIS" or anything else?

4. \softhyphen, \hyphen or \- for "00AD SOFT HYPHEN"?

5. Urgently: what to do with quotations (without language
discriminations if possible)?

% 201A SINGLE LOW-9 QUOTATION MARK
\quotesinglebase vs. \lowerleftsingleninequote
% 201E DOUBLE LOW-9 QUOTATION MARK
\quotedblbase vs. \lowerleftdoubleninequote
% 2018 LEFT SINGLE QUOTATION MARK
\quoteleft vs. \upperleftsinglesixquote
% 2019 RIGHT SINGLE QUOTATION MARK
\quoteright vs. \upperrightsingleninequote

% 201C LEFT DOUBLE QUOTATION MARK
\quotedblleft vs. \upperleftdoublesixquote
% 201D RIGHT DOUBLE QUOTATION MARK
\quotedblright vs. \upperrightdoubleninequote

% 2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
\guilsingleleft vs. \leftsubguillemot
% 203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
\guilsingleright vs. \rightsubguillemot
% 00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
\leftguillemot vs. \greekleftquot
(are Greek quotations treated specially or what is this doing in regi-grk?)
% 00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
\rightguillemot vs. \greekrightquot vs. \prewordbreak\rightguillemot
(in my view the last one may be better, but it is not fair since
it's language dependent: it may be OK for French, but not for German,
or vice versa; perhaps a language-sensitive macro could be inserted at
this place?)
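
A minimal sketch of what such a language-sensitive macro might look
like. The name \regimerightguillemot is hypothetical, and treating
French ("fr") as the language that wants \prewordbreak is only an
assumption for illustration, not a decision:

% hypothetical name; the choice of "fr" is an assumption
\def\regimerightguillemot
  {\doifelse\currentlanguage{fr}
     {\prewordbreak\rightguillemot}% variant with \prewordbreak
     {\rightguillemot}}            % plain glyph for all other languages

The regime entry for 00BB could then point at this macro instead of
at \rightguillemot directly.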

6. \textnumero, 0x2116 (and perhaps some other characters) should be
added to unicode vector 33.

7. The files regi-il1 and regi-win have many inconsistencies. I
suggest the following renamings:

windows -> cp1252
il1 -> iso-8859-1
il2 -> iso-8859-2
iso88595 -> iso-8859-5
grk -> iso-8859-7 (the new one)

and to add the following lines somewhere:

% or perhaps the other way around
\defineregimesynonym[utf-8][utf]
\defineregimesynonym[utf8][utf]

\defineregimesynonym[windows-1250][cp1250]
\defineregimesynonym[windows-1251][cp1251]
\defineregimesynonym[windows-1252][cp1252]
\defineregimesynonym[windows-1253][cp1253]
\defineregimesynonym[windows-1254][cp1254]
%defineregimesynonym[windows-1255][cp1255] % not supported yet (Hebrew)
%defineregimesynonym[windows-1256][cp1256] % not supported yet (Arabic)
\defineregimesynonym[windows-1257][cp1257]
%defineregimesynonym[windows-1258][cp1258] % not supported yet (Vietnamese)

% for historical reasons
\defineregimesynonym[windows][cp1252]

% 5 - Cyrillic
% 6 - Arabic (not supported)
% 7 - Greek
% 8 - Hebrew (3 signs missing)
% 11 - Thai (not supported)

\defineregimesynonym[il1][iso-8859-1]
\defineregimesynonym[il2][iso-8859-2]
\defineregimesynonym[il3][iso-8859-3]
\defineregimesynonym[il4][iso-8859-4]
\defineregimesynonym[il5][iso-8859-9]
\defineregimesynonym[il6][iso-8859-10]
\defineregimesynonym[il7][iso-8859-13]
%defineregimesynonym[il8][iso-8859-14] % not supported yet
\defineregimesynonym[il9][iso-8859-15]
\defineregimesynonym[il10][iso-8859-16]

\defineregimesynonym[latin1][iso-8859-1]
\defineregimesynonym[latin2][iso-8859-2]
\defineregimesynonym[latin3][iso-8859-3]
\defineregimesynonym[latin4][iso-8859-4]
\defineregimesynonym[latin5][iso-8859-9]
\defineregimesynonym[latin6][iso-8859-10]
\defineregimesynonym[latin7][iso-8859-13]
%defineregimesynonym[latin8][iso-8859-14] % not supported yet
\defineregimesynonym[latin9][iso-8859-15]
\defineregimesynonym[latin10][iso-8859-16]

% for historical reasons
\defineregimesynonym[iso88595][iso-8859-5]
\defineregimesynonym[grk][iso-8859-7]
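
Assuming the synonyms are defined as proposed above, either name
should then select the same regime. A short usage sketch (only one of
the two lines would appear in a real document):

\enableregime[latin2]     % synonym, meant to resolve to iso-8859-2
\enableregime[iso-8859-2] % the proposed canonical name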

I can send the new files as soon as it is clear how to group them.
If, additionally, the rest of the questions are answered, the new
files can become more consistent without breaking anything.

Sorry for the long mail,
    Mojca

