[dev-context] Improved support for Norwegian in ConTeXt

Mojca Miklavec mojca.miklavec.lists at gmail.com
Sun Feb 4 18:16:40 CET 2007


I would suggest posting some of these questions to the ntg-context mailing
list, where more Norwegian users can comment on them. When making such
changes, 100% backward compatibility might need to be sacrificed a
bit, but some changes are worth it if others agree and if they
contribute to better quality. (I CC-ed two users who seem
to have contributed or complained a bit ;)

On 2/4/07, Karl Ove Hufthammer wrote:
> I'm writing this to suggest improvements in ConTeXt's support for the
> Norwegian languages. ConTeXt already has rudimentary support for Norwegian,
> but with some problems.
>
>
> Language codes
> --------------
>
> The main problem is that ConTeXt uses the language code 'no' for Norwegian.
> There actually *is* no written language called 'Norwegian'; Norway has two
> official written languages, Norwegian Bokmål (ISO 639 language code 'nb') and
> Norwegian Nynorsk (ISO 639 language code 'nn'). The current definitions
> for 'no' in ConTeXt are for Norwegian Bokmål. (There is an ISO 639 language
> code 'no' for Norwegian, but this should usually be used for spoken
> Norwegian, or perhaps for transcriptions of spoken language.)
>
> The language code 'no' should be removed, and be replaced by the two language
> codes 'nb' and 'nn'.

Although I don't know the exact situation, a few remarks:

- You should probably also provide the correct definitions for calling
the language (so that one can say \mainlanguage[norwegian], but
perhaps with what you consider to be the proper language tags). It's
currently

\installlanguage [norwegian]   [\s!no]
\installlanguage [norsk]       [\s!no] % bonus switch

You would need to fix these two and perhaps add
\installlanguage [???]       [\s!nb]
\installlanguage [???]       [\s!nn]
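A minimal sketch of what the Norwegian entries might look like once 'nb' and 'nn' exist. The names 'bokmal' and 'nynorsk' are only placeholders (loosely following the Babel names mentioned further down), and the sketch assumes that \s!nb and \s!nn are defined as system constants in the same way \s!no is today:

```tex
% Hypothetical sketch: the alias names below are placeholders, not agreed names.
% Assumes \s!nb and \s!nn are defined system constants, like \s!no today.
\installlanguage [bokmal]    [\s!nb]  % Norwegian Bokmål
\installlanguage [nynorsk]   [\s!nn]  % Norwegian Nynorsk
```

A document could then say either \mainlanguage[nynorsk] or \mainlanguage[nn].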


- If you remove [no], older documents might break. I don't know much
about the situation and the number of users, but can you say which of
the two language variants [no] should default to? Since the current
definitions apparently point to "nb" (at first glance), would it
make sense to use "nb" when one says \mainlanguage[no]?

Perhaps one can issue a warning when the language "no" is selected
(stating something like "language 'no' is deprecated, please use 'nb'
for Bokmål or 'nn' for Nynorsk instead").
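For backward compatibility, the old names could simply be linked to Bokmål. This is only a sketch, assuming that 'no' is no longer defined as a full language of its own and that a complete 'nb' definition is installed in its place. It does not emit the deprecation warning suggested above; that would need a hook in the language-switching code, which is not shown here:

```tex
% Sketch: keep old documents working by linking the old names to Bokmål.
% Assumes a complete 'nb' definition exists and 'no' is not defined separately.
\installlanguage [no]         [nb]
\installlanguage [norwegian]  [nb]
\installlanguage [norsk]      [nb] % bonus switch, as before
```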

I also asked to replace "si" by "sl" for Slovenian some time ago, but
that was when there was no support for Slovenian yet and "si" stands
for Singhalese (whatever that is).

For Norwegian the situation might be slightly different since "no"
still means Norwegian, but I don't know how "offensive"/"ignorant" it
sounds to you if that one is used.

Removing it probably doesn't affect anything else, so if other Norwegian
users agree to remove it completely, it can still be done, but I would
suggest asking the author of the original translations and the
rest of the users on the ntg-context mailing list first. Otherwise it can
still default to one of the two variants (or to a new one if you
also provide the third alternative for the "spoken language").

> See http://en.wikipedia.org/wiki/Norwegian_language for a (not too good)
> article on the Norwegian languages.
>
> For the record, the language names used in LaTeX/Babel are
> (unfortunately) 'Norwegian' and 'norsk' for Norwegian Bokmål, and 'nynorsk'
> for Norwegian Nynorsk, instead of 'bokmal'/'bokmål' and 'nynorsk'. Norwegian
> Bokmål support was added first, and used up the 'Norwegian' name.
>
>
> Hyphenation
> -----------
>
> The two written languages are quite similar, and the current hyphenation
> dictionary (nohyphbx) was made to support both. But there are (at least) two
> words which are put in the hyphenation exceptions for this dictionary because
> they would have different hyphenation (because of different meaning) in
> Norwegian Nynorsk and Norwegian Bokmål. These are:
>
> attende -- nb: at-ten-de ('eighteenth'),       nn: att-en-de ('back')
> betre   -- nb: be-tre ('enter'/'set foot on'), nn: bet-re ('better')
>
> Would it be possible to have two different hyphenation dictionaries for 'nb'
> and 'nn', which would only differ in the hyphenation exceptions used for
> these two words?

This can be done. Hans was complaining about the mess in the naming of
Norwegian hyphenation patterns a month ago anyway, so I guess that "he
won't mind" adding yet another fix to the scripts ;)
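Until separate pattern sets exist, a Nynorsk document could in principle override the two conflicting exceptions itself. A sketch, assuming that TeX's \hyphenation primitive adds entries to the exception list of whichever language is active at that point, and that a later entry for the same word replaces the one loaded from nohyphbx:

```tex
% Sketch: document-level override of the two Bokmål/Nynorsk conflicts.
% Assumption: \mainlanguage[no] activates the patterns built from nohyphbx.
\mainlanguage[no]
\hyphenation{att-en-de bet-re} % Nynorsk readings: 'back', 'better'
```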

> Language setup
> --------------
>
> Here is an improved/correct version of the language setup for Norwegian. The
> setup for 'no' should be removed.
>
> \installlanguage
>   [nn]
>   [spacing=packed,
>    lefthyphenmin=2,
>    righthyphenmin=2,
>    leftsentence=---,
>    rightsentence=---,
>    leftsubsentence=---,
>    rightsubsentence=---,
>    leftquote=\upperleftsinglesixquote,
>    rightquote=\upperrightsingleninequote,
>    leftquotation=\leftguillemot,
>    rightquotation=\rightguillemot,
>    date={day,{.},\ ,month,\ ,year},
>    state=stop]
>
> This is for Norwegian Nynorsk ('nn'), but the same setup is used for Norwegian
> Bokmål (the values used for 'day' differ, though -- see below).
>
> But I am not sure I understand what the four *sentence commands are used for.
> We usually don't use em-dashes in Norwegian, so the entries look incorrect.
> If you can explain what the commands are used for, I can supply the correct
> Norwegian definitions.
>
> I also noticed that the Italian definitions use leftspeech, middlespeech and
> rightspeech commands. What are these used for?
>
>
> Other language-specific settings
> --------------------------------
>
> Norwegian (Bokmål and Nynorsk) differs typographically from English in several
> other ways. Here are three of them:
>
> We don't (usually) use bullets for the first level of unnumbered lists; we use
> en-dashes.
>
> -- Item 1
> -- Item 2
> -- Item 3
>
> Bullets are commonly seen in documents created by word processors of US origin,
> and in the documents created by people without proper typographic training,
> though. It would be nice if ConTeXt could use en-dashes by default for lists
> in Norwegian text.

The default is to use
   bullet, dash, star, triangle
for the four levels of itemization.

If you want to change the behaviour in your document only, all you need to do is
    \definesymbol[1][\endash]
but I guess that it could be adapted, so that Norwegian documents will
all use endash by default.
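To try the en-dash convention in a single document, the \definesymbol line above can be combined with an ordinary itemization. A minimal sketch; only the first level is changed, and deeper levels keep their defaults:

```tex
% Sketch: first-level items get an en-dash instead of a bullet.
\definesymbol[1][\endash]

\starttext
\startitemize
  \item Item 1
  \item Item 2
  \item Item 3
\stopitemize
\stoptext
```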

Similar support has already been implemented for Slovenian (to use a
different set of letters when itemize numbers items with characters).

There are two questions:
- do other Norwegian users agree to change the default set?
- what should be the order then? (ie: what character should be used
for the second level of itemization?)

> We don't use full stops in numbered lists. In other words, instead of
>
> 1. Item 1
> 2. Item 2
> 3. Item 3
>
> we write
>
> 1  Item 1
> 2  Item 2
> 3  Item 3

That's a matter of
\setupitemize[stopper=]

I don't know how to set that in a language-specific way, but it sounds
reasonable to me to add it.
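A document-level sketch of the stopper setting. Here [n] selects arabic numbering, and with an empty stopper the numbers are printed without the trailing full stop:

```tex
\setupitemize[stopper=] % drop the full stop after item numbers

\starttext
\startitemize[n]
  \item Item 1
  \item Item 2
  \item Item 3
\stopitemize
\stoptext
```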

> The same holds for numbered headings, both in the main text and in the TOC.

But sections already start with
   1 Section name
rather than
   1. Section name
by default. (Support for the second case might be improved in the
future. Or rather: I hope that it will be.)

> Would it be possible to support this by default in ConTeXt?
>
> We also use the comma in decimal numbers (3,14 instead of 3.14).

We too. In text this is no problem anyway. Math can be set up that
way, but I doubt that it's set up for any language (although it could
be). This means that you had better write $3{,}14$ instead of
$3,14$. I don't know of any other consequences, since TeX almost
never writes out any calculated floats in the resulting document.
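A short illustration of the difference. In math mode TeX treats a bare comma as punctuation and adds a thin space after it, so $3,14$ comes out slightly spaced; wrapping the comma in braces turns it into an ordinary symbol:

```tex
\starttext
In text mode the decimal comma needs no care: 3,14.

$\pi \approx 3,14$   % comma is punctuation: thin space after it
$\pi \approx 3{,}14$ % braced comma is an ordinary symbol: no extra space
\stoptext
```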

> Norwegian labels
> ----------------
>
> Here are the labels for Norwegian (Bokmål and Nynorsk). The old 'no' labels should
> be removed. The 'nb' ones are taken from the 'no' ones, but with some
> corrections.
>
> Some comments: We don't usually capitalise the first letter in
> crossreferences. Where one would in English write
>
> See Figure 5.22 ...
>
> we would write
>
> Se figur 5.22 ... (Bokmål)
> Sjå figur 5.22 ... (Nynorsk)

But when you cross-reference, you only get 5.22; you have to write
"figur" yourself. (You can perhaps set it up so that the word is
attached to the number, but either way you have to specify it
manually.)

"Figur 5.22" will only be used under the actual image. When
cross-referencing, we use lowercase too, but under the figure itself I
think that uppercase is OK, at least for our language (since it's the
caption of the figure anyway).
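A sketch of how this looks in practice; the reference name fig:sample and the placeholder content are made up for the example. The caption takes its label from the language setup, while in running text the word is typed by hand (in lowercase) as the first argument of \in, which places it before the figure number:

```tex
\mainlanguage[no]

\starttext
\placefigure[here][fig:sample]
  {A caption.}
  {\framed{placeholder}}

Se \in{figur}[fig:sample] ...
\stoptext
```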

> But we would of course write
>
> Figur 5.22 viser ...
> (Figure 5.22 shows ...)
>
> The definitions below use a capital first letter. Will this be a problem?
>
> I was also unsure about what the 'lines' label should be. The plural of 'line'
> ('linje') in Norwegian (both 'nb' and 'nn') is 'linjer', but we do not use
> the plural when referencing more than one line. Where one would write
>
> The discussion on lines 5--13 ...
>
> in English, we would write
>
> Drøftinga på linje 5--13 ...
>
> in Norwegian. In other words, we use the singular instead of the plural. The
> same holds for the other cross-referencing terms ('Figure', 'Table' &c.).
>
> Feel free to change the 'lines' label to 'linje' if this makes it work better.

I don't know where exactly this is used, but I assume that it's for
"List of Figures", "List of Tables". But I don't know exactly; I never
use those. (I just translated some of them, hoping that the
first person who considers them wrong will complain ;)
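If the 'lines' label turns out to be used where Norwegian wants the singular, it could be overridden per document without touching the language files. A sketch; the key name lines is taken from the message above, and the tag no assumes the current language definitions:

```tex
% Sketch: override a single label text for Norwegian in one document.
\setuplabeltext[no][lines=linje]
```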

Mojca

