[Aleph] RfC: Boustrophedon typesetting
Giuseppe Bilotta
aleph@ntg.nl
Sun, 14 Mar 2004 15:06:17 +0100
Sunday, March 14, 2004 Daniel Richard G. wrote:
> Hello, one and all. I'm a recent comer to the Aleph project. My interest
> centers on the possibility of contributing one particular feature that is
> not [to my knowledge] available in any other existing typesetting system.
> This message is a request for comments on said potential feature:
Hi Daniel, welcome to the list. I'll go straight to the
"implementation" (caveat: I haven't looked at the examples
yet), so for the moment I will comment on only some of the
points.
> 3. Hyphenation
>
> The distinct nature of boustrophedon invites a rethink of how hyphenation
> is indicated in the final typeset copy. Do we use the same convention as
> in normal typesetting, where the word is split at some acceptable point,
> the first part is appended with a hyphen character, and the second part is
> bumped down to the start of the next line?
>
> We could do that, but as the eye has to travel only a small amount to reach
> the remainder of the word (as opposed to going all the way back across the
> paragraph), I would like to make possible an alternative convention. See
> the second paragraph in this image:
>
> http://www.iskunk.org/tmp/boustrophedon.png
>
> The words are joined by a bracket, with legs of unequal length. (Note that
> this is quite a crude rendition; the bracket really should be narrower.)
> This is almost the same as the usual approach to hyphenation, except that a
> box/glue is now inserted _after_ the break in addition to before. (The
> bracket itself might be handled as some kind of expandable character, to
> allow variations in line spacing.)
In the book "Gödel, Escher, Bach" by Hofstadter there are
two examples of boustrophedon typesetting. One of them is
figure~43, chapter VI (page 190 in the Italian edition):
it's the base sequence of the chromosome of \phi X174. Since
it's a single "word", it has this 'hyphenation' thing, which is
rendered by a small half-circular arrow connecting the end of
the previous row with the beginning of the next. (The other is
figure~42, a couple of pages before that, which includes an
example of a script from Rapa Nui.) We could probably set
something like \right(pre|post)hyphenmark and
\left(pre|post)hyphenmark (font-specific?) to say which
characters are typeset at the end of the previous row and at
the beginning of the next row, for the two cases (from left to
right or from right to left).
> How would this be done if the text block area is not rectilinear, e.g. if
> we're setting shaped paragraphs? I have noooo idea... :]
Nothing special. Just take the width of the marks into account
in the line breaking.
> 5a. Alternate hyphenation convention
> [This may or may not require code changes---it may be implementable via the
> funky "algebra" of box/penalty/glue nodes that Knuth describes in his
> paper---but for now, my working assumption is that changes will be needed.
> I do need to investigate this further.]
> According to Knuth's paper, in the line-breaking algorithm, potential
> hyphenation points in words are marked by penalty nodes, which are
> described by three values:
>     p = penalty amount
>     w = width of typeset material to append to the line if the line is
>         broken at this point (usu. the width of a hyphen)
>     f = is this penalty flagged or unflagged? (boolean)
> The alternate hyphenation convention would need two widths instead of one:
>     w0 = (same as w, above)
>     w1 = width of typeset material to prepend to the next line, if the
>          line is broken at this point
> So (p,w0,w1,f) would describe the appropriately extended penalty node. w1,
> naturally, would be zero in non-boustrophedon contexts.
TeX already has some idea of how to deal with these two-sided
amounts: \discretionary takes a pre-break text, a post-break
text and a no-break text. In a way, we could say that we want
to turn every feasible breakpoint into a "discretionary" kind
of thing.
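A minimal sketch of the extended penalty node described above, in
Python. The names Penalty and line_width are illustrative assumptions,
not Aleph or TeX internals; the sketch only shows how w0 and w1 would
enter the natural width of a candidate line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Penalty:
    """Hypothetical extended penalty node (p, w0, w1, f)."""
    p: int            # penalty amount
    w0: float         # width appended to the line that ends at this break
    w1: float = 0.0   # width prepended to the line that starts after it
    f: bool = False   # flagged (e.g. a hyphenation point)?


def line_width(content_width: float,
               start_break: Optional[Penalty],
               end_break: Optional[Penalty]) -> float:
    """Natural width of a line lying between two chosen breaks.

    start_break is None for the first line of the paragraph,
    end_break is None for the last line.
    """
    lead = start_break.w1 if start_break is not None else 0.0
    trail = end_break.w0 if end_break is not None else 0.0
    return lead + content_width + trail
```

With w1 = 0 this reduces to the usual convention, where only the
material at the end of the broken line (the hyphen) adds width.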
> 5b. Singly-/doubly-ragged justification
> I believe this is going to be one of the trickier bits. Let me render for
> you a doubly-ragged paragraph, with the text block boundaries shown. I will
> also mark two consecutive glue nodes A and B with "----":
>     |]] ]]]]]] ]]] ]]]]] ]]]]]]] ] ]]]] ]]]]]] ]]]]]]]]----| <- A
>     | [[[[ [[[[[ [[ [[[[[ [[[ [[[[[[[ [[[[[[[ [[[ [[----| <- B
>     | ]]]]]]]]] ]]]]]] ]]] ]]] ]]]] ]]]]]] ]] ]]] ]]]]] |
>     | [[[[[[[[ [[ [[[ [[[[[[ [[[[[[[[ [[[[[[ [[[[ [[[[[[[ |
>     | ]]]] ]]]]]] ]] ]]] ]]]]] ]]] ]]]]]]]] ]]] ]]]]]]]] |
>     | [[[[[ [[[[[ [[[[ [[[ [[[[[ [[[ [[[[ [[[[[[[ |
> The alignment rule given earlier imposes the requirement that A and B be of
> equal length, along with other pairs of glue nodes surrounding a line
> break. Glue A can stretch and shrink, depending on how TeX wants to set the
> line, but glue B must inflexibly match whatever width A turns out to have.
> (Alternately, you could pose the problem this way: The amount of space into
> which each line can be fit depends on how far from the margin the previous
> line ends. In non-boustrophedon contexts, these widths are fixed and fluid,
> respectively---but here, because the two are linked, both are fluid.)
> I'm not sure how this situation should be handled/represented
> within the line-breaking algorithm. Should we have a new kind
> of glue node, that can express two spacings with a line break
> in between?
We could use a standard glue node, just keeping in mind that it
should be halved. But what I wonder is this: if I'm not mistaken,
in its current implementation TeX chooses the same breakpoints
regardless of whether you're typesetting justified or ragged
text. If this is the case, the same would hold for boustrophedon
text, and we would have no problems whatsoever: we break the
paragraph, and then simply shift the lines left or right by the
appropriate amounts when contributing them to the vertical list.
We would therefore not need any special treatment for
end-of-line or beginning-of-line glue. (More on this later.) Not
that tricky, IMO. (Surely not as much as the next one ...)
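The "break first, shift afterwards" idea can be sketched as follows,
in Python. This is only an illustration under stated assumptions: the
function name boustrophedon_offsets is hypothetical, every line is
taken at its natural width, and the alignment rule is read as "each
line starts at the horizontal position where the previous one ended".

```python
def boustrophedon_offsets(widths, text_width, first_ltr=True):
    """Left-edge offset of each line, measured from the left margin.

    widths     -- natural widths of the already-broken lines, in order
    text_width -- width of the text block
    first_ltr  -- True if the first line runs left to right
    """
    offsets = []
    ltr = first_ltr
    # x is the position where reading of the current line starts
    x = 0.0 if first_ltr else float(text_width)
    for w in widths:
        if ltr:
            left = x
            x = left + w      # reading ends at the right end of the line
        else:
            left = x - w
            x = left          # reading ends at the left end of the line
        if left < 0 or left + w > text_width:
            raise ValueError("line does not fit inside the text block")
        offsets.append(left)
        ltr = not ltr         # the next line runs the other way
    return offsets
```

For example, widths 80, 70, 75 in a block of width 100 give offsets
0, 10, 10. The ValueError branch marks the case where breakpoints
chosen without regard to the alignment rule admit no valid shift,
which is the linkage between consecutive lines described in the
quoted text.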
> 5c. Reverse font metrics
> So we've said that the right-to-left lines will usually be rendered in a
> reflected font (e.g. xbmc10). One can expect that this reflected font will
> be exactly the same as the left-to-right font, only mirrored. (I believe
> there is a simple Metafont trick that will do this, basically scaling the
> x-axis of everything by -1.) But what if the reverse font has different
> metrics? What if, say, you want the boustrophedon text to be rendered in a
> slanted font, but you want the slant to be in the same direction for both
> LtR and RtL lines? Or you want to differentiate the RtL lines with a bold
> face?
> This issue is, I believe, confined to the very heart of the line-breaking
> code---the dynamic-programming algorithm itself. Basically, each box node
> will be able to have one of two widths, and which one is in effect depends
> on where the line breaks fall. If the two widths are unequal, the algorithm
> will have to run through more possibilities to find optimal breaks, and so
> run less efficiently. I really need to understand the algorithm better to
> be able to say what changes would be needed, but I am fairly confident that
> this will not require changes elsewhere.
> (Btw: This would yield a feature that can be used in non-boustrophedon
> contexts: typesetting a paragraph with alternating lines in different
> fonts. Heck, we could make it so the user can specify a whole sequence of
> fonts, one for each successive line of a paragraph... }:]
Do you have some kind of estimate on the computational
complexity of the matter?
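As a starting point for such an estimate, here is a minimal sketch in
Python, assuming each box has one LtR width and one RtL width, and
that even-numbered lines (counting from 0) run left to right. The
helper names are hypothetical and glue is ignored. With one
prefix-sum table per direction, the width of any candidate line is
still a constant-time lookup; what changes is that the lookup needs
to know the parity of the line.

```python
from itertools import accumulate


def prefix_sums(widths):
    """prefix[k] = total width of the first k boxes."""
    return [0.0, *accumulate(widths)]


def make_line_width(ltr_widths, rtl_widths):
    """Build a width lookup for boxes that have two possible widths."""
    if len(ltr_widths) != len(rtl_widths):
        raise ValueError("need one LtR and one RtL width per box")
    tables = (prefix_sums(ltr_widths), prefix_sums(rtl_widths))

    def line_width(i, j, line_index):
        """Width of boxes i..j-1 when set on line number line_index."""
        table = tables[line_index % 2]   # even lines LtR, odd lines RtL
        return table[j] - table[i]

    return line_width
```

Under these assumptions, a search over breakpoints would have to
treat "break at position i ending an even line" and "break at
position i ending an odd line" as distinct states. That suggests, but
does not prove, a modest constant-factor cost rather than a change in
the order of growth.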
-- 
Giuseppe "Oblomov" Bilotta