[NTG-pdftex] [ pdftex-Feature Requests-429 ] Generate Tagged PDF

noreply at sarovar.org noreply at sarovar.org
Tue May 16 13:49:54 CEST 2006


Feature Requests item #429, was opened at 2005-09-12 04:07
You can respond by visiting: 
http://sarovar.org/tracker/?func=detail&atid=496&aid=429&group_id=106

Category: None
Group: None
Status: Open
Resolution: None
Priority: 4
Submitted By: Timothy O'Brien (oberon101)
Assigned to: Martin Schröder (oneiros)
Summary: Generate Tagged PDF

Initial Comment:
Adobe Reader has a 'reflow' feature that allows 
visually impaired users to zoom into properly 
formatted documents and read them without having to 
move the viewable area back and forth across the 
page.  PDFs made in MiKTeX with pdftex are not properly 
formatted and reflow without interword spacing, 
rendering them unreadable. I contacted the MiKTeX 
people and they referred me here. I believe Scientific 
Word also uses pdftex with the same result.

Any chance this could be fixed?

----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2006-05-16 17:19

Message:
Logged In: NO 

"
The main problem with reflowing is not the missing tags but
that pdftex writes interword spaces as a kern (since there
is no "space" in TeX, of course).
"
No, the main problem is that Adobe Reader dropped
recognition of "words" based on *spacing* and resorted to
the simplistic approach of using *spaces* instead. This is
simply ignorance about TeX on Adobe's part.
Instead of whining about this, one can do something for a
practical cure: add a zero-width dummy space at the end of
every word, consisting of a *space* followed by a kern of
-\wd(space). The space could come from any font producing
explicit spaces in the output (e.g. cmtt).
It is most easily done with VFs.

As far as I remember, a real implementation is already there
in VTeX (with some option).
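
The dummy-space idea can be sketched on the level of a PDF TJ array. This
is only an illustration under stated assumptions, not how VTeX or any VF
does it: the function name is hypothetical, the font is assumed to contain
a space glyph 525/1000 of the font size wide (roughly cmtt-like), and the
font switch the proposal would need is ignored. In a TJ array a positive
number moves the next glyph to the left, so a space followed by
+space_width has zero net advance.

```python
import re

def add_zero_width_spaces(tj_body, space_width=525):
    """Insert '( )<space_width>' in front of every kern in a TJ array body.

    space_width is an assumed width of the space glyph in 1/1000 of the
    font size. The space advances by that amount and the positive number
    that follows moves back by the same amount, so the layout is unchanged.
    """
    def repl(match):
        # match.group(1) is the ')' closing a word, match.group(2) the kern.
        return f"{match.group(1)}( ){space_width} {match.group(2)}"

    # Only negative kerns that sit between two strings are treated as
    # interword spaces (a simplification).
    return re.sub(r"(\))(-\d+(?:\.\d+)?)(?=\()", repl, tj_body)


if __name__ == "__main__":
    print(add_zero_width_spaces("(This)-419(is)-420(an)-419(example)"))
```

A real implementation would also have to tell interword kerns apart from
ordinary letter kerning, which this sketch does not attempt.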

"
non-trivial:
- first pdfTeX would have to be extended with primitives for a
structure tree (and classes and packages would have to use
these primitives)
- then primitives for tagging the content are needed and
must be used
"
i)
when generating PS-code and distilling via Adobe Distiller
there should be no problem to take care of pdfmarks created
by classes & packages...

ii) I consider the major problem to be page building and the
page dependency of marked content. It is just like the
difficulty of getting consistent color in TeX: one can
consider color a sort of "tag".

E.g. a marked TeX paragraph is broken across pages with
(tagged) headers, and the structure should in general be kept
linked even when pages are reordered.

I suggest first defining what functionality is required on
the side of the PDF reader (text extraction, save as XML, or
audio output via screen reader).
As a prerequisite, the packages should then be able to save
the LaTeX macro structure as pdfmarks for PDF tags.

Support for page breaking of tagged objects will require some
assistance from pdftex, like the line breaking of web links.

HS



----------------------------------------------------------------------

Comment By: Robert (schlcht)
Date: 2006-05-06 20:50

Message:
Logged In: YES 
user_id=2217

The main problem with reflowing is not the missing tags but
that pdftex writes interword spaces as a kern (since there
is no "space" in TeX, of course). A rather simple but
effective way would be to write the interword spaces in a
different font (e.g. non-embedded Times-Roman), and then
compensate for the difference between Times's width of space
and the width of the glue calculated by TeX. (At least, this
is what Distiller does, if you select "Advanced ->
Accessibility -> Add Tags to Document".)

 So that

 (This)-419(is)-420(an)-419(example)

 will be turned into:

 /T1_0 1 Tf
 (This)Tj
 /T1_1 1 Tf
 ( )Tj
 /T1_0 1 Tf
 2.369 0 Td
 (is)Tj
 /T1_1 1 Tf
 ( )Tj
 /T1_0 1 Tf
 1.092 0 Td
 (an)Tj
 /T1_1 1 Tf
 ( )Tj
 /T1_0 1 Tf
 1.475 0 Td
 (example)Tj

where T1_0 is cmr10 and T1_1 is Times-Roman.

This would be already a major enhancement with respect to
accessibility without any packages being required.
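
The Td offsets in the example above can be reproduced with a short sketch.
Each offset is the width of the preceding word plus the kern that pdfTeX
wrote, measured from the start of the preceding word, so the advance of the
Times-Roman space itself never enters the calculation. The Python below is
only an illustration under stated assumptions: the cmr10 glyph widths are
approximate values quoted from memory, the helper names are hypothetical,
and every kern in the array is treated as an interword space.

```python
import re

# Approximate cmr10 glyph widths in 1/1000 of the font size (assumed values;
# only the letters needed before a space in the example are listed).
CMR10_WIDTHS = {"T": 722.2, "h": 555.6, "i": 277.8, "s": 394.4,
                "a": 500.0, "n": 555.6}

def parse_tj_array(body):
    """Split '(This)-419(is)' into [('This', -419.0), ('is', 0.0)]."""
    pairs = re.findall(r"\(([^()]*)\)\s*(-?\d+(?:\.\d+)?)?", body)
    return [(text, float(kern) if kern else 0.0) for text, kern in pairs]

def word_width(word, widths):
    """Sum the glyph widths of a word, in 1/1000 of the font size."""
    return sum(widths[ch] for ch in word)

def respace(body, widths, text_font="/T1_0", space_font="/T1_1"):
    """Rewrite a TJ array body as Tj operators with explicit spaces."""
    ops = []
    pending_td = None  # offset from the start of the previous word
    for text, kern in parse_tj_array(body):
        ops.append(f"{text_font} 1 Tf")
        if pending_td is not None:
            ops.append(f"{pending_td:.3f} 0 Td")
        ops.append(f"({text})Tj")
        if kern < 0:
            # Emit a real space in the second font, then remember where the
            # next word must start: word width plus the original kern.
            ops.append(f"{space_font} 1 Tf")
            ops.append("( )Tj")
            pending_td = (word_width(text, widths) - kern) / 1000.0
        else:
            pending_td = None
    return ops


if __name__ == "__main__":
    for op in respace("(This)-419(is)-420(an)-419(example)", CMR10_WIDTHS):
        print(op)
```

With these assumed widths the sketch yields the offsets 2.369, 1.092 and
1.475 shown in the example.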


----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2006-04-14 17:03

Message:
Logged In: NO 

Maybe a first version can use a very low-level solution,
just with a single tagging primitive; there is such a command
already (I think it is called \pdfliteral). And the tree can come
later. So one could start with little work, assuming one knows
tagging.

CS

----------------------------------------------------------------------

Comment By: Martin Schröder (oneiros)
Date: 2006-04-14 16:42

Message:
Logged In: YES 
user_id=421

I'm changing the summary. Yes, we are aware that tagged pdf
is an often requested feature, but implementing it would be
non-trivial:
- first pdfTeX would have to be extended with primitives for a
structure tree (and classes and packages would have to use
these primitives)
- then primitives for tagging the content are needed and
must be used

----------------------------------------------------------------------

Comment By: Nobody (None)
Date: 2005-11-05 01:28

Message:
Logged In: NO 


I would volunteer to test the feature. I am writing
a rather long PDF produced with pdftex that is 
downloadable for free ( http://www.motionmountain.net )
and readers regularly ask why it cannot be read aloud.
pdfTeX probably would only need to be extended
with a single command - something like 
\writetaghere{tagtype} - and all the rest could 
be done by extensions to the LaTeX cls and sty files.

CS

----------------------------------------------------------------------
