[NTG-context] Microsoft Word -> Context

Karsten Heymann karsten.heymann at googlemail.com
Mon Apr 2 21:57:56 CEST 2007


Hello Vyatcheslav,

2007/4/2, Vyatcheslav Yatskovsky <yatskovsky at gmail.com>:
> Then we need something like Word2ConTeXt (or a macro written in VBA) to convert
> incoming papers to ConTeXt code and then easily assemble them. Something that
> resembles the famous Word2TeX application.

I recently built such a solution for a journal, hand-crafted for one
very specific document template. They now have to pre-format every
article with this template and export it to HTML; my converter then
turns that into ConTeXt. Be aware that this took a significant amount
of time (and money, as it was contract work). But the basic idea is
quite simple:

* Pre-format the document in Word by applying special paragraph styles
  to all paragraphs (these map nicely to CSS classes on export)
* Export the Word document to HTML
* Make XML from it with HTML Tidy
* Filter out the huge amount of unneeded material (CSS, DIVs and the like)
* Go through the list of paragraphs and, for each paragraph type, apply
  the handling defined for that type
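
A minimal sketch of the last two steps, not the converter described
here. It assumes the input is already well-formed XHTML (for example
as produced by HTML Tidy) and that each paragraph carries its Word
style name in the class attribute. The style names in STYLE_MAP, the
choice of ConTeXt commands and the helper names are all hypothetical,
and the escaping covers only a few special characters.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping: Word paragraph style (exported as a CSS class)
# -> (text emitted before the paragraph, text emitted after it).
STYLE_MAP = {
    "ArticleTitle": ("\\title{", "}"),
    "SectionHeading": ("\\section{", "}"),
    "Quote": ("\\startquotation\n", "\n\\stopquotation"),
    "BodyText": ("", ""),
}


def local_name(tag):
    """Drop the '{namespace}' prefix that XHTML elements carry."""
    return tag.rsplit("}", 1)[-1]


def escape_tex(text):
    """Escape a few TeX special characters (deliberately incomplete)."""
    return re.sub(r"([%&#$_])", r"\\\1", text)


def convert(xhtml):
    """Turn the styled <p> elements of an XHTML string into ConTeXt source."""
    root = ET.fromstring(xhtml)
    blocks = []
    for element in root.iter():
        if local_name(element.tag) != "p":
            continue  # DIVs, style sheets and similar wrappers are ignored
        # itertext() also collects text inside nested <span> elements;
        # split/join collapses runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        if not text:
            continue  # skip empty paragraphs
        style = element.get("class", "BodyText")
        # Unknown styles fall back to plain body text.
        before, after = STYLE_MAP.get(style, STYLE_MAP["BodyText"])
        blocks.append(before + escape_tex(text) + after)
    # A blank line separates paragraphs in the ConTeXt output.
    return "\n\n".join(blocks)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<div class="Section1">
<p class="ArticleTitle">Word to <span style="x">ConTeXt</span></p>
<p class="BodyText">50% of   the work</p>
<p class="Unknown">   </p>
</div>
</body></html>"""

if __name__ == "__main__":
    print(convert(SAMPLE))
```

Keeping the mapping in a plain dictionary means that supporting a new
paragraph style only needs a new entry, not a new code path; styles
that need more than a prefix and suffix (lists, tables) would need
dedicated handler functions instead.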

I implemented it in Python using DOM and SAX; knowing what I know now,
I would use ElementTree from the start. Unfortunately, as it was
contract work, I cannot give out the code, but if specific questions
arise, I will gladly share my experience.

Yours
Karsten

