[Dev-luatex] UTF-16 in \pdfoutline

Hans Hagen pragma at wxs.nl
Tue Dec 11 09:34:37 CET 2007


Jonathan Sauer wrote:
> % The following code converts a string to UTF-16 big endian with BOM
> % and outputs it using \message:
> 
> % We change the catcode of '%' so we can use it for modulo calculations:
> \begingroup
> \catcode`\%=12
> \directlua0{\unexpanded{
> 	function convertToUTF16(str)
> 		local result = string.char(0xFE) .. string.char(0xFF)
> 		for c in string.utfvalues(str) do
> 			if c < 0x10000 then
> 				result = result ..
> 						 -- math.floor: '/' is not integer division in Lua
> 						 string.char(math.floor(c / 256)) ..
> 						 string.char(c % 256)
> 			else
> 				c = c - 0x10000
> 				local c1 = math.floor(c / 1024) + 0xD800
> 				local c2 = c % 1024 + 0xDC00
> 				result = result ..
> 						 string.char(math.floor(c1 / 256)) ..
> 						 string.char(c1 % 256) ..
> 						 string.char(math.floor(c2 / 256)) ..
> 						 string.char(c2 % 256)
> 			end
> 		end
> 		tex.print('\\message{' .. result .. '}')
> 	end
> 	
> 	
> 	convertToUTF16('AäöüB!')
> }}
> \endgroup
> 
> 
> 
> \bye
> 
> This fails with 'Text line contains an invalid utf-8 sequence.' (not
> surprising, since the text is UTF-16 big endian). If I want to pass the
> UTF-16-encoded string e.g. to \pdfoutline (since PDF bookmarks can be
> encoded in UTF-16), how do I do this?
> 
> (Maybe a callback would be useful, e.g. `convert_pdf_text')

You can move all 'bytes' to a reserved private area (see the manual);
LuaTeX will then write them out with that offset subtracted. Think of it
as a private 256-slot area in which each slot represents one byte.
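
The byte mapping described above could be sketched in plain Lua roughly as
follows. This is only an illustration, not code from the manual or from
LuaTeX: each byte b of the UTF-16 string becomes the code point
BYTE_OFFSET + b, written as a 4-byte UTF-8 sequence so that the input
reader accepts it. The value 0x110000 and the names BYTE_OFFSET, utf8_4
and bytes_to_private are assumptions made for this sketch; the manual is
the place to check where the reserved area actually starts.

```lua
-- ASSUMPTION: start of the reserved private area (verify in the manual).
local BYTE_OFFSET = 0x110000

-- Encode one code point in 0x10000..0x1FFFFF as a 4-byte UTF-8 sequence.
-- Uses only arithmetic, so it does not depend on any bit library.
local function utf8_4(cp)
    local b4 = cp % 64; cp = math.floor(cp / 64)
    local b3 = cp % 64; cp = math.floor(cp / 64)
    local b2 = cp % 64; cp = math.floor(cp / 64)
    return string.char(0xF0 + cp, 0x80 + b2, 0x80 + b3, 0x80 + b4)
end

-- Map every byte of a binary string (e.g. UTF-16BE with BOM) into the
-- private area; the result is a UTF-8 string with one code point per byte.
function bytes_to_private(s)
    local out = {}
    for i = 1, #s do
        out[i] = utf8_4(BYTE_OFFSET + s:byte(i))
    end
    return table.concat(out)
end
```

With the quoted convertToUTF16, one would presumably hand
bytes_to_private(result) to tex.print instead of result. Under the assumed
offset, the BOM bytes FE FF turn into the code points 0x1100FE and
0x1100FF.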

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

