[NTLK] Converting NewtonBooks to ePub
Matthias Melcher
m.melcher at robowerk.de
Fri Feb 27 17:03:57 PST 2026
Ultra geek alert. ;-)
So I ran through all 600 books on Unna and analyzed their format. I think I have found pretty much all nuances, and only very few details remain to be explored.
The format has a few lists that describe contents, attributes, and page layout.
Contents is a list of content blocks. They are quite flexible: besides formatted text (Western and international) with font styles and sizes, they can also contain images in PICT or Newton format, and even user interface elements like buttons and menus.
The second big list describes pages, each of which has boxes, referencing content blocks as described above.
The rest is lists of formats, page layouts, the table of contents, and a collection of "hints" for faster text search.
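To make the relationship between the two big lists concrete, here is a rough sketch in Python. It is only an illustration of "pages hold boxes, boxes point at content blocks"; every class and field name below is made up for this example and does not match the actual frame tags, which are documented in the BookWriter.cc source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentBlock:
    """One entry in the contents list (all field names are hypothetical)."""
    kind: str                      # e.g. "text", "picture", "button" (labels invented here)
    text: Optional[str] = None     # text payload, if any
    data: Optional[bytes] = None   # raw image bytes for picture blocks


@dataclass
class Box:
    """A rectangle on a page that refers to a content block by index."""
    left: int
    top: int
    right: int
    bottom: int
    content_index: int


@dataclass
class Page:
    boxes: List[Box] = field(default_factory=list)


@dataclass
class Book:
    contents: List[ContentBlock] = field(default_factory=list)
    pages: List[Page] = field(default_factory=list)

    def blocks_on_page(self, page_number: int) -> List[ContentBlock]:
        """Resolve the boxes of one page (0-based) to their content blocks."""
        page = self.pages[page_number]
        return [self.contents[box.content_index] for box in page.boxes]
```

Because boxes only store an index, the same content block can appear on more than one page without being duplicated.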
See https://github.com/MatthiasWM/newton-framework/blob/4672b534d58c0643c084524a1b3f23ee56e57a91/Matt/BookWriter.cc#L39 for very messy source code with comments for most frame tags.
What this means is that I can easily dump all the text in those files as an unformatted Unicode text file.
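A minimal sketch of such a text dump, assuming the content blocks have already been decoded into plain Python dictionaries. The "text" key is a hypothetical name for this example, and the carriage-return handling is an assumption about how line breaks are stored.

```python
from typing import Any, Iterable, Mapping


def dump_text(contents: Iterable[Mapping[str, Any]]) -> str:
    """Join the text of all text-bearing blocks, one block per paragraph."""
    parts = []
    for block in contents:
        text = block.get("text")  # hypothetical key; non-text blocks are skipped
        if isinstance(text, str) and text:
            # Assumption: line breaks may be stored as CR, so normalize to LF.
            parts.append(text.replace("\r\n", "\n").replace("\r", "\n"))
    return "\n\n".join(parts)


def write_text_file(contents: Iterable[Mapping[str, Any]], path: str) -> None:
    """Write the dump as a UTF-8 encoded text file."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(dump_text(contents))
        out.write("\n")
```

Images and user interface elements are silently dropped here, which is exactly the loss the "unformatted" route implies.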
For formatted output, HTML would be an easy target. A lot of page information would be lost, but font styles and sizes would remain. It could support the table of contents, and with some extra work even integrate images.
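As a sketch of the HTML route, the following assumes each paragraph has already been split into styled runs. The run keys (text, size, bold, italic) and the table-of-contents shape are invented for this example and are not the names used in the book format.

```python
from html import escape


def run_to_html(text, size=None, bold=False, italic=False):
    """Render one styled run of text as an HTML fragment."""
    out = escape(text).replace("\n", "<br>")
    if italic:
        out = f"<i>{out}</i>"
    if bold:
        out = f"<b>{out}</b>"
    if size is not None:
        out = f'<span style="font-size:{size}pt">{out}</span>'
    return out


def book_to_html(title, paragraphs, toc=()):
    """Build a single HTML page.

    paragraphs: list of paragraphs, each a list of run dicts
    toc: sequence of (label, paragraph_index) pairs, rendered as anchor links
    """
    lines = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body>",
    ]
    if toc:
        lines.append("<ul>")
        for label, index in toc:
            lines.append(f'<li><a href="#p{index}">{escape(label)}</a></li>')
        lines.append("</ul>")
    for index, runs in enumerate(paragraphs):
        body = "".join(run_to_html(**run) for run in runs)
        lines.append(f'<p id="p{index}">{body}</p>')
    lines.append("</body></html>")
    return "\n".join(lines)
```

Page boxes and exact positions are not represented at all, so this keeps only the flow of text, its styling, and the table of contents.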
ePub OTOH is a rather complex format. In some ways it is similar to HTML (it can even contain HTML files), but it supports pixel-exact positioning as an alternative, which is important for comic books. Writing this format would take a while before I could get it good enough.
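To give an idea of where the extra complexity starts, here is a sketch of only the outer ePub container, which is a ZIP archive with a fixed entry layout. The function name, the OEBPS/ folder, and the content.opf file name are choices made for this example. The result is not a valid book on its own: the caller still has to supply a package document, a navigation document, and the content files, and pixel-exact layout is not covered at all.

```python
import zipfile

# Standard container file that tells a reader where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub_skeleton(target, files):
    """Write the ZIP container of an ePub.

    target: a file path or a writable binary file object
    files: mapping of names (relative to OEBPS/) to bytes or str content;
           it is expected to include "content.opf"
    """
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry has to be the first one and stored uncompressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        archive.writestr(
            "META-INF/container.xml",
            CONTAINER_XML,
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for name, data in files.items():
            archive.writestr("OEBPS/" + name, data, compress_type=zipfile.ZIP_DEFLATED)
```

The HTML files produced for the simpler route could be reused as the content files here, which is one reason to get HTML output working first.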
As a last resort, one could always do screen dumps manually from Einstein, but that is a lot of work and yields only a bunch of low-res images.
Again, as with so many Newton experiences, this is a very well thought out format with incredible flexibility. A book can, for example, contain a functioning specialized calculator.
- Matthias