[NTLK] Converting NewtonBooks to ePub
J Caffiney
caffiney at gmail.com
Tue Feb 24 18:06:37 PST 2026
I looked at NSOF early on, but what's inside the .pkg book parts is
actually the NOS binary format (i.e. direct ref-encoded objects), not NSOF.
The package part data seems to have the raw frames, arrays, and binaries
with the 8-byte object headers and pointer-based refs, rather than the NSOF
tagged stream format.
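
For anyone following along, here is a rough Python sketch of what "pointer-based refs" means in practice. It assumes each ref is a 32-bit big-endian word whose low two bits select the kind (integer, pointer, immediate, or magic pointer), which is how the NOS ref encoding is usually described. The names decode_ref and read_refs are made up for illustration and are not taken from any existing tool; the sketch does not parse object headers or resolve pointers to package offsets.

```python
import struct

# Low two bits of a 32-bit NOS ref select its kind (assumed tag layout).
REF_KINDS = {0: "integer", 1: "pointer", 2: "immediate", 3: "magic_pointer"}

def decode_ref(ref):
    """Classify a 32-bit ref and return (kind, value). Hypothetical helper."""
    kind = REF_KINDS[ref & 0x3]
    if kind == "integer":
        # 30-bit signed integer stored in the upper bits
        value = ref >> 2
        if value & (1 << 29):
            value -= 1 << 30
    elif kind == "pointer":
        # Pointer refs carry an address with the tag bits cleared
        value = ref & ~0x3
    else:
        # Immediates and magic pointers: payload is the upper 30 bits
        value = ref >> 2
    return kind, value

def read_refs(data, offset, count):
    """Read `count` big-endian 32-bit refs from `data` starting at `offset`."""
    words = struct.unpack_from(f">{count}I", data, offset)
    return [decode_ref(w) for w in words]
```

With this, read_refs(part_bytes, slot_offset, n) would give the decoded slots of an array or frame once the header has been skipped; turning a pointer value into an offset inside the part is a separate step that depends on the package's base address.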
I'll take a look at Mr. Köppen's RDCL -- I hadn't seen it before. But I
don't have a serial cable or ethernet card for my NMP 2100.
Thank you!
-J
On Tue, Feb 24, 2026 at 8:39 PM Victor Rehorst <victor at chuma.org> wrote:
> A complete shot in the dark here, but could it be - or are you referring
> to - Newton Streamed Object Format (NSOF)?
>
> Eckhart Köppen's RDCL builds a command-line tool `nsof` that can decode
> NSOF to XML or YAML. There are other, earlier implementations of NSOF
> as well.
>
> On 2026-02-24 19:51, J Caffiney wrote:
> > I've been looking at the NewtonBook binary format lately, trying to
> > figure out if there's a reasonable way to extract book content from
> > .pkg files and turn it into something readable on today's devices.
> >
> > I've managed to get some basic conversions working: NewtonBook .pkg to
> > ePub. Results are rough but readable. So far I've run it against a few
> > titles from UNNA: The Art of Newton, InvesTerms, and the NS BASIC Tech
> > Notes. Chapter structure and text come through. Styles are hit or miss.
> > I'm happy to share the converted ePubs if anyone wants to see how they
> > turned out.
> >
> > The interesting challenge is Newton's internal format: UTF-16 big-endian
> > text, the ref encoding for objects, and the way book content is nested
> > inside NOS frames. Apple documented some of this in the Formats spec but
> > there are gaps, especially around how style runs map to the actual text
> > boundaries. Robert Sundling's PKGDUMP (
> > https://github.com/RobertSundling/Newton-PKGDUMP) was helpful for
> > sanity-checking the package structure, but the book-specific content
> > layer is mostly undocumented territory.
> >
> > Does anyone have deeper knowledge of the format — especially the
> > relationship between style runs and text content in book parts? I've been
> > filling in the gaps by trial and error, but I'd rather not reinvent
> > the wheel.