New proparse.jar BLOB Internals

Most developers working with Proparse use an existing API and will not need these notes. They are intended only for developers who work directly with the bytes of the blob, for example when developing a new API.

The layout of the blob had one overriding design consideration: it had to be indexed for fast random access to the records and fields within it. Loading all of the records into ABL objects would take too long for an interactive tool like Prolint.

As a result, the blob ended up looking somewhat like a miniature read-only database.

In order to understand what is inside the blob, it is probably easiest to start by understanding the goals.

Fixed Length Records (mostly)

One goal was to be able to access record fields (examples: node.text, node.type, node.line) by their offset in the record. As a result, most types of records are fixed length. The exceptions are strings, lists, and maps (collectively, 'collections'). Collection records all have a similar layout: the collection's size, followed by the data.
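As a rough illustration of the "size, followed by the data" layout, here is a minimal Python sketch of a string record. The 4-byte big-endian size and the UTF-8 encoding are assumptions made for the example, not documented facts about the blob, and the function names are hypothetical.

```python
import struct

# ASSUMPTION: the size is a 4-byte big-endian integer and the text is UTF-8.
# The actual widths and encoding used by the blob are not stated in these notes.
SIZE_FORMAT = ">i"
SIZE_BYTES = struct.calcsize(SIZE_FORMAT)

def write_string_record(text: str) -> bytes:
    """Encode a string as a collection record: size first, then the data."""
    data = text.encode("utf-8")
    return struct.pack(SIZE_FORMAT, len(data)) + data

def read_string_record(blob: bytes, offset: int) -> str:
    """Read the size at `offset`, then decode that many bytes after it."""
    (size,) = struct.unpack_from(SIZE_FORMAT, blob, offset)
    start = offset + SIZE_BYTES
    return blob[start:start + size].decode("utf-8")

record = write_string_record("DISPLAY")
print(read_string_record(record, 0))
```

Because the size comes first, a reader can skip over or slice out a collection without scanning for a terminator.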

So how does a 'node' record, for example, have a fixed length if it contains variable-length data like strings? A string field in the record doesn't contain the string itself. It contains only a reference to a string record. The same is true for references to lists, maps, and any other record. The only fields whose data is stored directly at the field position in the record are boolean and integer fields.
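The idea of reading a field by its offset can be sketched as follows. The field offsets, the 4-byte field width, and the byte order are all hypothetical values chosen for the example; a real reader would take them from the blob rather than hard-code them.

```python
import struct

# HYPOTHETICAL layout of a 'node' record: three 4-byte big-endian integers.
# 'type' and 'line' hold their values directly; 'text' holds only a
# reference to a string record, never the string itself.
NODE_FIELD_OFFSETS = {"type": 0, "line": 4, "text": 8}
NODE_RECORD_SIZE = 12

def read_int_field(blob: bytes, record_offset: int, field: str) -> int:
    """Read one integer field at a fixed position within a record."""
    position = record_offset + NODE_FIELD_OFFSETS[field]
    (value,) = struct.unpack_from(">i", blob, position)
    return value

# type=42, line=7, text -> reference number 3 (made-up sample values)
node = struct.pack(">iii", 42, 7, 3)
print(read_int_field(node, 0, "line"))
```

Every node record is the same size regardless of how long its text is, because the record stores only the reference.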

The Index

Because records (other than collections) are fixed-length, we couldn't use the usual serialization technique of writing a referenced record inline, right at the field position within the referencing record. But since the byte offset of a record within the blob isn't known until that record is written, how do we reference the record? Each record is referenced by an integer index number.

The index is an array of integer offsets. So, if we know that we want the record with index number 2, we get the offset by looking at position number two in the index.

The index is written at the end of the blob, after all records have been written, and their index numbers (and offsets) have been tallied.
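The write order described above (records first, then the index) can be sketched like this. The example assumes 4-byte big-endian offsets and zero-based index numbers, and it simply hands the index's starting position back to the caller; how a real reader locates the index within the blob is not covered in these notes.

```python
import struct

def build_blob(records):
    """Write all records, tallying their offsets, then append the index."""
    body = bytearray()
    offsets = []
    for record in records:
        offsets.append(len(body))  # offset is only known once we get here
        body += record
    index_offset = len(body)       # the index starts after the last record
    for offset in offsets:
        body += struct.pack(">i", offset)  # ASSUMPTION: 4-byte big-endian
    return bytes(body), index_offset

def record_offset(blob, index_offset, index_number):
    """Look up a record's byte offset by its (zero-based) index number."""
    (offset,) = struct.unpack_from(">i", blob, index_offset + 4 * index_number)
    return offset

blob, index_offset = build_blob([b"aaaa", b"bb", b"cccccc"])
print(record_offset(blob, index_offset, 2))
```

Each lookup is a single fixed-position read into the index, followed by a jump to the record, which is what makes random access cheap.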

The Record Schema

A special problem was presented by the fact that the Java code inside Proparse might change. We might want to rename fields. Fields might be inserted into the middle of field lists, which is especially troublesome when fields get added to superclasses.

To avoid problems with field names and positions changing over time, each blob contains a 'schema' of the records inside it. The schema is safe to re-use from one blob to the next, as long as the same version of Proparse stays running as the server. (In other words, restart your clients if you upgrade the server.)

The schema for a class is very simple: an index to the name of the class in Proparse (e.g. “org.prorefactor.core.JPNode”), followed by a pair of numbers for each field. One number is the field's record type, and the other is the index of the string record holding the field's name.
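A client could turn such a schema into a lookup from field name to field position, so that renamed or inserted fields are picked up from the blob instead of being hard-coded. The sketch below assumes the schema has already been read into a flat list of integers, that the type number comes first in each pair, and that string records have been resolved into a dictionary; the type codes and index numbers are made up for the example.

```python
from typing import NamedTuple

class FieldDef(NamedTuple):
    position: int     # ordinal position of the field within the record
    record_type: int  # the field's record type number
    name: str         # field name, resolved from its string record

def parse_schema(values, strings):
    """values: [class_name_index, type, name_index, type, name_index, ...]
    strings: maps an index number to the text of that string record."""
    class_name = strings[values[0]]
    pairs = values[1:]
    fields = {}
    for position in range(len(pairs) // 2):
        record_type = pairs[2 * position]      # ASSUMPTION: type comes first
        name_index = pairs[2 * position + 1]
        name = strings[name_index]
        fields[name] = FieldDef(position, record_type, name)
    return class_name, fields

# HYPOTHETICAL type codes and string index numbers, for illustration only.
INT_TYPE, STRING_TYPE = 1, 2
strings = {10: "org.prorefactor.core.JPNode", 11: "type", 12: "line", 13: "text"}
schema_values = [10, INT_TYPE, 11, INT_TYPE, 12, STRING_TYPE, 13]

class_name, fields = parse_schema(schema_values, strings)
print(class_name, fields["text"].position)
```

Looking fields up by name through the schema means client code keeps working when a field moves, as long as its name is unchanged.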