New proparse.jar API

Proparse is ported to Java and can be launched as a little server which listens on a TCP port. It receives a program name, finds and parses the program's source code, and then returns a BLOB containing records of all the objects in the syntax tree and symbol tables.

The blobs will be read by ABL routines to generate ABL objects (temp-tables, data sets, Objects) as needed.

This new Proparse API will be used by another project right away. In the coming days you can look for “GUI BOM” here on the hive.

There isn't any timetable for a new version of Prolint using this new API.

API Goals

The API will be a facade for the byte data in the blob, and the API should generate ABL objects only on demand. For example, consider node:firstChild(). This should return a Node object. The internals of the firstChild() function will check to see if the object has been created yet, create it if necessary, and then return it.

As another example, node:getText() (or just node:text using PROPERTIES) won't actually return data from the Node object itself. The method will look up the necessary records in the blob to return the node's text.

Another example: Functions which query the syntax tree (for example: find all DEFINE nodes) should not operate by recursive-descent of all the objects in the tree, because that would force the creation of a Node object for every node in the tree. Instead, such query functions should work by examining the bytes in the blob directly, and creating those Node objects as needed for the result set. (The API will have to keep track of which Node objects have already been created.)

OpenEdge Version

It is probably reasonable to point out that you can use older versions of Prolint with older versions of OpenEdge. We will write some of the new API using ABL classes, but it wouldn't be a big task if anyone wanted to back-port the API so it works with earlier releases. We'll keep in mind that, when there's no significant disadvantage, we'll use a plain old .p so that back-porting would be a little bit less work.


New blob api early example

I created a new repository called 'codeparse', added a directory called 'wip' for shared efforts that are work-in-process or experimental, and added a subdirectory called 'proparseclient'.

svn://oehive.org/codeparse/wip/proparseclient

For those interested in building and using the API, you should be able to run client.p in that directory, which reads the two binary files 'header.bin' and 'blob.bin'. The program 'tree2text.p' is small and reasonably easy to understand as a starting point. If you don't know which API we're talking about, see http://www.oehive.org/node/1247

(SVN is running slow for me today, which is unusual. I hope it's just my local ISP.)

As my next task, I think I'll add a data transfer from Proparse so that we can fetch its internal data about node types and keywords.


New proparse.jar BLOB Internals

Most developers working with Proparse would use an existing API, and these notes would not be of interest. These notes are only for developers working directly with the bytes from the blob, for example, when developing a new API.

The layout for the blob had one over-riding design consideration: It had to be indexed for fast random access of the records and fields within it. Loading all of the records into ABL objects would take too long for an interactive tool like Prolint.

As a result, the blob ended up looking somewhat like a miniature read-only database.

In order to understand what is inside the blob, it is probably easiest to start by understanding the goals.

Fixed Length Records (mostly)

One goal was to be able to access record fields (examples: node.text, node.type, node.line) by their offset in the record. As a result, most types of records are fixed length. The exceptions are strings, lists, and maps (i.e.: 'collections'). Collection records all have a similar layout: the collection's size, followed by the data.

So how does a 'node' record, for example, have a fixed length if it contains variable length data like strings? A string field in the record doesn't contain the string. It only contains a reference to a string record. The same is true for references to lists, maps, and any other record. The only fields that actually have their data there at the field position in the record are boolean and integer fields.

The Index

Since records (other than collections) are fixed-length, we couldn't use the usual serialization technique of writing one record right at a field position within a referencing record. But since we don't know the byte offset of each record within the blob until the record is written, how do we reference the record? Each record is referenced with an integer index number.

The index is an array of integer offsets. So, if we know that we want the record with index number 2, we get the offset by looking at position number two in the index.

The index is written at the end of the blob, after all records have been written, and their index numbers (and offsets) have been tallied.

The Record Schema

A special problem was presented by the fact that the Java code inside Proparse might change. We might want to rename fields. Fields might be inserted into the middle of field lists – especially troublesome in cases where fields get added to super classes.

To avoid problems with fields names and positions changing over time, each blob will contain a 'schema' of the records inside it. The schema is safe to re-use from one blob to the next, as long as the same version of proparse stays running as the server. (I.e. restart your clients if you upgrade the server.)

The schema for a class is very simple: an index to the name of the class in Proparse (ex: “org.prorefactor.core.JPNode”), followed by a pair of numbers for each field. One number is the field's record type, and the other number is the index of the string record of the field's name.


Using proparse.jar as a Server

Proparse.jar may be launched as a server which listens on a TCP socket. For example:

rem proparse.bat
set JAVA_PATH= proparse.jar
set JAVA_OPTS= -cp %JAVA_PATH% -Xss2M
java %JAVA_OPTS% proparse.Server

As of early August 2008, the New API for using this server is still work-in-process.

Optional
There are a few server server options which can be specified in an optional file named 'proparseserver.properties' in the server's working directory:

# Configuration file for running proparse.jar as a server.
# This file must be in your current working directory.
# (i.e.:  ./proparseserver.properties)
# Remove the leading hash mark '#' to uncomment a setting
# for property = value.


# port (optional)
# The port to listen on. If no port is specified, then 55001 is used.
# port = 55001


# project (optional)
# The project settings directory to load from ./prorefactor/projects/.
# If no project is specified, then Proparse will simply load
# from the first directory it finds.
# There does not have to be a project configured before launching the
# server, because the client can take care of that.
# project = sports2000