Proparse discussion


Some interesting timings

Getting a list of nodes with something like parseUnit:queryType("DEFINE") is very common in Prolint.

I just did some interesting measurements:

On 100,000 nodes, a pass through the data blob to find all nodes of type 23 (for example) takes approximately 100 ms on my machine, and the time is almost linear in the number of nodes (50,000 nodes: 51 ms).

If I create a temp-table instead, it takes 1500 ms to create the records (a one-off initial hit), BUT finding all records of type 23 then always takes less than 1 ms: roughly 100 times faster.

On 50,000 nodes, the timings are:

datablob search: 52 ms
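
The trade-off in these timings can be sketched in Java: a linear pass over the blob on every query, versus a one-off indexing cost followed by cheap lookups. This is a minimal sketch that assumes fixed-size node records with an int type field. RECORD_SIZE, TYPE_OFFSET and the class name NodeTypeIndex are hypothetical, and a HashMap stands in for the indexed temp-table.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: linear scan of a node blob vs. a one-off index by node type.
// ASSUMPTION: fixed-size records with the node type stored as an int at TYPE_OFFSET.
// RECORD_SIZE and TYPE_OFFSET are hypothetical, not the real Proparse layout.
class NodeTypeIndex {
    static final int RECORD_SIZE = 16;
    static final int TYPE_OFFSET = 4;

    // O(n) on every call: walk all records and collect matching node numbers.
    static List<Integer> scan(ByteBuffer blob, int numNodes, int type) {
        List<Integer> result = new ArrayList<>();
        for (int n = 0; n < numNodes; n++) {
            if (blob.getInt(n * RECORD_SIZE + TYPE_OFFSET) == type) {
                result.add(n);
            }
        }
        return result;
    }

    // O(n) once (the "initial hit"), like populating the temp-table.
    static Map<Integer, List<Integer>> buildIndex(ByteBuffer blob, int numNodes) {
        Map<Integer, List<Integer>> index = new HashMap<>();
        for (int n = 0; n < numNodes; n++) {
            int type = blob.getInt(n * RECORD_SIZE + TYPE_OFFSET);
            index.computeIfAbsent(type, k -> new ArrayList<>()).add(n);
        }
        return index;
    }

    // After indexing, each query is a single hash lookup.
    static List<Integer> query(Map<Integer, List<Integer>> index, int type) {
        return index.getOrDefault(type, List.of());
    }

    public static void main(String[] args) {
        int numNodes = 50_000;
        ByteBuffer blob = ByteBuffer.allocate(numNodes * RECORD_SIZE);
        for (int n = 0; n < numNodes; n++) {
            blob.putInt(n * RECORD_SIZE + TYPE_OFFSET, n % 100); // fake node types 0..99
        }
        Map<Integer, List<Integer>> index = buildIndex(blob, numNodes);
        System.out.println(scan(blob, numNodes, 23).size());               // 500
        System.out.println(query(index, 23).equals(scan(blob, numNodes, 23))); // true
    }
}
```

Assuming the 100 ms and 1500 ms figures refer to the same 100,000-node blob, the index build cost is recovered after about 15 queries.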


directories and packages

A couple of recommendations:

We should separate Julian's good stuff from my sample/test junk, and put the good stuff into the /trunk svn directory.

We should also use a better permanent package home than "proparseclient". I recommend "org.oehive.proparse".


Want to change GetNode to Node

I want to rename the ParseUnit GetNode method to simply Node, and the NodeTypes GetNodeType method to NodeType:

 Message ParseUnit:GetNode(1):NodeText view-as alert-box.

would become

 Message ParseUnit:Node(1):NodeText view-as alert-box.

Also, I would like to add Item as a synonym for Node / NodeType:

 Message ParseUnit:Item(1):NodeText view-as alert-box.
 Message Proparse:NodeType:Item(1):TypeName view-as alert-box.

Any objections to this?


zero-based vs one-based

Because node numbers and source file numbers are zero-based, to retrieve all the nodes we have to write this in client code:

DO i = 0 to ParseUnit:NumNodes - 1:
END.

Could we instead define NumNodes as the highest node number (that is, have the ParseUnit subtract 1 from the node count by default), so the client code becomes

DO i = 0 to ParseUnit:NumNodes:
END.

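I think this would be less prone to mistakes. Suppose someone writes the following, which is wrong under either convention. With the current definition it both skips node 0 and reads one node past the end. With the proposed definition it only skips node 0: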

DO i = 1 to ParseUnit:NumNodes:
END.
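
ABL's DO i = a TO b includes both bounds, so the loops above can be mirrored in Java with `<=`. This small sketch lists which node numbers each loop visits for a unit with four nodes (numbered 0 to 3). The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors the ABL loops above; DO ... TO is inclusive, hence "<=".
class LoopBounds {
    // DO i = 0 TO NumNodes - 1, where NumNodes is a count (current convention)
    static List<Integer> current(int numNodes) {
        List<Integer> visited = new ArrayList<>();
        for (int i = 0; i <= numNodes - 1; i++) visited.add(i);
        return visited;
    }

    // DO i = 0 TO NumNodes, where NumNodes is the highest node number (proposal)
    static List<Integer> proposed(int lastNode) {
        List<Integer> visited = new ArrayList<>();
        for (int i = 0; i <= lastNode; i++) visited.add(i);
        return visited;
    }

    // DO i = 1 TO upper: the mistaken loop, for either meaning of NumNodes
    static List<Integer> mistaken(int upper) {
        List<Integer> visited = new ArrayList<>();
        for (int i = 1; i <= upper; i++) visited.add(i);
        return visited;
    }

    public static void main(String[] args) {
        int count = 4; // nodes 0, 1, 2, 3
        System.out.println(current(count));      // [0, 1, 2, 3]
        System.out.println(proposed(count - 1)); // [0, 1, 2, 3]
        System.out.println(mistaken(count));     // [1, 2, 3, 4]: skips 0 and overruns
        System.out.println(mistaken(count - 1)); // [1, 2, 3]: skips 0 only
    }
}
```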

Holes in Node Types

Looking through the nodetypes.bin file and the code examples, there are gaps (holes) in the numbering. For example, the node types start at 4; types 0, 1, 2 and 3 are missing.

I was updating the NodeTypes class to create node type objects as and when they are needed (Jurjen: this will also speed up creation time). The holes make this slightly more complicated, because node type 20 is not at position 20.

Would it be possible to generate the .bin file with placeholders for the missing types? We could give them a name such as [notused].

Otherwise I'll have to create a map in memory.
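
Both options can be sketched briefly in Java. With a padded file, the array index equals the type number and node type objects can be created lazily. Without padding, an in-memory map does the translation. This is a minimal sketch: NodeType, NOT_USED ("[notused]") and the type names are illustrative assumptions, not the real nodetypes.bin contents.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: lazy node-type objects when the type numbers have holes.
// ASSUMPTION: NodeType, NOT_USED and the names used in main() are illustrative.
class NodeTypes {
    static final String NOT_USED = "[notused]";

    static final class NodeType {
        final int number;
        final String name;
        NodeType(int number, String name) { this.number = number; this.name = name; }
    }

    // Option 1: the .bin file is padded, so array index == node type number.
    private final String[] names;   // names[20] describes node type 20
    private final NodeType[] cache; // NodeType objects, created on first use

    NodeTypes(String[] paddedNames) {
        this.names = paddedNames;
        this.cache = new NodeType[paddedNames.length];
    }

    // Returns null for out-of-range numbers and for placeholder (unused) types.
    NodeType get(int number) {
        if (number < 0 || number >= names.length || NOT_USED.equals(names[number])) {
            return null;
        }
        if (cache[number] == null) {
            cache[number] = new NodeType(number, names[number]);
        }
        return cache[number];
    }

    // Option 2: no padding, so build an in-memory map from type number to name.
    static Map<Integer, String> toMap(int[] numbers, String[] typeNames) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < numbers.length; i++) {
            map.put(numbers[i], typeNames[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Types 0..3 are holes, padded with the placeholder name.
        String[] padded = {NOT_USED, NOT_USED, NOT_USED, NOT_USED, "TYPE_4", "TYPE_5"};
        NodeTypes types = new NodeTypes(padded);
        System.out.println(types.get(2));      // null
        System.out.println(types.get(5).name); // TYPE_5
    }
}
```

Padding keeps lookup a plain array index. The map avoids changing the file format at the cost of a hash lookup per access.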


Can Proparse be static?

Is there any need to have more than one ProParse instance in memory at once? (There may be a need for several ParseUnits.) If not, shouldn't we make ProParse.cls a static class?
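
If only one Proparse is ever needed, its shared state can live in static members while each ParseUnit keeps its per-file data. This minimal Java sketch shows that split. The members (nodeTypeName, loadCount) are hypothetical, and the hard-coded names stand in for reading nodetypes.bin.

```java
// Sketch: one static Proparse shared by many ParseUnits.
// ASSUMPTION: nodeTypeName/loadCount and the type names are hypothetical.
final class Proparse {
    private static String[] nodeTypeNames; // shared by every ParseUnit in the session
    private static int loadCount = 0;

    private Proparse() {} // static-only: never instantiated

    // Loaded on first use, then reused for the rest of the session.
    static synchronized String nodeTypeName(int type) {
        if (nodeTypeNames == null) {
            // Stand-in for reading nodetypes.bin.
            nodeTypeNames = new String[] {"[notused]", "TYPE_1", "TYPE_2"};
            loadCount++;
        }
        return nodeTypeNames[type];
    }

    static int loadCount() { return loadCount; }

    public static void main(String[] args) {
        ParseUnit a = new ParseUnit("a.p");
        ParseUnit b = new ParseUnit("b.p");
        System.out.println(a.typeName(1) + " " + b.typeName(2)); // TYPE_1 TYPE_2
        System.out.println(Proparse.loadCount());                 // 1
    }
}

// Each ParseUnit holds per-file state; shared lookups go through the static class.
final class ParseUnit {
    final String fileName; // per-unit state, e.g. the source file that was parsed
    ParseUnit(String fileName) { this.fileName = fileName; }
    String typeName(int type) { return Proparse.nodeTypeName(type); }
}
```

The shared data is loaded once per session. The trade-off is that static state cannot be swapped or reset per ParseUnit, which matters mostly for testing.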


Proparse API : Problem with arrays

Stealing a comment from the performance thread:

[snip] For Prolint, I know that we can come up with an API that is a layer of easy-to-use methods which save the developer from working directly with the memptr, but at the same time, is very fast because the API works directly with the memptr rather than creating intermediate objects or records.
[snip]
For Prolint, I think Jurjen and I should consider a blob API that is a drop-in replacement for the old DLL API (using integer node handles). We talked about this before.
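
One reading of the "blob API with integer node handles" idea is that a handle is simply a record number in the blob, and each accessor reads its field straight from the buffer. This Java sketch illustrates that reading; it is not the old DLL API itself. BlobNodes, its method names and the 12-byte record layout (type, first child, next sibling, with -1 meaning none) are hypothetical. The real offsets would come from the schema.

```java
import java.nio.ByteBuffer;

// Sketch: integer node handles over the blob, with no per-node objects.
// ASSUMPTION: 12-byte records of three ints (type, firstChild, nextSibling),
// -1 meaning "none". This layout is hypothetical.
final class BlobNodes {
    static final int RECORD_SIZE = 12;
    private static final int TYPE = 0, FIRST_CHILD = 4, NEXT_SIBLING = 8;
    private final ByteBuffer blob;

    BlobNodes(ByteBuffer blob) { this.blob = blob; }

    int nodeType(int node)    { return blob.getInt(node * RECORD_SIZE + TYPE); }
    int firstChild(int node)  { return blob.getInt(node * RECORD_SIZE + FIRST_CHILD); }
    int nextSibling(int node) { return blob.getInt(node * RECORD_SIZE + NEXT_SIBLING); }

    // Counts the direct children of a node by following integer handles.
    int childCount(int node) {
        int count = 0;
        for (int c = firstChild(node); c != -1; c = nextSibling(c)) count++;
        return count;
    }

    // Helper for building test data: writes one record.
    static void put(ByteBuffer b, int node, int type, int firstChild, int nextSibling) {
        int base = node * RECORD_SIZE;
        b.putInt(base + TYPE, type);
        b.putInt(base + FIRST_CHILD, firstChild);
        b.putInt(base + NEXT_SIBLING, nextSibling);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(4 * RECORD_SIZE);
        put(b, 0, 10, 1, -1); // root node, first child is 1
        put(b, 1, 20, -1, 2); // children 1 -> 2 -> 3
        put(b, 2, 21, -1, 3);
        put(b, 3, 22, -1, -1);
        BlobNodes nodes = new BlobNodes(b);
        System.out.println(nodes.childCount(0));                 // 3
        System.out.println(nodes.nodeType(nodes.firstChild(0))); // 20
    }
}
```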

How many ParseUnits

How many ParseUnits are going to be active at the same time?

As a potential speed enhancement, we could create a number of nodes up front in the ProParse singleton, store the references in a static temp-table, and hand them out to each ParseUnit as required.

This would mean that Proparse itself takes several seconds to load, but that time would no longer be spent in the ParseUnit itself.

I'll try to get some numbers.
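
The pooling idea can be sketched as follows: pay the allocation cost once, up front, and let each ParseUnit borrow and return nodes. This minimal Java sketch uses hypothetical names (NodePool, PooledNode), and an ArrayDeque stands in for the static temp-table of object references.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: pre-allocate node objects once, then lend them out on demand.
// ASSUMPTION: NodePool and PooledNode are hypothetical names.
final class NodePool {
    static final class PooledNode {
        int nodeNumber = -1; // blob record this object currently represents
    }

    private final Deque<PooledNode> free = new ArrayDeque<>();
    private int created = 0;

    NodePool(int initialSize) {
        for (int i = 0; i < initialSize; i++) free.push(newNode()); // upfront cost
    }

    private PooledNode newNode() {
        created++;
        return new PooledNode();
    }

    // Lend a node to a ParseUnit; create a new one only if the pool is empty.
    PooledNode acquire(int nodeNumber) {
        PooledNode n = free.isEmpty() ? newNode() : free.pop();
        n.nodeNumber = nodeNumber;
        return n;
    }

    // Take a node back when the ParseUnit is finished with it.
    void release(PooledNode n) {
        n.nodeNumber = -1;
        free.push(n);
    }

    int created()   { return created; }
    int available() { return free.size(); }

    public static void main(String[] args) {
        NodePool pool = new NodePool(1000);  // paid once, when Proparse loads
        PooledNode n = pool.acquire(42);     // a ParseUnit borrows a node
        pool.release(n);                     // ...and gives it back
        System.out.println(pool.created() + " " + pool.available()); // 1000 1000
    }
}
```

Whether this pays off depends on how many ParseUnits are active at once and how many nodes each one needs.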


Static Schema

I've written a method to read in a schema and generate a schema.cls file with the definitions. Have a look at this bit of code:

using proparseclient.*.

DEF VAR i AS INT NO-UNDO.
DEF VAR j AS INT NO-UNDO.
DEF VAR s AS INT NO-UNDO.

/* time 100,000 reads of a static Schema property */
s = mtime.

do i = 1 TO 100000:
 assign j = Schema:JPNode_RecordSize.
end.

/* the MESSAGE statement must be terminated */
message "Loop Time (ms):" mtime - s SKIP
        "Node Record Size:" Schema:JPNode_RecordSize SKIP
        "Node Type OffSet:" Schema:JPNode_type_offset SKIP
        "BlockNode Record Size:" Schema:BlockNode_RecordSize
        view-as alert-box.

What bytes where

To see exactly what is going into the blob, we can look at the Java source code:
svn://oehive.org/prorefactor/branches/proparsej/org.prorefactor.core

Classes of interest are:
com.joanju.proparse.sockets.BlobBuilder
com.joanju.DataXferStream

The first byte of a record in an Xfer blob is the encoding: a single byte that tells what kind of record it is.

What follows the encoding byte depends on the record type. For example, if we've stored an array or list, we get this:

	out.writeInt(list.size());
	for (Object o : list)
		writeRef(o);
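
A reader for the list layout in that excerpt might look like this: the encoding byte, an int size, then one reference per element. This is a minimal sketch under explicit assumptions. `out` is assumed to behave like java.io.DataOutputStream (4-byte big-endian ints), writeRef is assumed to write a single int, and LIST_ENCODING is a made-up value. DataXferStream is the place to check the real ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Sketch: read a list record (encoding byte, int size, then refs).
// ASSUMPTIONS: DataOutputStream-style big-endian ints, writeRef writes one int,
// and LIST_ENCODING is hypothetical.
final class ListRecordReader {
    static final int LIST_ENCODING = 7; // hypothetical

    static int[] readList(DataInputStream in) throws IOException {
        int encoding = in.readUnsignedByte();
        if (encoding != LIST_ENCODING) {
            throw new IOException("not a list record: encoding " + encoding);
        }
        int size = in.readInt();
        int[] refs = new int[size];
        for (int i = 0; i < size; i++) refs[i] = in.readInt(); // assumed writeRef format
        return refs;
    }

    // Writer mirroring the excerpt above, used here only to produce test input.
    static byte[] writeList(int[] refs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(LIST_ENCODING);
        out.writeInt(refs.length);
        for (int r : refs) out.writeInt(r);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] blob = writeList(new int[] {3, 14, 15});
        int[] refs = readList(new DataInputStream(new ByteArrayInputStream(blob)));
        System.out.println(Arrays.toString(refs)); // [3, 14, 15]
    }
}
```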
