Proparse Group



Some interesting timings

Getting a list of nodes with something like parseUnit:queryType("DEFINE") is very common in Prolint.

I just did some interesting calculations:

On 100,000 nodes, a run through the data blob finding all nodes of type 23 (for example) takes approximately 100 ms on my machine, and it scales almost linearly (50,000 nodes = 51 ms).

If I create a temp-table instead, it takes 1500 ms to create the records (a one-time initial hit), but then always less than 1 ms to find all records of type 23 - almost 100 times faster.

On 50,000 nodes, the timings are:

data blob search: 52 ms
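
A minimal sketch of the temp-table approach, using a table and field names of my own invention (the real ParseUnit internals may differ):

/* temp-table indexed on node type: slow to fill once, fast to query */
DEFINE TEMP-TABLE ttNode NO-UNDO
    FIELD NodeNum  AS INTEGER
    FIELD NodeType AS INTEGER
    INDEX idxType NodeType.

/* one-time load (the ~1500 ms hit): one record per node read from the blob */
/* ... CREATE ttNode / ASSIGN for each node ... */

/* afterwards, the indexed find is what takes less than 1 ms */
FOR EACH ttNode WHERE ttNode.NodeType = 23:
    /* collect or process the node */
END.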


directories and packages

A couple of recommendations:

We should separate Julian's good stuff from my sample/test junk, and put the good stuff into the /trunk svn directory.

We should also use a better permanent package home than "proparseclient". I recommend "org.oehive.proparse".


Want to change GetNode to Node

I want to rename the ParseUnit GetNode method to simply Node, and the NodeTypes GetNodeType method to NodeType.

 Message ParseUnit:GetNode(1):NodeText view-as alert-box.

would become

 Message ParseUnit:Node(1):NodeText view-as alert-box.

Also, I would like to add Item as a synonym for Node / NodeType:

 Message ParseUnit:Item(1):NodeText view-as alert-box.
 Message Proparse:NodeType:Item(1):TypeName view-as alert-box.

Any objections to this?


zero-based vs one-based

Because node and source-file indexes are zero-based (while NumNodes is a count), we have to write this in client code to visit every node:

DO i = 0 to ParseUnit:NumNodes - 1:
END.

Could we just declare that NumNodes is zero-based, and have the ParseUnit decrement it by 1 internally, so the client code becomes:

DO i = 0 to ParseUnit:NumNodes:
END.

I think this would be less prone to mistakes; with the current convention it is too easy to write incorrect code such as:

DO i = 1 to ParseUnit:NumNodes:
END.

Holes in Node Types

Looking through the nodetypes.bin file and the code examples, there are gaps in the numbering (for example, the node types start at 4; 0, 1, 2 and 3 are missing).

I was updating the NodeTypes class to create node type classes as and when needed (Jurjen - this will speed up creation time as well), but having holes makes it slightly more complicated (node type 20 is not at position 20).

Would it be possible to create the .bin file with placeholders for the missing types? We could give them a name like [notused].

Otherwise I'll have to create a map in memory.
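
A rough sketch of what such an in-memory map could look like (the table and field names here are illustrative only):

/* maps a node-type number to its position in the list of types */
DEFINE TEMP-TABLE ttTypeMap NO-UNDO
    FIELD TypeNum  AS INTEGER
    FIELD Position AS INTEGER
    INDEX idxType IS UNIQUE PRIMARY TypeNum.

/* filled once while reading nodetypes.bin, then used for lookups: */
FIND ttTypeMap WHERE ttTypeMap.TypeNum = 20 NO-ERROR.
IF AVAILABLE ttTypeMap THEN
    MESSAGE "Type 20 is at position" ttTypeMap.Position VIEW-AS ALERT-BOX.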


Can Proparse be a static class?

Is there any need for having more than one ProParse class in memory at once? (There may be a need for several ParseUnits.) If not, should we not make ProParse.cls a static class?


Proparse API : Problem with arrays

Stealing a comment from the performance thread:

[snip] For Prolint, I know that we can come up with an API that is a layer of easy-to-use methods which save the developer from working directly with the memptr, but at the same time, is very fast because the API works directly with the memptr rather than create intermediate objects or records.
[snip]
For Prolint, I think Jurjen and I should consider a blob API that is a drop-in replacement for the old DLL API (using integer node handles). We talked about this before.

How many ParseUnits

How many ParseUnits are going to be active at the same time?

When looking at potential speed enhancements, we could create a number of nodes up front in the ProParse singleton, store the references in a static temp-table, and distribute them as required to each ParseUnit.

This would mean that Proparse itself would take several seconds to load, but the cost would then not be paid in each ParseUnit.

I'll try to get some numbers.
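
A rough sketch of the pool idea, using illustrative names rather than the actual proparseclient classes:

/* illustrative only: a pool of pre-created node objects, filled once
   when Proparse loads (in the real design this would be a static
   temp-table inside the Proparse class) */
DEFINE TEMP-TABLE ttNodePool NO-UNDO
    FIELD PoolSeq AS INTEGER
    FIELD NodeRef AS Progress.Lang.Object
    FIELD InUse   AS LOGICAL INITIAL NO
    INDEX idxFree InUse PoolSeq.

/* a ParseUnit would take the next free node instead of NEW-ing its own: */
FIND FIRST ttNodePool WHERE ttNodePool.InUse = NO NO-ERROR.
IF AVAILABLE ttNodePool THEN
    ASSIGN ttNodePool.InUse = YES.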


Static Schema

I've written a method to read in a schema and generate a schema.cls file with the definitions. Have a look at this bit of code:

using proparseclient.*.
 
DEF VAR i AS INT NO-UNDO.
DEF VAR j AS INT NO-UNDO.
DEF VAR S AS INT NO-UNDO.

s = mtime.

do I = 1 TO 100000:
 assign j = Schema:JPNode_RecordSize .
end.

message "Loop Time (ms):" mtime - s SKIP

        "Node Record Size:" Schema:JPNode_RecordSize SKIP
        "Node Type OffSet:" Schema:JPNode_type_offset SKIP
        
        "BlockNode Record Size:" Schema:BlockNode_RecordSize SKIP

What bytes where

To see exactly what is going into the blob, we can look at the Java source code:
svn://oehive.org/prorefactor/branches/proparsej/org.prorefactor.core

Classes of interest are:
com.joanju.proparse.sockets.BlobBuilder
com.joanju.DataXferStream

The first byte of a record in an Xfer blob is the encoding - a single byte to tell what kind of record it is.

What follows the encoding byte depends on the record type. For example, if we've stored an array or list, we get this:

	out.writeInt(list.size());
	for (Object o : list)
		writeRef(o);
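
On the 4GL side, reading such a record back out of the memptr could look roughly like the sketch below. This is only an illustration: it assumes 4-byte big-endian integers for both the size and the references, but the exact layout is defined by BlobBuilder and DataXferStream.

/* sketch: read one "list" record out of the data blob */
/* DataBlob is assumed to already contain the transferred blob */
DEFINE VARIABLE DataBlob  AS MEMPTR  NO-UNDO.
DEFINE VARIABLE RecordPos AS INTEGER NO-UNDO INITIAL 1. /* start of the record */
DEFINE VARIABLE ListSize  AS INTEGER NO-UNDO.
DEFINE VARIABLE RefValue  AS INTEGER NO-UNDO.
DEFINE VARIABLE i         AS INTEGER NO-UNDO.

SET-BYTE-ORDER(DataBlob) = BIG-ENDIAN.   /* Java writes big-endian */

/* skip the single encoding byte, then read the list size */
ListSize = GET-LONG(DataBlob, RecordPos + 1).

/* assume each following reference is a 4-byte value */
DO i = 0 TO ListSize - 1:
    RefValue = GET-LONG(DataBlob, RecordPos + 5 + i * 4).
    /* ... resolve RefValue to the record it points at ... */
END.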

Using proparse.jar as a Server

Proparse.jar may be launched as a server which listens on a TCP socket. For example:

rem proparse.bat
set JAVA_PATH= proparse.jar
set JAVA_OPTS= -cp %JAVA_PATH% -Xss2M
java %JAVA_OPTS% proparse.Server

As of early August 2008, the new API for using this server is still a work in progress.
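
In the meantime, a bare-bones 4GL connection to the server's socket could look like this sketch (the host and port are placeholders; use whatever the server actually listens on):

/* sketch only: host and port are assumptions */
DEFINE VARIABLE hSocket AS HANDLE NO-UNDO.

CREATE SOCKET hSocket.
IF hSocket:CONNECT("-H localhost -S 3333") THEN
    MESSAGE "Connected to the proparse server." VIEW-AS ALERT-BOX.
ELSE
    MESSAGE "Could not connect." VIEW-AS ALERT-BOX.

hSocket:DISCONNECT().
DELETE OBJECT hSocket.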

Optional configuration file
There are a few server options which can be specified in an optional file named 'proparseserver.properties' in the server's working directory:

# Configuration file for running proparse.jar as a server.

proparse api performance so far

I just ran the latest version of the new Proparse API from Subversion. I was a bit confused by the performance; I wonder whether your timings are anywhere near mine.

This piece in ClassClient.p :

  DO lv_i = 0 TO lv_numnodes:
    ParseUnit:getnode(lv_i).
  END.

takes 21 seconds. That's quite a bit slower than expected :-(

When I empty all the code lines from the constructor in Nodes.cls, it still takes 21 seconds.
When I remove the statement in ParseUnit.cls where it says:

  NewNode = NEW proparseclient.Node(THIS-OBJECT,BUFFER TTNode:HANDLE).

What is the minimum version of progress

I've just caught myself writing some code and realized that this was 10.1C specific.

METHOD PUBLIC LOGICAL LoadFiles (p_Header AS CHAR, p_Data AS CHAR):
    blobutilities:LoadBlobFromFile(p_Header, HeaderBlob).
    blobutilities:LoadBlobFromFile(p_Data, DataBlob).

    RETURN YES.

    CATCH e AS Progress.Lang.Error:
        SET-SIZE(HeaderBlob) = 0.
        SET-SIZE(DataBlob) = 0.
        DELETE OBJECT e.
        RETURN NO.
    END CATCH.

END METHOD.

What is the minimum Progress version required for the new class-based Proparse API?


Refactoring preprocessor directives

A parser accepts a stream of tokens, and then builds the tree. Proparse's token stream has some token types that get filtered and attached to nodes as hidden tokens.

com.joanju.proparse.DoParse.java:

filter.hide(NodeTypes.WS);
filter.hide(NodeTypes.COMMENT);
filter.hide(NodeTypes.AMPMESSAGE);
filter.hide(NodeTypes.AMPANALYZESUSPEND);
filter.hide(NodeTypes.AMPANALYZERESUME);
filter.hide(NodeTypes.AMPGLOBALDEFINE);
filter.hide(NodeTypes.AMPSCOPEDDEFINE);
filter.hide(NodeTypes.AMPUNDEFINE);

Lines and columns

(Answering offline questions)
Yes, the line and column fields in the node are indeed the line and column of the token from the original source file. Those are count-from-one.

There is also a source file reference number, referencing the array of source file names (count from zero). (See client.p, which writes out the array of source file names.)
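
For illustration, dumping that position information for a node might look like the sketch below; the property names (LineNum, ColumnNum, FileIndex) are assumptions, not the confirmed API.

/* sketch: line/column are 1-based, the source-file index is 0-based */
MESSAGE "line:"   ParseUnit:GetNode(1):LineNum   SKIP
        "column:" ParseUnit:GetNode(1):ColumnNum SKIP
        "file #:" ParseUnit:GetNode(1):FileIndex
    VIEW-AS ALERT-BOX.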

