proparse api performance so far

Fri, 2008-08-08 21:45 — jurjen

I just ran the latest version of the new proparse api from subversion. I was a bit confused by the performance, and I wonder if your timings are anywhere near mine.

This piece in ClassClient.p :

  DO lv_i = 0 TO lv_numnodes:
    ParseUnit:getnode(lv_i).
  END.

takes 21 seconds. That's quite a bit slower than expected :-(

When I empty all the codelines from the constructor in Nodes.cls, it still takes 21 seconds.
When I remove the statement in ParseUnit.cls where it says:

  NewNode = NEW proparseclient.Node(THIS-OBJECT,BUFFER TTNode:HANDLE).

Then it takes only 1.5 seconds.

Given the earlier discussion where we tried to split milliseconds, I just wonder if something is wrong with my computer. How much time does the loop take on your system?

Tue, 2008-08-12 06:55 — jmls

Do you have 10.1C SP1 installed?

http://progress.atgnow.com/esprogress/Group.jsp?bgroup=progress&id=P1341...

Seems to fix some performance issues

Mon, 2008-08-11 17:17 — jmls

Setting -Bt to 2048 had a big impact as well. It took the time from 515ms to 445ms (roughly a 14% improvement).

Mon, 2008-08-11 21:53 — jurjen

You did not say which test takes 445ms?
After your changes, and recompiling everything, the first message in ClassClient.p ("Time Taken [before LoadFiles]") is 8 seconds on my laptop. It was 10 seconds yesterday, so the various improvements have helped indeed.

Tue, 2008-08-12 06:41 — jmls

Sorry, it was the "Time Taken [initialize All Nodes]: 446 Milliseconds". This was for blob2

Mon, 2008-08-11 22:21 — jurjen

prolib: halves the time

As expected, searching for *.r files takes a long time.
When I compile everything and then prolib add all *.r to a pl (proparseclient.pl) and then change the Propath so this proparseclient.pl is the first entry, then the 8 seconds before LoadFiles goes down to 4 seconds.

But when proparseclient.pl is the last entry in the Propath, after the default appbuilder pl's, then it takes 14 seconds.
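
For anyone who wants to repeat the experiment, here is a minimal sketch of putting the library first in the Propath at session start. It assumes proparseclient.pl sits in the working directory and that the r-code was added under the relative name proparseclient/Node.r; both are assumptions, so adjust them to your own layout.

```abl
/* Sketch only: the library location and member name are assumptions. */

/* Make the procedure library the first PROPATH entry for this session. */
PROPATH = "proparseclient.pl," + PROPATH.

/* SEARCH shows where the r-code is now resolved from, or ? if not found. */
MESSAGE SEARCH("proparseclient/Node.r") VIEW-AS ALERT-BOX INFORMATION.
```

If the message shows a path inside proparseclient.pl, the classes are being loaded from the library rather than from loose files on disk.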

Tue, 2008-08-12 06:43 — jmls

Wow. That's a big difference, one we should remember for "production use".

Are you happy now to close this topic? I think we've probably managed to wring out as much performance as possible ;)

Tue, 2008-08-12 21:01 — jurjen

re: happy now?

You asked if I am happy now.... well yes and no. You have certainly won a lot of performance back.

Prolint is a tool that has to run FAST otherwise programmers are not going to use it. Similar to intellisense in the editor: if it slows you down you turn it off. If the new Proparse is considerably slower than the old one, we may have a problem. I am not being pessimistic, but not very optimistic either. It is just too soon to tell.

Eight seconds is extremely bad. Nothing else takes 8 seconds, not even the startup of Eclipse. If this 8 second pause only happens once in a Progress session, and if the subsequent Prolint runs are faster than the Prolint runs with the current proparse DLL, then the net result may be good.

We have not seen the whole picture yet. There are positive and negative factors:
- the Java parser is a bit slower than the C++ parser
- the Java server has to be started, we have not timed that yet
+ we don't have to call DLL functions (that's good, because external calls have overhead)
+ the new parser re-reads blobs from previous parsings if the code has the same timestamps
+ the parser runs in its own process, so it can work ahead: parse file N+1 while Prolint is still analyzing file N

Tue, 2008-08-12 21:56 — john

re: happy now?

I don't think there are any surprises here. We've always known that creating many objects or records was too slow for Prolint.

For the kind of project that Julian and I will be working on, the OO API will be ideal. We need something that is easy to script with, and speed is of little concern.

For Prolint, I know that we can come up with an API that is a layer of easy-to-use methods which save the developer from working directly with the memptr, but at the same time, is very fast because the API works directly with the memptr rather than create intermediate objects or records.

I don't think we should worry much more about the performance of the OO API. I think Julian has made it plenty fast enough that it will be pleasant to work with.

For Prolint, I think Jurjen and I should consider a blob API that is a drop-in replacement for the old DLL API (using integer node handles). We talked about this before.

This way, we'll have an API used by Prolint that is compatible with existing Prolint code, and also very fast.

All this drop-in replacement API has to do is provide integer handles, which it would associate with offsets. Remember that the offset associated with a given handle myHandle would change with a call like:

parserNodeFirstChild(myHandle, myHandle).
display parserGetNodeText(myHandle).

All the API has to do is get the offset associated with the handle and then fetch the requested data from the blob - whether it's the offset of the firstChild, or the node's text.

The map from handleNum to nodeOffset would not be a temp or work table, since those have some overhead. It would just be an array:

offset = nodeHandles[handleNum].

In the C++ API, I also kept an array of 'unused' handles, so that slots in the array could be re-used when parserGetHandle() was called.
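
To make the idea concrete, here is a rough ABL sketch of such a handle map with a list of unused handles. It is only an illustration: the array size, the variable names other than nodeHandles, and the parserReleaseHandle helper are assumptions made for this example, not the actual API.

```abl
/* Sketch only: sizes and helper names are illustrative assumptions. */
&SCOPED-DEFINE MAX_HANDLES 1000

DEFINE VARIABLE nodeHandles AS INTEGER NO-UNDO EXTENT {&MAX_HANDLES}. /* handle -> blob offset */
DEFINE VARIABLE freeHandles AS INTEGER NO-UNDO EXTENT {&MAX_HANDLES}. /* released handle numbers */
DEFINE VARIABLE freeCount   AS INTEGER NO-UNDO. /* entries in use in freeHandles */
DEFINE VARIABLE lastHandle  AS INTEGER NO-UNDO. /* highest handle issued so far */

FUNCTION parserGetHandle RETURNS INTEGER ():
  DEFINE VARIABLE h AS INTEGER NO-UNDO.
  IF freeCount > 0 THEN DO:
    /* Re-use the most recently released slot. */
    h = freeHandles[freeCount].
    freeCount = freeCount - 1.
  END.
  ELSE DO:
    /* No free slot: issue a new handle number (no bounds check in this sketch). */
    lastHandle = lastHandle + 1.
    h = lastHandle.
  END.
  nodeHandles[h] = 0. /* 0 = not pointing at any node yet */
  RETURN h.
END FUNCTION.

FUNCTION parserReleaseHandle RETURNS LOGICAL (INPUT h AS INTEGER):
  /* Remember the handle number so parserGetHandle can hand it out again. */
  freeCount = freeCount + 1.
  freeHandles[freeCount] = h.
  RETURN TRUE.
END FUNCTION.

DEFINE VARIABLE myHandle AS INTEGER NO-UNDO.
myHandle = parserGetHandle().
nodeHandles[myHandle] = 128. /* pretend a node starts at blob offset 128 */
parserReleaseHandle(myHandle).
```

A call such as parserNodeFirstChild(myHandle, myHandle) would then only need to overwrite nodeHandles[myHandle] with the offset of the first child; no object or record is created along the way.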

Tue, 2008-08-12 21:30 — jmls

Whilst I agree 8 seconds is too much, there are a couple of things:

1) Using the prolib you have managed to cut that down to 4 seconds
2) On my machine the "before load files" takes 1s
3) I have not used prolib, so I would expect to gain another .25 or even .5 seconds from that, taking it to 500ms
4) It is the initial load that takes the time, and that happens once during the session (or parseunit); subsequent times are *much* faster.
5) perhaps John could create a service for the java parser, so it is always running
6) Heh. on my machine OEA takes 7.5 seconds to start :)

Perhaps what we could do is get several people to download and test on various machines so that we have some meaningful benchmarks instead of the "mine is better than yours" :)

I will try it on my laptop and other desktop tomorrow.

Thanks, by the way, for the constructive criticism and insights provided during this debate.

Mon, 2008-08-11 22:35 — tamhas

Sounds a lot like the difference with your machine is the disk performance.

Mon, 2008-08-11 15:41 — jmls

a big jump in speed

Check out the latest versions of the classes - I moved the setting of nodetype into a lazy property (now only set if the property itself is requested) instead of in the setproperties method. This is now 30% faster (initial get of all nodes down from 850ms to 630ms).

Mon, 2008-08-11 16:40 — jmls

Moved all properties to lazy mode. Now down to 575ms from 630ms.
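
For readers who have not seen the pattern, a lazy property in ABL could look roughly like the sketch below. This is not the actual proparseclient code: the class name, the typeLoaded flag and the LoadNodeType method are made up for illustration, and the real classes read the value from the blob rather than returning a fixed string.

```abl
/* LazyNodeSketch.cls - illustrative only, not part of proparseclient. */
CLASS LazyNodeSketch:

  /* Tracks whether the expensive lookup has already been done. */
  DEFINE PRIVATE VARIABLE typeLoaded AS LOGICAL NO-UNDO.

  DEFINE PUBLIC PROPERTY NodeType AS CHARACTER NO-UNDO
    GET():
      /* Only do the work the first time the property is read. */
      IF NOT typeLoaded THEN DO:
        NodeType = LoadNodeType().
        typeLoaded = TRUE.
      END.
      RETURN NodeType.
    END GET.
    PRIVATE SET.

  /* Hypothetical stand-in for reading the node type from the blob. */
  METHOD PRIVATE CHARACTER LoadNodeType():
    RETURN "placeholder".
  END METHOD.

END CLASS.
```

Creating the object costs nothing extra for NodeType; the lookup is paid only by code that actually reads the property, and only once per object.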

Mon, 2008-08-11 12:30 — jmls

more performance improvements

I've just uploaded some changes which remove the need to maintain the initialized flag on the temp-table, by using a pointer to a byte in memory instead. This improves the load speed by 200ms+, perhaps more on Jurjen's laptop ;)

Sat, 2008-08-09 19:41 — jmls

Node creation speed

I've done some checks, and when loading the 10000 nodes from the example, it takes 1751ms to create the nodes (without setting any properties) and 2030ms to create the nodes with all the properties. So, it takes roughly 280ms to set the properties on 10000 nodes.

Sat, 2008-08-09 17:07 — john

OE version?

Jurjen, what version exactly of OE are you using? On which OS? Machine memory, CPU speed, dual-core?
(I'm using 10.1C on Vista 32-bit, Centrino dual core 2.2 GHz with 3G RAM.)

Sat, 2008-08-09 17:30 — jmls

OE 10.1C01,
Windows 2008 server (tweaked to look like a vista workstation)
4GB ram
procedure editor from OEA
quad-core 2.5

Sat, 2008-08-09 10:24 — jmls

How long does it take to create 10000 empty classes on your machine?

CLASS proparseclient.a:
END CLASS.

def var i as int no-undo.
def var a as class proparseclient.a no-undo.

def var st as int no-undo.
st = mtime.

do i = 1 to 10000:
 a = new proparseclient.a().
end.

message mtime - st view-as alert-box information.

Running it 5 times on my machine, this takes 750ms, 751ms, 748ms, 750ms, 749ms.

Sat, 2008-08-09 10:36 — jurjen

too long

Good experiment! It takes 15078, 14797, 14766, 14734, 15453 ms.

Sun, 2008-08-10 20:57 — jurjen

other pc: still too long

On the laptop owned by the company, which I use for programming: 3375, 3968, 3938, 3938, 3938 ms.

The ParseClient.p takes 10 seconds (on this laptop) before the first LoadFiles() call. On the other PC this takes 30 seconds.

Sun, 2008-08-10 21:08 — jmls

The interesting thing is that it shows your laptop is still 4x slower than my desktop at creating empty classes (nothing to do with proparse, temp-tables or lazy nodes). It simply takes more time. I don't quite know what to suggest - John and I have similar times for class creation.

Have you tried the latest svn version of the classes? I create nodes in a pool now and reuse them when needed, saving the overhead of creating them (apart from the very first initialization time).

Sun, 2008-08-10 21:25 — jurjen

The suggestion would be: ask your boss for a better computer for Proparse. I seriously doubt he would respond positively.
This laptop is 6 months old; the CPU is a T5500 @ 1.66 GHz with 2 GB of RAM, running Windows XP Pro SP2, OE 10.1C.
The PC is a few years old already: Pentium 4 CPU @ 2.8 GHz with 1.5 GB of RAM.
Both computers are adequate for normal design, compile and run tasks.

Yes, tried the latest versions of the classes. Building the pool is probably what costs 10 and 30 seconds, respectively.

Mon, 2008-08-11 08:33 — jmls

Perhaps we need to rethink the whole node-class thing then. If the performance for creating the empty classes is not acceptable, then it won't be acceptable when we start to add properties etc either.

The way I see it going forward is that we ditch nodes, node type classes and the temp-tables and move everything into just reading the blob. This will remove some of the nicer functionality (like being able to read through the TT for matches etc) but will be the only way to improve the performance to the levels that you need.

I am struggling to see, however, why your 2.8 P4 is *so* much slower (like 30x) than my 2.5 core2.

Mon, 2008-08-11 10:34 — jurjen

Optimized the laptop a bit, it is now: 1060, 1080 ms.

The differences are:
- class a.cls is now found in the first propath entry instead of the second propath entry.
- class a.cls is compiled to a.r
- added the -q startup parameter so Progress does not evaluate 10000 times if the .r is outdated.
Each of these three differences contributed a similar share of the speedup.

Replacing a.cls by an empty b.p, and running that b.p persistent instead of new'ing an empty class, gives the same results.
So I guess that searching the hard disk is slow on my machine compared to yours. If that is true, then perhaps packing the Proparse API classes in a PL helps a lot; I have not tested that yet.

Mon, 2008-08-11 11:11 — jmls

That's great news - so can we keep the classes, pretty please! It would be useful if you kept that information in the README or somewhere so someone else can optimize as well. I have to say that I assumed that you had already compiled the classes. Have you compiled the proparse classes as well now? What effect on the timings did this have?

Mon, 2008-08-11 11:53 — jurjen

Your assumption was right from the beginning: I did have the classes compiled but that, by itself, did not make a noticeable difference. The r-code began to make a difference after I changed the propath from "proparseclient,." to ".,proparseclient".
By the way, "proparseclient" should not have to be in the propath at all, but that's a different issue.

Sat, 2008-08-09 10:55 — jmls

yikes! this comment was from another thread by John:

"think it took about 4 seconds on my machine to create 10,000 objects, where each object had several int values assigned from random() to simulate at least a little bit of attribute loading"

so it's not unreasonable to assume that he would be in the 1-2 second range for a null class. That's still 8-10x faster than your machine though.

Sat, 2008-08-09 07:18 — jmls

that piece takes 2065ms (i.e. 2 seconds)

commenting out the code gives me 300ms

It's the actual creation of the node class that takes time. Obviously this loop is sub-optimal as it loads all the nodes at once. What would normally happen is that you would request nodes one at a time, as in:

Node = ParseUnit:GetNode(1).
message node:firstChild:NodeText view-as alert-box.

This would load node 1 and the first child of node 1 into memory.

I don't know why your machine is 10x slower though.

Sat, 2008-08-09 08:49 — jurjen

re: that piece takes 2065 ms

10 times slower ?!?! Weird! I am using the default out-of-the-box session parameters for OpenEdge.

I realize this loop creates all nodes at once; that is why it is interesting. I think that a lot of nodes will have been requested and created after 70 Prolint rules have run. Maybe not 100% of all nodes, but a fair share anyway.

Sat, 2008-08-09 09:50 — jmls

my defaults are -s 512 -inp 32000 -tok 4000 -s 200 -Mm 8192

I was quite pleased with the performance ;) I'll take a look later today to see if there is anything that I could do to tighten the performance. Using preprocessors for the field offset will help, as I could pass the memptr directly to the node and it could extract the data from memory.
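
A rough sketch of what reading a field straight from the memptr via preprocessor offsets could look like. The record layout here is invented for the example (the real blob format is not described in this thread), and the names NODE_TYPE_OFFSET and FIRST_CHILD_OFFSET are assumptions.

```abl
/* Sketch only: the field layout below is hypothetical. */
&SCOPED-DEFINE NODE_TYPE_OFFSET   0
&SCOPED-DEFINE FIRST_CHILD_OFFSET 4

DEFINE VARIABLE blob       AS MEMPTR  NO-UNDO.
DEFINE VARIABLE nodeOffset AS INTEGER NO-UNDO INITIAL 1. /* memptr positions are 1-based */

/* Build a tiny fake blob so the example is self-contained. */
SET-SIZE(blob) = 16.
PUT-LONG(blob, nodeOffset + {&NODE_TYPE_OFFSET})   = 42.
PUT-LONG(blob, nodeOffset + {&FIRST_CHILD_OFFSET}) = 9.

/* Read a field directly from memory; no object or temp-table record is involved. */
MESSAGE "first child offset:" GET-LONG(blob, nodeOffset + {&FIRST_CHILD_OFFSET})
  VIEW-AS ALERT-BOX INFORMATION.

SET-SIZE(blob) = 0. /* release the memory */
```

Because the offsets are preprocessor constants, they are resolved at compile time and add no lookup cost at run time.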

Sat, 2008-08-09 10:07 — jurjen
