Performance Testing

Sat, 2010-06-26 21:02 — tamhas

Most relationship collections will contain only a handful of elements, but it is possible for a relationship to contain a large number of elements. Therefore, we are concerned with both the memory footprint of a class when it contains a small number of elements and with the performance of that class when it contains a large number of elements. In particular, the 2006 implementation used temp-tables throughout. This meant that they could be scaled to arbitrarily large numbers of elements and still perform adequately, but it also meant that a rather large amount of memory was consumed for a small collection.

Therefore, in developing these relationship collections classes we are looking at minimal memory footprint, scaling of memory footprint, and performance in a number of tasks. The tasks we have identified are:

Add N objects to collection;
Clear collection of N objects;
Delete M random objects from a collection of N;
Find M random objects from a collection of N by position (applicable only to List?*);
Find M random objects from a collection of N by identity;
Find M random objects from a collection of N by key;
Iterate through M instances in a collection of N starting with first;
Iterate through M instances in a collection of N starting with last;
Delete M consecutive instances from a collection of N starting with I;
Add M consecutive instances to a collection of N starting with I (only List?*)

Tests will be based on varying values of M, N, and I to identify any possible non-linearities. Testing will begin with smaller values and will be discontinued when results are judged unacceptable. Not all tests apply to all relationship base class types, i.e., one can only test access by keys in a base type which has keys.

It should be recognized that some of these tests will represent extreme stress tests compared to what one expects in practice in an application. For example, deleting 1000 random objects from a set of 10,000 would be quite extraordinary, both because few collections are likely to have even as many as 100 elements and because normal processing will be to fill a collection, process all elements one or more times, and then to clear and delete the collection as a whole. I.e., one typically will never delete even one element until such point as one is clearing the whole collection, which may be a far more efficient operation than deleting elements one by one. Nevertheless, relationship navigation is extremely common in OO programming, so identifying possible inefficiency is highly desirable.

* Note: List is a Java collection class type in which elements can be addressed by position, i.e., Nth. Currently, this does not seem to be a requirement for relationship collections and so is not included in the initial implementation, but it does seem like a useful class in other contexts and so is likely to appear in a later round of development.

Printer-friendly version
Login to post comments

Comments

Thu, 2010-07-01 12:40 — Guillaume Morin

It may be important to note

It may be important to note that if someone decide to use these set classes, it may ba a good idea to extend every classes, to isolate custom code and shield from library update breaking changes. Partial classes would be useful to isolate library code from custom code, but again not supported.

Since adding an inheritance level adds perceptible overhead in ABL, I think it should be included in these tests.

Mon, 2010-07-05 19:51 — tamhas

Are you saying that adding

Are you saying that adding inheritance reduces performance, even though it is a relationship resolved at compile time?

Tue, 2010-07-06 11:46 — Guillaume Morin

Yes. Attached is some code

Yes.
Attached is some code to illustrate this.

Attachment	Size
test.zip	4.59 KB

Tue, 2010-07-06 18:50 — tamhas

Got it

Turns out I already had that code from the earlier batch of tests you sent me, but it is one that I never got to looking at.

Written as per your original, I got 30.081s for the no inheritance version and 66.324s for the inherited version.

Speculating that the difference was the chained supers, I copied yours and did a version in which I changed the private data members to public properties, eliminated the interfaces, eliminated any logic in the constructor, and did the property assignments in run.p. That gave me 14.789s with no inheritance and 47.610s with inheritance.

This makes no sense to me, so I'm headed for PSDN.

Thu, 2010-07-01 12:33 — Guillaume Morin

I'm interested to see if the

I'm interested to see if the WTIDSet, with its work-table + array overhead, will still be faster than a TT implenmentation.

Thu, 2010-07-01 16:51 — tamhas

I'll be interested too.

I'll be interested too. Remember, though, that it isn't just speed, but also memory footprint. A TT is a minimum of 9K even with -tmpbsize 1. It wouldn't surprise me to find that there is a crossover. I.e., below a certain size, the performance difference was nominal, but the WT was smaller whereas above a certain size, the WT starts to get bigger. Performance is another question. If one was dealing with random access, of course, I would put my bet on the TT right away, but given that normal access for most collections is to start at the beginning and go to the end, the the WT may do OK. Of course, this could be different if we do go with external iterators since we would have to re-establish the cursor position every time.

I do think that one of the keys is minimizing the number of small ABL instructions. I.e., a DO loop reading records and counting is bound to be a lot slower than a FIND on a WT, even thought the FIND has to actually read all the same records, simply because there is only one ABL instruction for the AVM to process and then all the looping happens at C speeds.

The OpenEdge Hive

More Navigation

OERA Open Source Initiative