Properties and Operators

Sat, 2010-06-26 21:07 — tamhas

Java collection classes have been provided with a fairly rich set of properties and operators, presumably so that these classes can serve a wider range of rôles; indeed, many classes in that same hierarchy are clearly not intended for relationships at all. In the present implementation, the decision was made to focus on the essentials of object relationships, with the aim of creating efficient, low-impact infrastructure for OO programming, and to consider later whether classes with other functions should also be created. Therefore, this implementation uses minimal properties and operators.

Note that we are using a leading "a" to indicate an abstract class and a leading "i" to indicate an interface.

aSet
This is the abstract class for the most common collection type, one ordered only by the sequence in which elements are added to it. It has two read-only properties – Size (number of elements) and IsEmpty – and five operators:

Add – adds the object in its parameter to the collection;
Remove – removes the object in its parameter from the collection;
Clear – removes all elements from the collection;
Contains – returns a logical indicating whether the object parameter is in the collection; and
GetIterator – returns an Iterator object on the collection.
Note that Contains will be inefficient on a large aSet and aIDSet is preferred if this is a regular need.
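
As a rough illustration of the aSet surface described above, here is a minimal sketch in Python, used purely for illustration. Every name in it (SimpleSet and its snake_case methods) is hypothetical and not taken from the project's code, and the sketch assumes elements are compared by object identity and that duplicates are not checked.

```python
class SimpleSet:
    """Hypothetical stand-in for a concrete aSet subclass (not the project's code)."""

    def __init__(self):
        self._items = []              # elements kept in the order they were added

    @property
    def size(self):                   # read-only: number of elements
        return len(self._items)

    @property
    def is_empty(self):               # read-only: True when nothing is stored
        return not self._items

    def add(self, obj):
        self._items.append(obj)

    def remove(self, obj):
        # Linear scan by identity; removes the first match, if any.
        for i, item in enumerate(self._items):
            if item is obj:
                del self._items[i]
                return

    def clear(self):
        self._items.clear()

    def contains(self, obj):
        # May visit every element, which is why Contains is slow on a large set.
        return any(item is obj for item in self._items)

    def get_iterator(self):
        return iter(self._items)


order_lines = SimpleSet()
a, b = object(), object()
order_lines.add(a)
order_lines.add(b)
order_lines.remove(a)
```

The linear scan in contains is the point of the note above: with an addition-ordered structure there is nothing to search by except walking the elements.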

aIDSet
This is the abstract class for the simple collection type ordered by object identity. It has all of the same properties and operators as aSet plus one additional operator Get which has an integer argument and which returns the object with that identity.
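
A minimal Python sketch of the extra Get operator follows. It assumes Python's built-in id() as a stand-in for the integer object identity, and the class name SimpleIDSet is hypothetical. Only the operators that differ in behaviour from the addition-ordered set are shown.

```python
class SimpleIDSet:
    """Hypothetical stand-in for a concrete aIDSet subclass."""

    def __init__(self):
        self._by_id = {}                     # integer identity -> object

    def add(self, obj):
        self._by_id[id(obj)] = obj           # id() stands in for the object identity

    def contains(self, obj):
        return id(obj) in self._by_id        # a keyed lookup, not a scan

    def get(self, obj_id):
        return self._by_id.get(obj_id)       # None when no such identity is held

    def get_iterator(self):
        # Walk the elements in identity order.
        return (self._by_id[key] for key in sorted(self._by_id))


lines = SimpleIDSet()
line = object()
lines.add(line)
kept_id = id(line)                           # a client may keep just the integer
same_line = lines.get(kept_id)
```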

aAttrKeySet
This is the abstract class which takes key/value pairs in which the key is considered an identifier for the object, i.e., keys must be unique. It has the same two properties as aSet. Its operators are:

Add – parameters are a key/value pair which is added to the collection;
GetValue – returns the object corresponding to the key in the parameter;
RemoveKey – removes the key/value entry specified by the key in the parameter;
RemoveValue – removes the key/value entry corresponding to the object in the parameter;
Clear – removes all entries;
Contains – returns a logical indicating whether the object parameter is in the collection; and
GetIterator – returns an Iterator on the keys.
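
The operators listed above might be sketched in Python as follows. All names are hypothetical, and two behaviours are assumptions that the text does not settle: adding a duplicate key raises an error, and the key iterator walks the keys in sorted order.

```python
class SimpleAttrKeySet:
    """Hypothetical stand-in for a concrete aAttrKeySet subclass."""

    def __init__(self):
        self._entries = {}                        # unique key -> object

    @property
    def size(self):
        return len(self._entries)

    @property
    def is_empty(self):
        return not self._entries

    def add(self, key, value):
        if key in self._entries:
            # Assumption: a duplicate key is an error, since keys must be unique.
            raise KeyError(f"duplicate key: {key!r}")
        self._entries[key] = value

    def get_value(self, key):
        return self._entries.get(key)             # None when the key is absent

    def remove_key(self, key):
        self._entries.pop(key, None)

    def remove_value(self, obj):
        # The object does not know its key, so the entries must be searched.
        for key, value in list(self._entries.items()):
            if value is obj:
                del self._entries[key]
                return

    def clear(self):
        self._entries.clear()

    def contains(self, obj):
        return any(value is obj for value in self._entries.values())

    def get_iterator(self):
        return iter(sorted(self._entries))        # iterates the keys, in key order


lines = SimpleAttrKeySet()
x, y = object(), object()
lines.add(10, x)
lines.add(5, y)
```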

aAttrSortSet
This is the abstract class which takes key/value pairs, but the key is considered only an attribute which defines sort order, not a unique key. It has the same properties and operators as aAttrKeySet, except that GetValue is replaced by GetValues, which returns a collection of all objects in the collection with the key in the parameter.
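
To show how GetValues differs from GetValue, here is a short Python sketch with hypothetical names. It shows only Add, GetValues and the key iterator, and it assumes that the iterator yields each distinct key once, which the text does not specify.

```python
from collections import defaultdict


class SimpleAttrSortSet:
    """Hypothetical stand-in for a concrete aAttrSortSet subclass."""

    def __init__(self):
        self._groups = defaultdict(list)          # non-unique key -> objects with that key

    def add(self, key, value):
        self._groups[key].append(value)           # repeated keys are allowed here

    def get_values(self, key):
        # Return a copy so callers cannot alter the internal list.
        return list(self._groups.get(key, []))

    def get_iterator(self):
        return iter(sorted(self._groups))         # distinct keys, in sort order


orders = SimpleAttrSortSet()
a, b, c = object(), object(), object()
orders.add("open", a)
orders.add("closed", b)
orders.add("open", c)
open_orders = orders.get_values("open")
```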

iIterator
This is the interface which defines the signature for all collection iterators. It has two read-only properties – HasNext and HasPrev. Its operators are:

First – positions at the first object in the collection and returns it;
Next – advances to the next object in the collection and returns it;
Last – positions at the last object in the collection and returns it;
Prev – advances in reverse order to the previous object in the collection and returns it; and
Remove – removes the object at the current position from the collection.
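
A bidirectional iterator with these properties and operators could look roughly like the Python sketch below. The names are hypothetical, and two behaviours are assumptions: Next and Prev return None at either end, and after Remove the position steps back one slot so that a forward walk continues with the element after the removed one.

```python
class ListIterator:
    """Hypothetical iterator over a plain list, following the iIterator outline."""

    def __init__(self, items):
        self._items = items
        self._pos = -1                       # positioned before the first element

    @property
    def has_next(self):
        return self._pos + 1 < len(self._items)

    @property
    def has_prev(self):
        return self._pos > 0

    def _current(self):
        if 0 <= self._pos < len(self._items):
            return self._items[self._pos]
        return None

    def first(self):
        self._pos = 0
        return self._current()

    def last(self):
        self._pos = len(self._items) - 1
        return self._current()

    def next(self):
        if not self.has_next:
            return None                      # assumption: None signals end of data
        self._pos += 1
        return self._current()

    def prev(self):
        if not self.has_prev:
            return None
        self._pos -= 1
        return self._current()

    def remove(self):
        if 0 <= self._pos < len(self._items):
            del self._items[self._pos]
            self._pos -= 1                   # forward-walk assumption, see lead-in


lines = ["a", "b", "c"]
it = ListIterator(lines)
seen = []
while it.has_next:
    item = it.next()
    if item == "b":
        it.remove()                          # drop "b" while walking forward
    else:
        seen.append(item)
```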

Note that we have considered whether the key/value collection classes should have both GetAttrIterator, i.e., on the keys, and GetIDIterator, i.e., on the ID of the objects, as this would be easy to do with a temp-table implementation. Currently, we are leaning toward just GetIterator on the keys as this seems in keeping with the needs of relationships.


Comments

Wed, 2010-06-30 18:23 — ptfreed

Serialization?

Has any consideration been given to serialization and/or object permanence?

Wed, 2010-06-30 22:18 — tamhas

See the proposals here http://communities.progress.com/pcom/thread/27156?tstart=0

Wed, 2010-06-30 15:06 — Guillaume Morin

Use of aIDSet:Get(int) ?

I may be missing something obvious, but in what case would the aIDSet:Get(int) method be useful?
I mean, the consumer of the aIDSet:Get(int) method needs to pass in the integer ID of an object, and that integer had to be obtained from an actual object reference. Why would the consumer keep the integer instead of keeping the object reference directly?

Wed, 2010-06-30 19:55 — tamhas

Since normal processing will be to iterate through the collection from one end to another, I expect this Get to be rarely used. But, it does mean that a client could keep an array or table of IDs rather than the actual object. Remember that since this is a one to many relation, the parent is, at most, going to have an object reference for one member of the collection.

One example of a possible use is to have a TT in the parent with a set of attributes and the object ID. One can then access the attributes in any desired order or combination and fetch the objects according to the ID.

Wed, 2010-06-30 20:01 — Guillaume Morin

In your example, why would the TT in the parent hold an object ID instead of holding the object reference directly?

Wed, 2010-06-30 22:15 — tamhas

You could, of course, do that, but then you wouldn't be using the standard implementation of the collection class. :)

I.e., in going from model to code, I think there is a benefit in having the one to many relationship between two objects consistently turn into a collection class of a type appropriate to the order specified on the link. Adding an attribute index table in addition to that is extra, so I would still keep the objects in the collection. E.g., the factory might well construct the parent, children, and the collection class, but the first task of the parent would be to build the table.

I'm not recommending that, mind you, just saying it is possible. I will be later providing SuperMap as a more generic solution.

Thu, 2010-07-01 12:20 — Guillaume Morin

Why use a TT anyway ?

And why would the parent use a TT to hold a set of attributes and object IDs anyway? Isn't that what your aAttrKeySet is for? In that case, do you still see a difference between holding IDs and holding the objects in this aAttrKeySet?

And what if, instead of an int ID, you used an Identity class to encapsulate the specific details of int(obj)? Would holding this Identity reference be any different from holding the object reference itself?

You do agree that for any consumer to get the object ID, it must first retrieve a reference to the object? So if the set has any "first access" logic (is that what you are talking about?), then it would already have been executed. So why throw away this reference and keep the ID, only to retrieve the reference again later?

I do not see any benefit in a set being ordered by ID, other than faster execution of the Contains() method (and the Get() method, if I can understand its use). Why would the model specify a relationship link ordered by ID? How can this be useful in a business use case?

Thu, 2010-07-01 16:43 — tamhas

Remember that what we are implementing here is collections for one to many object relationships. So, given something like Order and OrderLine, what structure are you going to use in Order to hold all the OrderLines except a collection? Yes, writing by hand you could just use a TT, but then you are introducing custom code into a role that should be generic code. In the UML model, there is nothing in the diagram to show the collection, just a link with a multiplicity and sometimes an order. Doing model to code, a collection gets used to provide that link.

In many contexts, no particular order matters, one is just going to start at one end and go to the other when processing the elements in a collection. And, there is no need for random access either. If there is a context where there is a need for random access based on a single attribute, then the aAttrKeySet family will do the job just fine. If the context is random access on a single attribute, but for subsets, then aAttrSortSet fills the bill. If there are multiple such relationships at the same time, there is nothing wrong with having multiple collections containing the same element objects, but different attribute keys. Usually, in OOA/D, there is only one active relationship at a time, but it is certainly possible to have more than one. And, down the road, some variation of the SuperMap idea will provide multiple attributes in the same package.

But, in the meantime, I am open to the idea that the parent may have need for its own data structures. Maybe it is a list of IDs in an identifier subset. Maybe it is some special but temporary sort order. Some of these problems can be solved using other collections, but I'm not going to say that is the only or always the best way. So, it seems like an inexpensive option to provide.

I don't know that I can give you a separate use case for ID order off the top of my head, but it is one of the things suggested to me by a very old hand at OOP, so I included it. Certainly, if specific order doesn't matter, but there is the need for duplicate checking and/or random access, then ID order is justified, if for no other reason than performance. Contains is going to be inherently slow on an addition-ordered set unless it is implemented with a TT, and then one has the overhead of having a TT.

Wed, 2010-06-30 18:21 — ptfreed

Good point -- many programmers won't use this, and many who do probably shouldn't. But this is an abstraction of an array, and while arrays are often misused they are sometimes very handy.

I suppose that I am assuming that the integers are to be 1..N (where N is the order of the set). Or perhaps the integers are assigned with the elements, just as values are assigned to slots in an array.

It's also possible that the integers will be assigned to the objects as they are added to the collection. This seems like a less useful model; one needs to be able to retrieve and store the assigned integers. I suppose there is a benefit if the numbers are guaranteed to be ascending in the order that the objects were added to the collection, but that seems like a small boon.

Integer keys have other useful attributes as well -- they can easily be sorted, stored, passed through an AppServer, etc. Clever programmers will find a need for each of these, I'm sure.

Wed, 2010-06-30 22:11 — tamhas

Clever programmers

When we started this project, we expected to model our collection classes on the Java Collection and Map hierarchy, which is exactly what I did in the 2006 implementation. However, looking at the current hierarchies, we found that there were a lot of classes which had nothing to do with object to object relationships ... e.g., queues. Perfectly useful stuff, but certainly not nearly as fundamental as relationships, which are core technology which must perform well for the application to perform well.

Thus, for this initial release, we elected to focus exclusively on the functionality needed to support one to many object relationships. Other functionality we will look at later ... or, if we are lucky, maybe someone else will contribute that.

The aSet family is ordered by the sequence in which elements are added to the collection. This is the most common type of collection, by far, since most of the time one just grabs the elements and walks through them performing some operation(s) on each element. If that is all you need, that is all you should use. Anything else is just adding overhead for no benefit.

The primary implementation we are thinking of here is a set of 10 arrays, logically treated as a single array. These will be filled starting with 1 and go up to N, the maximum size of the set. If any elements are deleted, their slots will be left as null and simply skipped over. That could present a problem with GetNth, since the Size of the set may be less than the position of the highest element, and at some point the position will need to advance by more than 1 to skip over a null.
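
The array scheme just described can be sketched in Python as below. This is only an illustration under stated assumptions: the class name ChunkedSet, the tiny CHUNK_SIZE, the 0-based internal slots and the get_nth signature are all hypothetical, and None plays the role of the null left behind by a deletion.

```python
class ChunkedSet:
    """Hypothetical sketch: several fixed arrays treated as one logical array."""

    CHUNKS = 10          # from the text: ten arrays treated as a single array
    CHUNK_SIZE = 4       # hypothetical, kept tiny so the example is easy to follow

    def __init__(self):
        self._chunks = [[None] * self.CHUNK_SIZE for _ in range(self.CHUNKS)]
        self._next_slot = 0      # next unused logical slot (0-based in this sketch)
        self._count = 0          # live elements, i.e. Size

    def _locate(self, slot):
        # Map a logical slot to (array number, offset within that array).
        return divmod(slot, self.CHUNK_SIZE)

    @property
    def size(self):
        return self._count

    def add(self, obj):
        if self._next_slot >= self.CHUNKS * self.CHUNK_SIZE:
            raise OverflowError("set is full")
        chunk, offset = self._locate(self._next_slot)
        self._chunks[chunk][offset] = obj
        self._next_slot += 1
        self._count += 1

    def remove(self, obj):
        for slot in range(self._next_slot):
            chunk, offset = self._locate(slot)
            if self._chunks[chunk][offset] is obj:
                self._chunks[chunk][offset] = None    # leave a hole, do not compact
                self._count -= 1
                return

    def __iter__(self):
        for slot in range(self._next_slot):
            chunk, offset = self._locate(slot)
            item = self._chunks[chunk][offset]
            if item is not None:                      # skip holes left by deletions
                yield item

    def get_nth(self, n):
        # n counts live elements from 1, so holes have to be stepped over.
        for i, item in enumerate(self, start=1):
            if i == n:
                return item
        return None


items = [object() for _ in range(6)]      # six elements span two of the arrays
lines = ChunkedSet()
for item in items:
    lines.add(item)
lines.remove(items[1])
```

After the removal, the second live element sits in the third slot, which is the mismatch between Size and position that the paragraph above warns about.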

The aIDSet family is ordered by the ID of the object, i.e., int(ObjRef). The primary implementation we are thinking of there is a work-table pointing to an array. The array is filled in order of addition and the work-table is in ID order with the number of the element in the array. Deletions here just delete work-table records and records can be created in the middle of the table.
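
One way to picture the work-table-plus-array arrangement is the Python sketch below. It is an analogy only: a sorted list of (ID, array position) pairs stands in for the work-table, id() stands in for int(ObjRef), and all names are hypothetical. Clearing the array slot on deletion is an assumption; the text only says that the work-table record is deleted.

```python
import bisect


class IndexedIDSet:
    """Hypothetical sketch: an ID-ordered index pointing into an addition-ordered array."""

    def __init__(self):
        self._array = []     # objects in order of addition
        self._index = []     # sorted (object_id, array_position) pairs

    def add(self, obj):
        self._array.append(obj)
        # insort keeps the index in ID order, so rows can land in the middle.
        bisect.insort(self._index, (id(obj), len(self._array) - 1))

    def _find(self, obj_id):
        # Positions are never negative, so (obj_id, -1) sorts just before any row for obj_id.
        i = bisect.bisect_left(self._index, (obj_id, -1))
        if i < len(self._index) and self._index[i][0] == obj_id:
            return i
        return None

    def contains(self, obj):
        return self._find(id(obj)) is not None

    def get(self, obj_id):
        i = self._find(obj_id)
        return None if i is None else self._array[self._index[i][1]]

    def remove(self, obj):
        i = self._find(id(obj))
        if i is not None:
            _, position = self._index.pop(i)     # delete only the index row
            self._array[position] = None         # assumption: release the array slot


a, b = object(), object()
id_set = IndexedIDSet()
id_set.add(a)
id_set.add(b)
found = id_set.get(id(a))
id_set.remove(a)
```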

aAttrKeySet and aAttrSortSet are sorted by the key value supplied with the object. So, they can be built in any order. We will be supporting any key type.

Does that make it more clear?

Wed, 2010-06-30 03:43 — ptfreed

Possible additions

Unless you assume that the key can be uniquely determined from the object it represents, you may want to add the Contains operator to aAttrKeySet and aAttrSortSet. On the other hand, if the object does determine the key then you don't need both RemoveKey and RemoveValue.

Another useful set operator might be Count.

Should iIterator have an operator like EndOfData? It can help to distinguish whether a null return represents an error or just the end of the data. Perhaps this isn't necessary because of your plans to implement error handling?

Wed, 2010-06-30 19:48 — tamhas

Contains is in iAttrSet

Contains *is* in iAttrSet and therefore in both aAttrKeySet and aAttrSortSet.

The key is normally an attribute of the object, but within the collection, the object is only a Progress.Lang.Object so those attributes are not visible. So, the key has to be supplied by the parent.

Count is redundant with the Size property.

End of Data is redundant with the HasNext or HasPrev property.

Note that it can be argued that we should not use HasNext and HasPrev, but should instead just return Null to indicate end of data, since the properties mean an extra test on every iteration of the loop. Currently, though, we are electing to retain them.

Thu, 2010-07-01 19:03 — ptfreed

I guess I was just being picky.

"Contains *is* in iAttrSet"

iAttrSet is not mentioned above. It may be described in the UML, though. (I haven't had a chance to really look at those docs.)

"and therefore in both aAttrKeySet and aAttrSortSet"

That seems reasonable. But you explicitly listed the operators in aAttrKeySet, and Contains is not among them. I couldn't imagine it being left out, but I thought I'd ask.

I'll go back to my corner now. :-)

Thu, 2010-07-01 19:44 — ptfreed

I found Contains in the UML docs.

I guess the moral is to read to the end before raising my hand, just like we learned in Kindergarten.

I'll look over the UML this evening.

Thu, 2010-07-01 19:53 — tamhas

The moral is also that I need good proofreaders like you!

Thu, 2010-07-01 19:08 — tamhas

The UML is more authoritative than the page ...

I have fixed the page.
