Core Concepts – Duplicates | The OpenEdge Hive

Sat, 2010-06-26 20:59 — tamhas

In any ordinary object relationship, one would never have duplicates, because it simply makes no sense to have the same object twice on one end of a relationship. Some Java Collection classes enforce this rule and others don’t.

In designing these collection classes, we decided to omit checking for duplicates for two reasons. First, in any implementation that does not provide direct indexing by object identity, searching every element already in a collection to see whether it matches a proposed new element is expensive overhead. Second, in most cases the very nature of adding elements to a collection will not lead to duplicates, so the overhead of checking for them is wasted effort. Compare, for example, populating a temp-table with the lines for a particular order from the persisted data: would you feel the need to include logic to check whether an order line was already in the table, or would you assume, rightly, that sequential reading of the persisted data will produce no duplicates?

We have elected to provide a Contains() method on our collection classes, which the parent can use to test for duplicates whenever it is not clear whether a particular object is already in the collection.
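The trade-off can be sketched as follows. This is a minimal illustration in Python rather than ABL, and it is not the actual collection classes: `Bag`, `OrderLine`, and the method names are hypothetical, and the sketch assumes a collection with no index by object identity.

```python
class Bag:
    """Hypothetical unindexed collection: add() never checks for duplicates."""

    def __init__(self):
        self._items = []

    def add(self, obj):
        # Plain append; existing elements are not scanned.
        self._items.append(obj)

    def contains(self, obj):
        # Linear scan by object identity: the cost that add() avoids.
        return any(item is obj for item in self._items)

    def size(self):
        return len(self._items)


class OrderLine:
    """Hypothetical element type, standing in for an order line."""

    def __init__(self, line_no):
        self.line_no = line_no


lines = Bag()
first = OrderLine(1)
lines.add(first)
lines.add(OrderLine(2))

# The caller opts in to the check only where a duplicate is a real risk.
if not lines.contains(first):
    lines.add(first)
```

The scan is paid only at the one call site that needs it, instead of on every add.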

With key-value sets there is some question about what one means by “no duplicates”. In the Java Map classes, no duplicate keys are allowed, based on the idea that each key should uniquely identify an object, but there is no restriction on duplicate objects, i.e., the same object can be pointed to by different keys. This makes little sense in a relationship collection. Consequently, we have provided two types of key/value pair objects, one of which is limited to unique keys and the other of which is not. The former is like the Java Map class; the latter provides for collections that are ordered by an attribute but not uniquely identified by it. Depending on implementation, these would or would not support duplicate objects, but there seems little reason to allow duplicates if they are easy to prevent.
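The distinction between the two key/value types might look like this. It is a minimal Python sketch under stated assumptions, not the actual ABL classes: `UniqueKeyMap`, `OrderedAttrSet`, and `OrderLine` are hypothetical names, and rejecting a repeated object by identity is one possible implementation choice.

```python
import bisect
import itertools


class UniqueKeyMap:
    """Hypothetical key/value collection: each key identifies exactly one object."""

    def __init__(self):
        self._by_key = {}

    def add(self, key, obj):
        if key in self._by_key:
            raise KeyError(f"duplicate key: {key!r}")
        self._by_key[key] = obj

    def get(self, key):
        return self._by_key[key]


class OrderedAttrSet:
    """Hypothetical collection ordered by an attribute that need not be unique."""

    def __init__(self):
        self._entries = []             # sorted (key, sequence, object) tuples
        self._seq = itertools.count()  # tie-breaker, so objects are never compared
        self._ids = set()              # identities of objects already present

    def add(self, key, obj):
        if id(obj) in self._ids:
            return False               # same object again: cheap to reject here
        self._ids.add(id(obj))
        bisect.insort(self._entries, (key, next(self._seq), obj))
        return True

    def objects(self):
        return [obj for _, _, obj in self._entries]


class OrderLine:
    """Hypothetical element type with a unique line number and a product code."""

    def __init__(self, line_no, product):
        self.line_no = line_no
        self.product = product


a = OrderLine(1, "widget")
b = OrderLine(2, "bolt")
c = OrderLine(3, "widget")

by_line = UniqueKeyMap()
for line in (a, b, c):
    by_line.add(line.line_no, line)       # keys are unique by construction

by_product = OrderedAttrSet()
for line in (a, b, c):
    by_product.add(line.product, line)    # "widget" appears twice as a key
repeated = by_product.add(a.product, a)   # same object a second time
order = [line.line_no for line in by_product.objects()]
```

Here the identity set makes duplicate objects cheap to prevent, which is the case where the text argues there is little reason to allow them.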

Comments

Wed, 2010-06-30 01:59 — ptfreed

Multiple keys

You said:

"In the Java Map classes [...] there is no restriction on duplicate objects, i.e., the same object can be pointed to by different keys. This makes little sense in a relationship collection."

I haven't thought this out nearly as well as you have -- but what if you want to know how many red objects you have in your bag, and then how many round ones? Keying by multiple attributes can be a handy thing.

Perhaps the answer lies in the term "relationship collection." I'm not sure what this means, as compared to a regular collection.

Wed, 2010-06-30 02:17 — ptfreed

I re-read the docs from the beginning, and now I'm sorry that I posted. I apparently forgot the fact that these collections are specifically designed to hold object joins. I now see your point about multiple keys.

Still, it might be handy to know how many of my patients are diabetic, or which among them have left me money in their wills. There are a variety of ways of addressing this question -- including simple iteration -- but the most obvious seems to be the ability to key objects by multiple attributes.

Does this question make sense?

Wed, 2010-06-30 17:54 — tamhas

SuperMap

Some questions of that sort could be addressed with the version of AttrSet that allows duplicate keys, but one would have to build the collection for the purpose, i.e., there would be one collection for diabetic patients and one for wills. There is a plan for a future collection type that will allow multiple keys, once called SuperMap, but that name might now change.

That said, recognize that you are thinking in relational, not OO, terms. I.e., you are thinking about having this table, running through it on some attribute, and doing a count. That is not surprising, since one does that sort of thing often in non-OO ABL. In OO, though, you are going to have something more like a relationship between a Doctor and Patient called something like IsDiabetic. The selection of which patients go in that collection is where the test for being diabetic is made. Once the collection is built, the Size of the collection gives you the answer. That collection doesn't need the diabetic attribute in it at all since, by definition, the collection for IsDiabetic will contain only those who are diabetic.
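A rough sketch of that idea, in Python rather than ABL and with entirely hypothetical names (`Doctor`, `Patient`, `load_diabetic_patients`, and the shape of the persisted rows are assumptions for illustration):

```python
class Patient:
    """Hypothetical patient object; it carries no 'diabetic' attribute."""

    def __init__(self, name):
        self.name = name


class Doctor:
    def __init__(self, name):
        self.name = name
        # The relationship: this doctor's patients who are diabetic.
        self.is_diabetic = []

    def load_diabetic_patients(self, rows):
        # The test is made once, while deciding who joins the relationship.
        for row in rows:
            if row["diabetic"]:
                self.is_diabetic.append(Patient(row["name"]))


# Stand-in for persisted data read sequentially.
persisted_rows = [
    {"name": "Ann", "diabetic": True},
    {"name": "Bob", "diabetic": False},
    {"name": "Cy", "diabetic": True},
]

doctor = Doctor("Dr. Hale")
doctor.load_diabetic_patients(persisted_rows)

# The size of the relationship collection is the answer; no counting pass.
diabetic_count = len(doctor.is_diabetic)
```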

Wed, 2010-06-30 19:31 — ptfreed

Not really relational

I suppose that I expressed it in relational terms, but the underlying requirements are the same regardless of the programming model.

Associated with a Doctor is a relational collection called Patients. Now I want to select the subset of these Patients for whom "Patient::IsDiabetic = true".

The ability to use a key would make the selection process faster, and might even avoid the need to create a second collection.

Another way to look at it is a "hint." "Here are typical ways I am going to be examining this data. Perhaps you can make them more efficient for me. Or perhaps you will ignore me completely." The results of such a hint are, of course, implementation dependent.

But as I think this through, I see that this concept severely stretches the bounds of what you're trying to accomplish. I guess that these capabilities will have to wait for SuperMap. (Each time I type *SuperMap* I hear a fanfare in my head - dih dih-dih DAAH! If it ends up being called that, I may have to write it a theme song.)

Thu, 2010-07-01 17:14 — tamhas

A theme song ... now, there's an idea!

I have been talking a lot in recent months to a very old hand in OOP, but not ABL. By old hand I mean someone who has been in computing even longer than I have and has spent just about as much of that in OO as it was possible to spend. He has been very helpful in getting me to think in real OO terms after a lot of years thinking in relational terms. So, I'm going to pass that along ... :)

The whole idea of thinking about a collection of patients with a variety of attributes and extracting subsets is thinking of the patients like a table, i.e., in RDB terms. In OO, one is thinking of a relationship. So, it is more likely that one will instantiate one doctor class and a number of patients in an IsDiabetic relationship than that one would get a larger group of patients and then subset the diabetic ones. Or, if the relationship at issue was LivesIn divided by City, one might use an aAttrSortSet family and then extract subsets, but again it is one relation for one purpose at one time. Yes, I do know of use cases where there is more than one relation between the same objects, or a subset of objects, at the same time, but again it is something that one creates and then uses, and the usage is all about having a direct link, not an index, to the objects of interest.
