Initial Rule Set and Operations
In the initial implementation, the goal will be to classify any one block of code as wheat, chaff, or unknown, where “wheat” is material considered a good candidate for harvesting, “chaff” is material considered unlikely to be worth harvesting, and “unknown” is anything that falls into neither category. Chaff sections will be indicated by a 15% grey background, wheat sections by blue text, and unknown sections by black text. Sections will be boxed or otherwise delimited so that the whole section can easily be selected. Ctrl-Plus will “promote” a section from chaff to unknown or from unknown to wheat; Ctrl-Minus will “demote” a section from wheat to unknown or from unknown to chaff. Ctrl-[ will collapse a marked section to a single indicator line; Ctrl-] will expand it back to visible text. Ctrl-C will copy an entire section to the clipboard for pasting into the desired harvesting repository.
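A minimal sketch of the three-way classification and the promote, demote, and collapse operations, assuming the marks are ordered chaff < unknown < wheat so that promotion and demotion are clamped single steps along that order. The names `Mark`, `promote`, `demote`, `Section`, and `STYLE` are hypothetical; binding them to keystrokes and actually drawing the colours is left to the host editor.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mark(IntEnum):
    """Hypothetical section classification, ordered so promotion moves up."""
    CHAFF = -1
    UNKNOWN = 0
    WHEAT = 1


# Display treatment for each mark, as described in the text.
STYLE = {
    Mark.CHAFF: "15% grey background",
    Mark.UNKNOWN: "black text",
    Mark.WHEAT: "blue text",
}


def promote(mark: Mark) -> Mark:
    """Ctrl-Plus: chaff -> unknown -> wheat; wheat stays wheat."""
    return Mark(min(mark + 1, Mark.WHEAT))


def demote(mark: Mark) -> Mark:
    """Ctrl-Minus: wheat -> unknown -> chaff; chaff stays chaff."""
    return Mark(max(mark - 1, Mark.CHAFF))


@dataclass
class Section:
    lines: list[str]
    mark: Mark = Mark.UNKNOWN
    collapsed: bool = False  # set by Ctrl-[ (collapse), cleared by Ctrl-] (expand)

    def render(self) -> list[str]:
        """Lines to show: the full text, or one indicator line when collapsed."""
        if self.collapsed:
            return [f"[{self.mark.name.lower()}: {len(self.lines)} line(s) collapsed]"]
        return list(self.lines)
```

Clamping at the ends means pressing Ctrl-Plus on a wheat section (or Ctrl-Minus on a chaff section) is simply a no-op.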
Initial rules to identify “chaff” will be:
Comments (see below).
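One way the comments rule might be checked, as a sketch: strip `/* ... */` comments, assuming (as in ABL) that they may nest, and treat a block as chaff if nothing but whitespace remains. Only the `/* ... */` form is handled, and the sketch does not recognise string literals, so a `/*` inside a quoted string would be misread. The function names are hypothetical.

```python
def strip_comments(text: str) -> str:
    """Remove /* ... */ comments, allowing nesting.
    Assumption: string literals are not recognised."""
    out = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])  # keep only text outside comments
            i += 1
    return "".join(out)


def is_comment_block(lines: list[str]) -> bool:
    """Chaff candidate: at least one comment and nothing else but whitespace."""
    text = "\n".join(lines)
    return "/*" in text and strip_comments(text).strip() == ""
```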
Initial rules to identify “wheat” will be:
Note that the simple assignments rule does not include any assignment in which there is computation, since computation might indicate a business rule. Some form of a simple assignment might be needed to supplement a harvested piece of logic by providing appropriate initial values, but such assignments are moderately likely to take a different form than in the source code, and they are commonly located far from the place where the value is used. Note also that while the values of such variables may well be important in determining control flow, most such control flow will not be harvested in its existing form. E.g., a file maintenance program might have sections for creating, modifying, or deleting a record, depending on whether a record with the specified key already exists and/or on some user input. While the code that creates, modifies, or deletes a record may be captured as three code fragments, the control flow leading to those blocks is part of the local architecture of the old program and will be implemented differently in a new architecture.
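A sketch of how “no computation” could be tested for a single assignment statement: the right-hand side must be a literal or a single, possibly qualified or subscripted, variable or field. The patterns are an assumption about ABL-style syntax (hyphens are identifier characters, statements end with a period); multi-target `ASSIGN` statements and statements split across lines are not handled, and `is_simple_assignment` is a hypothetical name.

```python
import re

# Assumed ABL-style identifier: hyphens allowed, optional "table." qualifier
# and optional array subscript such as description[1].
IDENT = r"[A-Za-z_][\w\-]*(?:\.[A-Za-z_][\w\-]*)?(?:\[\d+\])?"
# Assumed literals: numbers, quoted strings, unknown value (?), logicals.
LITERAL = r"-?\d+(?:\.\d+)?|\"[^\"]*\"|'[^']*'|\?|yes|no|true|false"
SIMPLE_ASSIGN = re.compile(
    rf"^\s*({IDENT})\s*=\s*(?:{IDENT}|{LITERAL})\s*\.\s*$", re.IGNORECASE
)


def is_simple_assignment(stmt: str) -> bool:
    """True if stmt assigns a literal or a single variable/field,
    i.e. the right-hand side contains no computation."""
    return SIMPLE_ASSIGN.match(stmt) is not None
```

Because `a-b` is a single identifier in ABL, only a spaced `a - b` reads as subtraction, which this pattern rejects.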
While this may not be part of the initial implementation, it would be desirable to be able to identify blocks of code such as the following as chaff:
do for uom:
    find uom of item no-lock.
    display uom.description[1].
    uom--code = item.uom.
end.
This is a strongly scoped block, i.e., we know that any reference to the UoM buffer outside this block is inside its own strongly scoped block. Within the block there is only a no-lock find, a display, and an assignment to a local variable, so the block as a whole should be classified as chaff.
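The check described above could look something like this, assuming a hypothetical parser that already reports whether a block is strongly scoped and reduces each statement to its kind plus a couple of flags. `Stmt`, `Block`, and `is_lookup_display_chaff` are illustrative names, not an existing API.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    kind: str                      # e.g. "find", "display", "assign" (hypothetical parser output)
    no_lock: bool = False          # FIND ... NO-LOCK
    target_is_local: bool = False  # assignment target is a local variable


@dataclass
class Block:
    strongly_scoped: bool  # e.g. DO FOR buffer: with no stray outside references
    body: list[Stmt]


def is_lookup_display_chaff(block: Block) -> bool:
    """Chaff if the block is strongly scoped and contains only NO-LOCK
    finds, displays, and assignments to local variables."""
    if not block.strongly_scoped or not block.body:
        return False
    for s in block.body:
        if s.kind == "find" and s.no_lock:
            continue
        if s.kind == "display":
            continue
        if s.kind == "assign" and s.target_is_local:
            continue
        return False  # anything else may carry business logic
    return True
```

The UoM example above would reduce to `Block(True, [Stmt("find", no_lock=True), Stmt("display"), Stmt("assign", target_is_local=True)])`, which this rule marks as chaff.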
Also, specific to the code in the attached samples, all blocks referencing init-val and condition should be chaff, but I’m not immediately sure how to express that as a rule. This highlights the need for two kinds of rules: base rules, likely to apply to any code, and site-specific rules added for the particular body of code currently being harvested.
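One way to keep base and site-specific rules separate, sketched under assumptions: rules are plain predicates over a block's text, the first matching rule decides the mark, and the site rule reads “referencing init-val and condition” as referencing either name. The placeholder base rule (a blank block is chaff) and all names here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    verdict: str                    # "wheat" or "chaff"
    matches: Callable[[str], bool]  # predicate over a block's text
    site_specific: bool = False


def references_any(*names: str) -> Callable[[str], bool]:
    """Predicate: the block mentions any of the names as a whole identifier.
    ABL identifiers may contain hyphens, so '-' counts as part of a word."""
    pattern = re.compile(
        r"(?<![\w-])(?:" + "|".join(map(re.escape, names)) + r")(?![\w-])",
        re.IGNORECASE,
    )
    return lambda text: pattern.search(text) is not None


# Placeholder base rule (hypothetical): a block with no text is chaff.
BASE_RULES = [Rule("blank", "chaff", lambda t: t.strip() == "")]

# Site-specific rule for the attached samples; "and" is read as "either".
SITE_RULES = [
    Rule("init-val/condition", "chaff",
         references_any("init-val", "condition"), site_specific=True),
]


def classify(text: str, rules: list[Rule]) -> str:
    """Return the verdict of the first matching rule, else "unknown"."""
    for rule in rules:
        if rule.matches(text):
            return rule.verdict
    return "unknown"
```

Keeping the two lists separate means a new harvesting site only has to swap `SITE_RULES`, while `BASE_RULES` travels unchanged.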
It could be desirable to eliminate all blank lines, but this would limit any tool that links back to the original source. Instead of attaching them to adjacent chaff blocks as suggested above, an alternative would be to mark them as chaff in their own right and display them compressed by default.
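A sketch of that alternative: each run of blank lines becomes its own chaff section that starts collapsed, and every section keeps the original line number of its first line so a display can still link back to the source. `SourceSection` and `split_blank_runs` are hypothetical names, and every non-blank run is simply left as unknown here.

```python
from dataclasses import dataclass


@dataclass
class SourceSection:
    start: int          # 1-based line number of the first line in the original source
    lines: list[str]
    mark: str = "unknown"
    collapsed: bool = False


def split_blank_runs(lines: list[str]) -> list[SourceSection]:
    """Make each run of blank lines its own chaff section, collapsed by default."""
    sections: list[SourceSection] = []
    for lineno, line in enumerate(lines, start=1):
        blank = line.strip() == ""
        prev = sections[-1] if sections else None
        # In this sketch only blank runs are chaff, so the mark tells us
        # whether the previous section was a blank run.
        if prev is not None and (prev.mark == "chaff") == blank:
            prev.lines.append(line)
        else:
            sections.append(SourceSection(
                lineno, [line], "chaff" if blank else "unknown", collapsed=blank))
    return sections
```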
It would probably be useful to “pretty print” the code to a standard indentation before marking up the text, so that the indentation accurately reflects the block structure.
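A rough re-indenter, assuming ABL-style blocks that open on a line ending in `:` and close on a line beginning with `END`. It ignores comments, strings, and colons inside expressions, so it only sketches the idea rather than serving as a real pretty printer; `reindent` is a hypothetical name.

```python
import re

# END as a whole word; '-' is excluded so identifiers like end-key don't match.
END_RE = re.compile(r"^end(?![\w-])", re.IGNORECASE)


def reindent(lines: list[str], width: int = 4) -> list[str]:
    """Re-indent so leading whitespace mirrors block nesting.
    Assumptions: a line ending in ':' opens a block, a line starting
    with END closes one; comments and strings are not special-cased."""
    out, depth = [], 0
    for raw in lines:
        line = raw.strip()
        if not line:
            out.append("")
            continue
        if END_RE.match(line):
            depth = max(depth - 1, 0)  # close before printing the END line
        out.append(" " * (width * depth) + line)
        if line.endswith(":"):
            depth += 1  # open after printing the block header
    return out
```

Applied to the UoM example, the three body lines come out indented four spaces between `do for uom:` and `end.`.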
Each wheat and chaff rule should have a “weight”, and a run-time option should be provided to mark up only wheat or chaff whose weight is beyond a cutoff value (above it for wheat, below it for chaff), with separate cutoffs for wheat and chaff. E.g., if the comments rule for chaff is assigned a weight of -1, a chaff cutoff of 0 would mark comments as chaff and a cutoff of -1 would not. It should also be easy to turn a particular rule off entirely. Each chaff rule might also carry a flag indicating whether its sections are initially displayed compressed.
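A sketch of the weight and cutoff check, assuming chaff rules carry negative weights and wheat rules positive ones. Strict comparisons are used so that the comments example comes out as described: weight -1 is marked with a chaff cutoff of 0 but not with a cutoff of -1. `WeightedRule` and `is_applied` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WeightedRule:
    name: str
    kind: str               # "wheat" or "chaff"
    weight: int             # assumed convention: wheat positive, chaff negative
    enabled: bool = True    # lets a rule be switched off entirely
    compress: bool = False  # chaff only: show its sections collapsed initially


def is_applied(rule: WeightedRule, wheat_cutoff: int, chaff_cutoff: int) -> bool:
    """Whether a matching rule's mark is shown under the current cutoffs.
    Chaff is marked when its weight is below the chaff cutoff,
    wheat when its weight is above the wheat cutoff."""
    if not rule.enabled:
        return False
    if rule.kind == "chaff":
        return rule.weight < chaff_cutoff
    return rule.weight > wheat_cutoff
```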
A button or keystroke should be provided to expand any individual include file inline, in the fashion of the COMPILE LIST option. A run-time preference could determine whether includes are expanded by default or left unexpanded.
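A one-level include expander, sketched on the assumption that include references look like `{file.i}` or `{file.i arguments}` and that the included text is already available in a dictionary. Preprocessor references such as `{&name}` are left alone and argument substitution is not attempted. Passing a set of names expands only those includes (the per-include button); passing `None` expands every known include (the run-time default). All names are hypothetical.

```python
import re
from typing import Optional

# Assumed include syntax: {path/name.i} optionally followed by arguments.
# {&name} and {1} do not match because the path must end in ".i".
INCLUDE_RE = re.compile(r"\{\s*([\w./\\-]+\.i)(?:\s[^}]*)?\}")


def expand_includes(text: str, sources: dict[str, str],
                    names: Optional[set[str]] = None) -> str:
    """Replace include references with the included text, one level deep.
    names=None expands every include found in sources; otherwise only
    the listed ones. Unknown includes are left untouched."""
    def replace(m: re.Match) -> str:
        name = m.group(1)
        if (names is None or name in names) and name in sources:
            return sources[name]
        return m.group(0)

    return INCLUDE_RE.sub(replace, text)
```

Text returned from an included file is not scanned again, so nested includes would need a further click or a loop.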
An alternate treatment of comments would be to mark them as wheat or chaff according to the nature of the node with which they are associated: for a comment occupying one or more whole lines, this is typically the line of code that follows.
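A sketch of this alternative, assuming comments never share a line with code and do not nest: each comment line, including the continuation lines of a multi-line comment, takes the mark of the next non-blank code line, and trailing comments with no following code keep the mark they already had. `attach_comments` is a hypothetical name.

```python
def attach_comments(lines: list[str], marks: list[str]) -> list[str]:
    """Return a copy of marks in which each comment line takes the mark of
    the next non-blank code line. marks[i] is the mark of lines[i]."""
    result = list(marks)
    pending: list[int] = []  # comment lines waiting for the next code line
    in_comment = False       # inside a comment that spans several lines
    for i, line in enumerate(lines):
        s = line.strip()
        if in_comment or s.startswith("/*"):
            pending.append(i)
            in_comment = not s.endswith("*/")
        elif s:
            for j in pending:
                result[j] = marks[i]
            pending.clear()
    return result
```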