Harvesting Editor - 4

Initial Rule Set and Operations
In the initial implementation, the goal will be to classify each block of code as wheat, chaff, or unknown, where “wheat” is material considered a good candidate for harvesting, “chaff” is material considered unlikely to be worth harvesting, and “unknown” is anything that doesn’t fall into either category. Chaff sections will be indicated by a 15% grey background, wheat sections by blue text, and unknown sections by black text. Sections will be boxed or otherwise delimited so that one can easily select a whole section. Ctrl-Plus will “promote” a section from chaff to unknown or from unknown to wheat; Ctrl-Minus will “demote” a section from wheat to unknown or from unknown to chaff. Ctrl-[ will collapse a marked section to a single indicator line; Ctrl-] will expand a collapsed section back to visible text. Ctrl-C will copy an entire section to the clipboard for pasting into the desired harvesting repository.
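
As a minimal sketch of the three-way marking and the promote/demote keys described above, the states can be treated as an ordered scale. The names Mark, STYLE, promote, and demote are hypothetical and chosen only for illustration; the source does not specify an implementation language.

```python
from enum import IntEnum


class Mark(IntEnum):
    """Ordered classification of a section (ordering is an assumption)."""
    CHAFF = 0
    UNKNOWN = 1
    WHEAT = 2


# Display conventions taken from the text above.
STYLE = {
    Mark.CHAFF: "15% grey background",
    Mark.UNKNOWN: "black text",
    Mark.WHEAT: "blue text",
}


def promote(mark: Mark) -> Mark:
    """Ctrl-Plus: chaff -> unknown -> wheat; wheat stays wheat."""
    return Mark(min(mark + 1, Mark.WHEAT))


def demote(mark: Mark) -> Mark:
    """Ctrl-Minus: wheat -> unknown -> chaff; chaff stays chaff."""
    return Mark(max(mark - 1, Mark.CHAFF))
```

Clamping at the ends of the scale is an assumption; the text does not say what promoting wheat or demoting chaff should do.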

Initial rules to identify “chaff” will be:

  1. All DEFINE statements (other than DEFINE FRAME, which is treated as wheat below). While the variables defined may be needed for harvested wheat code, that code will typically be packaged differently than it appears in the source program and is likely to be refactored. Thus, one generally won’t want to capture the variable definitions with the code, because the definitions in the harvested code are likely to change in form. Also, variable definitions are often widely separated from their use, making it difficult to harvest the two as a unit.
  2. All lines consisting only of whitespace. Whitespace lines that follow a chaff section will be included in that section; whitespace lines not already absorbed in this way will be added to the chaff section that follows them. Alternatively, an option might be provided to simply eliminate any whitespace.
  3. All include references, although one can drill down into the include and harvest from it as well. Include references themselves are marked as chaff because it is extremely unlikely that they will be harvested as such.
  4. Simple assignments including:
    1. Simple assignment of a value from a database table to a local or shared variable.
    2. Assignment of literals to a local or shared variable.
  5. All UI updates and displays.
  6. Access to “system” tables, as identified by a user-provided list.
  7. Comments (see below).
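
Some of the chaff rules above can be recognized one line at a time. The following is a rough sketch only: it works on raw text rather than a parse tree, handles only single-line comments, and treats DEFINE FRAME as not chaff because the wheat rules claim it. The function name chaff_rule and the rule labels are hypothetical.

```python
import re
from typing import Optional

# DEFINE is often abbreviated in ABL source; this list is an assumption.
DEFINE_WORDS = {"def", "defi", "defin", "define"}
# An include reference alone on a line, e.g. {path/file.i}; references that
# start with & or a digit (preprocessor names, arguments) are skipped.
INCLUDE_RE = re.compile(r"^\s*\{\s*[^&\d\s{}][^{}]*\}\s*$")
# A comment that opens and closes on the same line.
COMMENT_RE = re.compile(r"^\s*/\*.*\*/\s*$")


def chaff_rule(line: str) -> Optional[str]:
    """Return the name of the chaff rule matching this line, or None."""
    if not line.strip():
        return "whitespace"
    words = line.lower().split()
    if words[0] in DEFINE_WORDS:
        rest = words[1:]
        # Skip scope modifiers such as NEW SHARED before the defined kind.
        while rest and rest[0] in {"new", "shared", "global"}:
            rest = rest[1:]
        # DEFINE FRAME is left for the wheat rules.
        return None if rest and rest[0] == "frame" else "define"
    if INCLUDE_RE.match(line):
        return "include"
    if COMMENT_RE.match(line):
        return "comment"
    return None
```

Rules such as UI updates, simple assignments, and system table access need more context than a single line of text and are not covered here.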

Initial rules to identify “wheat” will be:

  1. FORM, DEFINE FRAME, and implicit FORM statements that define UI layout. These are included based on the assumption that one will be trying to capture the general screen layout, e.g., in a fashion similar to Pro/Dox, even though the details of the UI and the technology of its display will be significantly different in the rearchitected code.
  2. VALIDATE statements for database fields (one might check these against the dictionary; they are also a good early candidate for checking against previously harvested code per some convention).
  3. Flow of control logic. A flow of control block whose contents are entirely UI statements or other chaff will be considered chaff as well.
  4. Database access other than to “system” tables.
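
The flow-of-control rule above is naturally recursive: a block is wheat unless everything inside it is chaff. A minimal sketch, assuming the source has already been parsed into a tree in which any node with children is a flow-of-control block (the Node type and classify_block are hypothetical names):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    mark: str = "unknown"  # "wheat", "chaff", or "unknown"
    children: List["Node"] = field(default_factory=list)


def classify_block(node: Node) -> str:
    """Mark a flow-of-control block from its contents and return the mark.

    Leaves keep whatever mark the line-level rules gave them. A block whose
    contents are all chaff becomes chaff; any other block becomes wheat.
    """
    if not node.children:
        return node.mark
    child_marks = [classify_block(child) for child in node.children]
    node.mark = "chaff" if all(m == "chaff" for m in child_marks) else "wheat"
    return node.mark
```

Because nested blocks are classified first, a loop that contains only a UI-only IF block is itself reduced to chaff.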

Note that the simple assignments rule does not include any assignment in which there is computation, since that might be an indicator of a business rule. Some form of the simple assignment might be required to supplement a harvested piece of logic in order to provide appropriate initial values, but these assignments are moderately likely to be of a different form than in the source code. It is also common for them to be physically removed from the place where the value is used. Note also that while the values associated with such variables may well be important in determining control flow, most such control flow will not be harvested in the form it is in. E.g., a file maintenance program might have sections for creating, modifying, or deleting a record depending on whether a record already exists with the specified key and/or some user input. While the code in each section related to what one does to create a record, modify a record, or delete a record may be captured in three code fragments, the control flow leading to those blocks is a part of the local architecture of the old program and will be implemented differently in a new architecture.
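
To illustrate the distinction drawn above between a simple assignment and one involving computation, here is a text-level sketch. It assumes one statement per line and treats any name.name on the right-hand side as a table field; a real implementation would need the schema and a parser. All names are hypothetical.

```python
import re

IDENT = r"[A-Za-z][\w-]*"  # ABL identifiers may contain hyphens
LITERAL = r"""(?:"[^"]*"|'[^']*'|[+-]?\d+(?:\.\d+)?|yes|no|true|false|\?)"""
FIELD = rf"{IDENT}\.{IDENT}"  # assumed to be table.field

# variable = literal.   or   variable = table.field.
SIMPLE_ASSIGN_RE = re.compile(
    rf"^\s*(?:assign\s+)?{IDENT}\s*=\s*(?:{LITERAL}|{FIELD})\s*\.\s*$",
    re.IGNORECASE,
)


def is_simple_assignment(line: str) -> bool:
    """True when the right-hand side is a bare literal or a single field."""
    return SIMPLE_ASSIGN_RE.match(line) is not None
```

Anything with an operator or function call on the right-hand side fails the match and is left for other rules, since it may encode a business rule.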

While possibly not in the initial implementation, it would be desirable to be able to identify blocks of code such as the following as chaff:

do for uom:
  find uom of item no-lock.
  display uom.description[1].
  uom--code = item.uom.
end.

Here there is a strongly scoped block, i.e., we know that there are no references to the uom buffer outside this block other than those inside their own strongly scoped blocks. Within the block there is only a no-lock find, a display, and an assignment to a local variable, so the whole block should be chaff.

Also, specific to the code in the samples attached, all blocks referencing init-val and condition should be chaff, but I’m not immediately sure how to make that into a rule. This highlights the need for base rules that are likely to be used with any code and site-specific rules that are added based on the particular body of code currently being harvested.

It could be desirable to eliminate all blank lines, but this would limit any tools for linking back to the original. Instead of attaching them to adjacent chaff blocks as suggested above, an alternative would be to mark them as chaff in their own right and display them compressed by default.

It would probably be useful to “pretty print” to standard indentation prior to marking up the text so that the indentation is an accurate rendering of block structure.

Wheat and chaff rules should each have a “weight,” and a run-time option should be provided to mark up only wheat or chaff whose weight exceeds a certain cutoff value. Separate cutoffs should be provided for wheat and chaff. E.g., one might assign the comments rule for chaff a weight of -1; a chaff cutoff of 0 would then mark comments as chaff, while a cutoff of -1 would not, i.e., chaff weights are negative and a rule applies when its weight lies strictly below the cutoff. It should also be easy to simply turn a particular rule off when desired. Each chaff rule might also carry a flag indicating whether its sections should initially be displayed compressed.
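
The cutoff behavior can be sketched as follows. The chaff side follows the comments example in the text (weight -1 is marked at cutoff 0 but not at cutoff -1); the wheat side, with positive weights that must lie strictly above the wheat cutoff, is an assumption made by symmetry. Rule and rule_applies are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    kind: str                 # "wheat" or "chaff"
    weight: float             # chaff negative, wheat positive (assumed)
    enabled: bool = True      # allows a rule to be switched off outright
    collapsed: bool = False   # chaff only: show compressed on first display


def rule_applies(rule: Rule, wheat_cutoff: float, chaff_cutoff: float) -> bool:
    """Decide whether a rule's matches should be marked up at these cutoffs."""
    if not rule.enabled:
        return False
    if rule.kind == "chaff":
        return rule.weight < chaff_cutoff   # strictly below the chaff cutoff
    return rule.weight > wheat_cutoff       # strictly above the wheat cutoff
```

For example, with comments = Rule("comments", "chaff", -1), rule_applies(comments, 0, 0) is true and rule_applies(comments, 0, -1) is false, matching the example in the text.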

A button or keystroke should be provided to bring in any individual include in the fashion of the COMPILE LIST option. A run-time preference could be provided to make this the default behavior or to leave includes unexpanded by default.

An alternate treatment of comments would be to mark them as wheat or chaff according to the nature of the node with which they are associated; for comments that occupy one or more whole lines, this is typically the line of code that follows.
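
A minimal sketch of this alternate treatment, assuming each line has already been tagged as comment or code and given a mark by the other rules. Working from the bottom up lets a run of consecutive comment lines take the mark of the code line that follows the run. The function name and input shape are hypothetical.

```python
from typing import List, Tuple


def inherit_comment_marks(lines: List[Tuple[bool, str]]) -> List[str]:
    """lines holds (is_comment, mark) pairs in source order.

    Returns one mark per line, with each whole-line comment taking the mark
    of the next non-comment line. A comment with no following code is left
    as "unknown" (an assumption; the text does not cover this case).
    """
    marks = [mark for _, mark in lines]
    following = "unknown"
    for i in range(len(lines) - 1, -1, -1):
        is_comment, _ = lines[i]
        if is_comment:
            marks[i] = following
        else:
            following = marks[i]
    return marks
```

Note that a blank line between a comment and its code would pass its own chaff mark to the comment, so blank lines would need to be skipped or handled first.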
