Enhance database tracking

Project: Prolint Issue Tracker
Component: Outputhandlers
Category: feature request
Priority: normal
Assigned: gwest
Status: closed
Description

PROFILER stores all runs with a 'profiler ID' so that you can look at results over time, or simply attach a profiler ID to a ticket or email for a reviewer to look at. ProLINT records only the last lint run of a program in the database, offers no convenient way for another user to retrieve that information later, and has no way of identifying which lint results a reviewer is supposed to review.

I'd like to see the equivalent of a profiler ID associated with each run, the ability to store results from multiple runs, the ability to pull the results of a specific run into the results viewer, and a statistical comparison by rule and program of the number of alerts in different runs.

To do this, I would propose adding a Lint_session table with a generated sequence number, session name, user ID, program name, date, etc. Then add the session_id to the lint_warning table as its own index.
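Sketched as an incremental .df, the proposal might look roughly like this (all table, field, sequence and area names are illustrative, not the actual Prolint schema, and the fragment is a sketch rather than a loadable delta):

```
ADD SEQUENCE "next-session-id"
  INITIAL 1
  INCREMENT 1
  CYCLE-ON-LIMIT no

ADD TABLE "lint_session"
  AREA "Schema Area"
  DESCRIPTION "One record per lint session"
  DUMP-NAME "lint_session"

ADD FIELD "session_id" OF "lint_session" AS integer
  FORMAT ">>>>>>9"
  INITIAL "0"

ADD FIELD "session-name" OF "lint_session" AS character
  FORMAT "x(40)"
  INITIAL ""

ADD FIELD "userid" OF "lint_session" AS character
  FORMAT "x(12)"
  INITIAL ""

ADD INDEX "session_id" ON "lint_session"
  AREA "Schema Area"
  UNIQUE
  PRIMARY
  INDEX-FIELD "session_id" ASCENDING

ADD FIELD "session_id" OF "lint_warning" AS integer
  INITIAL "0"

ADD INDEX "session_id" ON "lint_warning"
  AREA "Schema Area"
  INDEX-FIELD "session_id" ASCENDING
```

The INITIAL "0" on lint_warning.session_id would let pre-existing warnings fall naturally into a "session 0", which becomes relevant for backward compatibility later in this thread.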

Modify the lintdb output handler to generate the session ID and create the session record. This would be handled by dynamic buffers so that it would be backward compatible if the existing lint database does not have the new field/table.

Add a session name field to the lint UI, if the selected ruleset has the lintdb as an output handler.

Add an affordance (button) to retrieve results from the database by session and/or compunit. This runs a program which reads the db and publishes the results to the results viewer window.

Add an affordance button to connect the prolint database, or to add an alias to an existing database connection where the prolint data can be found.

Add an affordance button to the results viewer for statistical analysis by compunit between two different session IDs. This would summarize, for a compunit, the number of alerts for each rule so you can see if a new 'break' was added to the code in the most recent modification. When you enter the compunit, a browse shows the sessions containing that compunit in descending order. You select two and say compare. A second browse then shows each rule and the number of breaks on that rule in each session.
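The two-session comparison could be sketched in ABL roughly like this (the field names session_id, compunit and ruleid follow the proposal above and are assumptions about the schema, not the actual prolint database):

```
/* Sketch: count warnings per rule for one compunit in two sessions.  */
/* Schema names are illustrative.                                     */
DEFINE INPUT PARAMETER piSession1 AS INTEGER   NO-UNDO.
DEFINE INPUT PARAMETER piSession2 AS INTEGER   NO-UNDO.
DEFINE INPUT PARAMETER pcCompUnit AS CHARACTER NO-UNDO.

DEFINE TEMP-TABLE ttCompare NO-UNDO
   FIELD ruleid AS CHARACTER
   FIELD count1 AS INTEGER
   FIELD count2 AS INTEGER
   INDEX ruleid IS PRIMARY UNIQUE ruleid.

FOR EACH lint_warning NO-LOCK
   WHERE lint_warning.compunit = pcCompUnit
     AND (lint_warning.session_id = piSession1
       OR lint_warning.session_id = piSession2) :

   FIND ttCompare WHERE ttCompare.ruleid = lint_warning.ruleid NO-ERROR.
   IF NOT AVAILABLE ttCompare THEN DO:
      CREATE ttCompare.
      ttCompare.ruleid = lint_warning.ruleid.
   END.

   IF lint_warning.session_id = piSession1 THEN
      ttCompare.count1 = ttCompare.count1 + 1.
   ELSE
      ttCompare.count2 = ttCompare.count2 + 1.
END.
```

The resulting temp-table is what the second browse would display: one row per rule, with the break counts for each of the two selected sessions side by side.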

Is this an acceptable design? Any suggestions for changes? Anyone willing to take it on, or do I need to? :D


Comments

jurjen

enhance prolint results database

Yes, that is a totally acceptable design.

What would be the best moment to create a new lint-session ID? The easiest way would be to create a new ID on init of the outputhandler, but that happens on each run of prolint.p. I think I would prefer to create a new session manually and have all subsequent prolint runs contribute to the most recent session, because the data in the lint_warning table becomes more and more interesting as more sourcefiles are inspected by different runs of prolint.p (with different filespecs and different rulesets).

Or maybe even a combination to add more power:
run_id is automatically incremented on each run of prolint.p, but the data is gathered by session; sessions are created manually. Session_id is in the key of the lint_warning table but run_id is just an attribute field and replaces the current datetime field. The prolint_run table automatically records:
- run_id (unique)
- current session_id (is most recent session_id)
- date and time
- user
- perhaps machine name
- which rules are selected in the ruleset
- other startup params of prolint.p
- current roundtable task id, if avail??

Some more ideas:
- Add a button to delete old sessions (by session_id).

More?


All interesting ideas

All interesting ideas. My thought was that the session ID is created only in the "lint this" dialog, only if not already present (the good old global-shared trick), and only if one of the output handlers is the database. The user can change it to another pre-existing session ID, or generate a new one if they want to separate themselves from what they have done in the past (say, working on multiple projects in Eclipse). We could give them a browse of all past sessions, with a userid filter preset to their own userid.

But I can also see the value of a run_id where you want to lint the same compunit multiple times and see the effect of all the changes made in your session. It would complicate the statistical comparisons a little, since there are now two variables for choosing what to compare. Different runs having different rulesets could potentially make comparisons useless (or, in certain circumstances, more valuable, as you point out).

I might need some guidance on capturing the ruleset and selected rules. I can add the roundtable task ID to the schema, but since I don't have roundtable, I can't populate or test.

The delete should be an admin task, you don't want users deleting something that belongs to someone else, and it would be more for archival (to keep the database bloat down). I see that as a separate enhancement since you might want to batch it, and I can see lots more parameters to decide what to purge. Maybe a later phase or separate task.

So, who's going to do the work? If you want me to do it, this is my first opensource project so I have little knowledge of proper procedure.

Glen West
MIS Architect
EarthLink


jurjen

Re: All interesting ideas

Sorry, what I meant was that run_id simply replaces the lint_warning.lintdate field, because it may be interesting to save much more about the environment than just the date. I did not mean that it should be possible to compare the warnings of two run_id's: warnings from the current run_id simply overwrite and extend warnings from the previous run_id (in the same session_id) just like they do today.

Of course I want you to do all the work :-)

The procedure is, well, whatever you want, because you are working voluntarily. You can simply email your code to me, if you think that's easiest.

But generally I think the best and most transparent procedure is to create an issue explaining what you would like to change (you have already done that), then assign the issue to yourself so "everyone" knows you are working on it (to assign an issue to yourself you have to create a follow-up on the issue). When you are done writing code, create another follow-up on the issue and attach your code changes to it. This makes the modification public and puts pressure on the guy who has to submit the new code into the subversion repository (yes, that's me).

I usually review and test a little before I commit it to the repository. If it is a new or changed feature then I usually don't commit until there is some kind of help or documentation available. That's why some contributions were delayed in the past: I don't like to commit things that I don't understand or cannot describe. I bet that's frustrating for the volunteers, sorry.

Ok, so once the new code is in the subversion repository, there comes a time when I have to create a new Prolint release. I don't usually do that for a small code change, but what you are proposing is certainly large enough for a release.


jurjen

Re: All interesting ideas

> I might need some guidance on capturing the ruleset and selected rules.

Prolint uses publish/subscribe to broadcast information to the outputhandlers. The published events are documented on page /node/231

From within the outputhandler just subscribe to "Prolint_list_rules" to receive a comma-separated list of the selected rules. In fact this already happens in outputhandlers/prolintdb.p

Or subscribe to "Prolint_status_profile" to receive the name of the ruleset. Actually it's the name of the profile, I wish I had been so smart to define rulesets in addition to profiles. But that's a whole different enhancement request, never mind :-)
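From within an outputhandler, those two subscriptions look roughly like the following sketch (variable and parameter names here are mine, not taken from prolintdb.p):

```
/* Sketch: receive the selected rules and the profile name via the    */
/* publish/subscribe events Prolint broadcasts to its outputhandlers. */
DEFINE VARIABLE cSelectedRules AS CHARACTER NO-UNDO.
DEFINE VARIABLE cProfileName   AS CHARACTER NO-UNDO.

SUBSCRIBE TO "Prolint_list_rules"     ANYWHERE.
SUBSCRIBE TO "Prolint_status_profile" ANYWHERE.

PROCEDURE Prolint_list_rules :
   /* comma-separated list of the selected rules */
   DEFINE INPUT PARAMETER pcRuleList AS CHARACTER NO-UNDO.
   cSelectedRules = pcRuleList.
END PROCEDURE.

PROCEDURE Prolint_status_profile :
   /* name of the active profile ("ruleset") */
   DEFINE INPUT PARAMETER pcProfile AS CHARACTER NO-UNDO.
   cProfileName = pcProfile.
END PROCEDURE.
```

The received values could then simply be written into the lint_session (or prolint_run) record when it is created.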

> I can add the roundtable task ID to the schema, but since I don't have roundtable, I can't populate or test.

Don't bother then. It might not be a good idea after all.


jurjen

session_id and dynamic buffers

> This would be handled by dynamic buffers so that it would be backward compatible if the existing lint database does not have the new field/table.

Please, I prefer static buffers. It's not just the outputhandler, but also the procedures that calculate statistics and view stuff.

Would it not be easier for everyone if we try to detect that the database schema is too old (and perhaps even try to import the delta.df automagically)?

If the new session_id field has an initial value of zero, and there are lint_warning where session_id=0 then we know that they are old. You might even create a record lint_session and assign lint_session.session_id=0 for integrity.


Jurjen's replies

"session_id and dynamic buffers"

OK, I think the easiest (and right) answer is to mix the approaches. The fastest and easiest way of knowing which schema you're on is a dynamic find on the new table, or an attempt to create a temp-table LIKE the new table and check ERROR-STATUS. That's our 'version indicator'.
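A minimal sketch of that dynamic version check, assuming the logical database name is prolintdb and the new table is called lint_session (both names are assumptions):

```
/* Sketch: detect whether the connected lint database already has the */
/* new lint_session table.  A dynamic buffer avoids a compile-time    */
/* reference, so this code also compiles against the old schema.      */
DEFINE VARIABLE hSession   AS HANDLE  NO-UNDO.
DEFINE VARIABLE lNewSchema AS LOGICAL NO-UNDO.

CREATE BUFFER hSession FOR TABLE "prolintdb.lint_session" NO-ERROR.
lNewSchema = VALID-HANDLE(hSession).

IF lNewSchema THEN
   DELETE OBJECT hSession.
ELSE
   MESSAGE "The prolint database schema is too old;"
           "please load the incremental .df or disable the db outputhandler."
      VIEW-AS ALERT-BOX WARNING.
```

Only this one check needs a dynamic buffer; everything that runs after the check has passed can keep using static buffers, which is what the rest of this thread settles on.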

The fact that the checking code is shipped at all means the incremental .df will be shipped too. We prompt the user to load it or to turn off the database output handler. The disadvantage of this approach is that they cannot continue to update the database unless they upgrade it. On the other hand, the 'backward compatibility' is of dubious value without this mod anyway, so they should not be too concerned: without the new features it is always 'this run only' anyway.

Any code that is database-aware would check the schema version and abort if the database is not updated, running the same .p to alert the user with instructions on how to resolve it. I don't want to try 'automagic' because if they are using a database, it is possible (probable, even) that someone else is updating it with lint results at the same time, and we don't want to risk a schema lock or block someone else from storing their results in the DB.

What worries me about this approach is the early adopter. He gets the warning, but the other 15 users are still on the old version. We have to make sure the old version will continue to work once the schema is on the new version. It is this problem that prompted me to suggest the dynamic buffer idea.

Backward/forward compatibility is always a very interesting concern especially in an open-source project.

"All interesting ideas"
Should we look at implementing rulesets independently of profiles? I kind of equate the two, and I meant the 'profile' field when I said 'ruleset'.

Will drop the roundtable idea.

An import/export feature for profiles would be interesting as well, a way to distribute the new ruleset/profile to the development group within the organization.

Another concern I have is that a lot of the code is AppBuilder-based, a tool we have but do not use. But I'll give it a shot. We may need to work together on this, where I feed you code and you fit it into "proper" AppBuilder structure.

Glen West
MIS Architect
EarthLink


jurjen

profiles versus rulesets

It is a bit off-topic for the prolint.db discussion, but since you brought it up: a "profile" currently defines which outputhandlers to use AND which rules to run. So a profile defines two totally different sets. Prolint does not currently have the concept of a "ruleset", but if it did, I think a ruleset would define which rules to run, regardless of outputhandlers or other settings.
With rulesets it would be easier to run some selection, and then run the same selection again with different outputhandlers, without having to make a temporary profile change.

An import/export feature for profiles is not something you need to program: each profile is simply a subdirectory under "prolint/settings", so you can use the filesystem copy/move/delete commands to distribute or maintain them. Yeah, I know, the basedir should have been "prolint/profiles" or "prolint/settings/profiles", but it's too late to change that now.
Doesn't the development group use a shared drive where prolint is installed, so the settings are shared?


date/time

Can we assume V10 database? I'd like to use datetime as opposed to 2 fields.

Glen West
MIS Architect
EarthLink


jurjen

re: date/time

No, it must be possible to run Prolint in Progress 9.

The Exchange showed that not every VAR has adopted OpenEdge yet. We have, but not all of our customers have upgraded to our OpenEdge-based release of our ERP. So I am afraid Prolint has to support Progress 9 for a while.


Need button images

Need button images for:
1) connect to database
2) retrieve results from database
3) Statistical Summary

I don't have any graphic editing ability. Please email me suggested images for the buttons on the results viewer.

Glen West
MIS Architect
EarthLink


Code complete except for images

Please see my earlier message. I need images for retrieving from the database to the results viewer, and for comparing statistics between two sessions. Everything else is complete. Please advise on button images (for the top row above the browse/treeview, same size as the others).

Glen West
MIS Architect
EarthLink


jurjen

re: code complete except for images

Hi Glen,

I was away for a couple of days.
Just wondering: what if nobody responds to your request to advise on button images? Do you absolutely need images? I am very bad at drawing.

What does the window look like where those new buttons are to be placed? Maybe you can just add bad images, publish a screenshot, and then hope that someone has inspiration to replace those bad images.


old statistic functions

>>A question: when statistics are calculated or browsed, which session ID should be used? Or should the stats be saved for each session so they can be compared as well?

>It was my intention that it be restricted to session 0 - mass compiles of whole segments of applications would get too confusing if multiple sessions with multiple profiles become involved.

... but this won't work either, as a 'session-aware' lint session will not allow you to lint on session 0. Hmmm. More thought needed.

Glen West
MIS Architect
EarthLink


#1

Assigned to: Anonymous » gwest

Assigning to myself to do the work.


#2

Status: active » patch (code needs review)

Code complete, zip file mailed to Jurjen.


jurjen

#3

Assigned to: gwest » jurjen

Wow, you have been working hard. I am reviewing.


jurjen

#4

The code is committed to subversion, very slightly modified, but I think it is not ready to be released yet.

I have seen some things in the code that I don't really understand yet, like extending the lifetime of propsuper.p, or the purpose of new message Prolint_context and its timing. I am not sure what the expected behaviour is of the new buttons in the results window, or how the old statistic functions behave with the new database (I am pretty sure they fail, because the code is not changed). I also have not tested with dlc 9.1 yet.

You have discussed the issues with versioning and backward compatibility and early adopters. Well, if the early adopter upgrades the schema and starts using the new outputhandler, while his team members continue to use Prolint 69 with the old outputhandler, they will trash the data(!) unless all queries specifically work on session zero, which they don't. I am afraid the new database schema cannot be used with both Prolint 69 and Prolint 70 so it would be best if Prolint 69 simply fails with the new db while Prolint 70 simply fails with the old db. This also means that Prolint 70 should not have two different outputhandlers "prolintdb.p" and "prolintdbsess.p" but only one to avoid confusion.

A question: when statistics are calculated or browsed, which session ID should be used? Or should the stats be saved for each session so they can be compared as well?


#5

Assigned to: jurjen » gwest
Status: patch (code needs review) » patch (code needs work)

Responses to Jurjen's questions:

>I have seen some things in the code that I don't really understand yet, like extending the lifetime of propsuper.p, or the purpose of new message Prolint_context and its timing. I am not sure what the expected behaviour is of the new buttons in the results window, or how the old statistic functions behave with the new database (I am pretty sure they fail, because the code is not changed). I also have not tested with dlc 9.1 yet.

Propsuper becomes the repository of the session ID. It is essentially launched by desktop or by logwin. This lets me avoid profanities like the global-shared trick: all it does is raise the scope of the persistent procedure, and in my unit testing it appears to cleanly remove itself when the last subscriber dies, though I cannot say I have done exhaustive tests.

The Prolint_context event lets me know whether findsessions is being called to return a session ID (via propsuper) or to load the events from the selected session (when invoked via logwin). That allowed me to reuse the code with one small tweak of what happens once a session is selected, and it controls the caption of the "OK" button so the user knows what will happen.

As I mentioned in the doc file, the new buttons stay visible in logwin to tell the user to connect the database, or create it, just as the old statistics buttons do (it actually uses the same message). As for whether the old statistics buttons will work, I have to admit I didn't even think to look. I have assigned this back to myself to do that cleanup work.
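The propsuper idea can be sketched like this (a toy version; the procedure and variable names are illustrative, not the actual propsuper.p interface):

```
/* Sketch of the propsuper pattern: one persistent procedure holds    */
/* the current session ID for the whole Progress session, instead of  */
/* a global shared variable.  Names are illustrative.                 */
DEFINE VARIABLE giSessionId AS INTEGER NO-UNDO.

PROCEDURE SetSessionId :
   DEFINE INPUT PARAMETER piSessionId AS INTEGER NO-UNDO.
   giSessionId = piSessionId.
END PROCEDURE.

PROCEDURE GetSessionId :
   DEFINE OUTPUT PARAMETER poSessionId AS INTEGER NO-UNDO.
   poSessionId = giSessionId.
END PROCEDURE.
```

A caller would RUN such a procedure PERSISTENT once (from desktop or logwin, say) and then RUN GetSessionId IN that handle whenever the current session ID is needed, so all windows in the session see the same value.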

>You have discussed the issues with versioning and backward compatibility and early adopters. Well, if the early adopter upgrades the schema and starts using the new outputhandler, while his team members continue to use Prolint 69 with the old outputhandler, they will trash the data(!) unless all queries specifically work on session zero, which they don't. I am afraid the new database schema cannot be used with both Prolint 69 and Prolint 70 so it would be best if Prolint 69 simply fails with the new db while Prolint 70 simply fails with the old db. This also means that Prolint 70 should not have two different outputhandlers "prolintdb.p" and "prolintdbsess.p" but only one to avoid confusion.

The intent is that all the old functions would work with session 0. But I have to admit I never tested it, or even reviewed the code in that context. Again, I will take a look at that, but if your thought is simply to remove the prolintdb handler, I will not argue with that. I just wanted to take a shot at backward compatibility. Maybe we need a check at the top that says: if the new database is connected, this cannot be run. But as long as you stay on the old schema, things are fine.

>A question: when statistics are calculated or browsed, which session ID should be used? Or should the stats be saved for each session so they can be compared as well?

It was my intention that it be restricted to session 0 - mass compiles of whole segments of applications would get too confusing if multiple sessions with multiple profiles become involved. Where the code does not line up between sessions, things can get really messy. This led me to realize that I need a filter on the stats screen by compunit and rule. I will add that as long as I am in 'rework' mode.

Thanks for the feedback. I'll get the next revisions to you as soon as I can.


#6

Assigned to: gwest » Anonymous
Status: patch (code needs work) » patch (code needs review)

Resubmitting for review. Major changes:
a) existing statistical programs still work if the database is not session-aware.
b) on a session-aware database, they use only session 0.
c) you can now lint on session 0 to build a statistical base; you get a warning instead of the control being insensitive.
d) new filters on compunit and rule in the comparison of two sessions.


#7

Assigned to: Anonymous » gwest
Status: patch (code needs review) » closed

Released in prolint 70