[Vm-dev] VM stability / unit tests
eliot.miranda at gmail.com
Fri Mar 30 21:49:38 UTC 2018
> On Mar 30, 2018, at 1:35 PM, Phil B <pbpublist at gmail.com> wrote:
> While I've been enjoying the fantastic performance improvements we've seen from Cog onward, one thing I've been less excited about is the stability/functionality issues I've been running into. They are not numerous (maybe half a dozen or so major ones in the last 5 years), but they are getting quite tedious to isolate and replicate. Recent examples that come to mind include the 64-bit primHighResClock truncation and 'could not grow remembered set' issues. (My current joy is a case where I have an #ifTrue: block that doesn't get executed unless I convert it to an #ifTrue:ifFalse: with a no-op for the ifFalse:. I'll provide a reproducible test case as soon as I'm able. The specific issue isn't the point; rather, it's that I keep hitting things like this that seem rather fundamental yet edge-casey at the same time.)
> I don't expect perfection, as a phenomenal amount of progress is being made by a small group of people, but I am beginning to wonder whether the existing unit tests adequately exercise the VM, i.e. whether they would alert the VM developers that a recent change may have broken something, or whether the existing tests are mainly oriented towards image and bytecode VM development. Just some food for thought, and I wanted to see if it's just me having these sorts of issues...
Part of the problem is creating test frameworks that are both stable enough and complex enough. It's a lot of work. Consider the most unstable part of Spur over the past year, the new compactor, which took a year to become fully reliable (touch wood). The case that exposed the last bug I fixed required a really large image, a snapshot, and a load of that snapshot followed by a GC.
In fact, what this shows is that writing regression tests is easy but writing adequate stress tests is hard. In my experience it's more effective to let the community provide the stress tests and to be as responsive as possible in fixing bugs as soon as they appear. So knowing how to create reproducible cases, knowing the right channel to report a bug, etc., is important.
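To make the regression-test versus stress-test distinction concrete, here is a toy sketch in Python. Everything in it is hypothetical: the "heap" is just a list in which None marks a free slot, and compact() is an illustrative stand-in, not how Spur's compactor actually works. The regression test pins one known input; the stress test throws thousands of randomized heaps at the code, including a simulated snapshot/reload round trip, and checks invariants rather than specific expected outputs.

```python
import random

# Hypothetical toy "heap": a list of slots where None marks a free slot and
# any other value is a live object. This is an illustration only, not a
# model of a real VM heap or of Spur's compactor.


def compact(heap):
    """Slide live objects to the front, preserving order; free slots go to the end."""
    live = [obj for obj in heap if obj is not None]
    return live + [None] * (len(heap) - len(live))


def check_invariants(before, after):
    """Properties any correct compaction must satisfy, whatever the input."""
    live_before = [o for o in before if o is not None]
    n = len(live_before)
    assert len(after) == len(before), "heap size changed"
    assert after[:n] == live_before, "live objects lost or reordered"
    assert all(o is None for o in after[n:]), "hole left among live objects"


def regression_test():
    # Pins one specific input (illustrative values): cheap, but it only
    # guards against a bug that has already been seen.
    heap = ["a", None, "b", None, None, "c"]
    check_invariants(heap, compact(heap))


def stress_test(iterations=2_000, seed=42):
    # Randomized: many heap shapes, plus a simulated snapshot/reload round
    # trip followed by further frees and a second compaction.
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for i in range(iterations):
        size = rng.randint(0, 200)
        heap = [f"obj{i}_{j}" if rng.random() < 0.5 else None for j in range(size)]
        once = compact(heap)
        check_invariants(heap, once)
        reloaded = list(once)  # stand-in for "snapshot, then load the snapshot"
        for j in range(len(reloaded)):  # free some objects after the reload
            if reloaded[j] is not None and rng.random() < 0.3:
                reloaded[j] = None
        check_invariants(reloaded, compact(reloaded))
    return iterations


regression_test()
print(stress_test())  # prints 2000
```

Even this randomized harness only finds bugs that its generated shapes happen to reach; a bug like the compactor one described above, which needed a very large image and a specific snapshot/load/GC sequence, can easily slip past a generator that never produces that combination.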
And if I'm right here, then this points to the need for a workflow where VMs are built and tested automatically from tip. I don't fully understand the issue, but I'm frustrated that the current Pharo VM lags well behind that compactor bug fix. I think the issue is that the Pharo VM has more than one tip: there's the execution engine/GC/FFI tip that Clément, Nicolas and I take responsibility for, then there are the various library extensions (for git, fonts, imaging) that are a significant weight on Esteban's shoulders, and then there's SSL support from Tobias, etc.
So perhaps we need a two-tier VM code base, so that we can decouple these various tips and advance each one to "the stable branch" when appropriate. That in turn requires a CI infrastructure that allows the developers of each tip to test their changes in the context of an otherwise stable code base.