[Vm-dev] [Pharo-dev] Image crashing on startup, apparently during GC

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Mar 31 21:31:37 UTC 2018


2018-03-31 20:36 GMT+02:00 Esteban Lorenzano <estebanlm at gmail.com>:

>
> hi,
>
> On 31 Mar 2018, at 17:34, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>
>
>
> 2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <estebanlm at gmail.com>:
>
>>
>>
>>
>> On 30 Mar 2018, at 23:56, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>>
>>
>> 2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <estebanlm at gmail.com>:
>>
>>>
>>> hi,
>>>
>>> > On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <cyril.ferlicot at gmail.com>
>>> wrote:
>>> >
>>> > Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>>> >> Hi Damien,
>>> >>
>>> >> Indeed the image is corrupt at start-up.  See below.
>>> >>
>>> >>
>>> >> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>>> >>
>>> >> Spur:
>>> >> Fix a bad bug in SpurPlanningCompactor.
>>> >>  unmarkObjectsFromFirstFreeObject, used when the compactor requires
>>> more
>>> >> than one pass due to insufficient savedFirstFieldsSpace, expects the
>>> >> corpse of a moved object to be unmarked, but
>>> >> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>>> >> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>>> >> with ReleaseBuilder class>>saveAsNewRelease when non-use of
>>> cacheDuring:
>>> >> creates lots of files, enough to push the system into the multi-pass
>>> regime.
>>> >>
>>> >>
>>> >> Pharo urgently needs to upgrade the VM to one more up to date than
>>> 2017
>>> >> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>>> >> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>>> >> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>>> >> corruption in large images, and can occur (as it has here) at
>>> start-up,
>>> >> causing one's work to be irretrievably lost.
>>> >>
>>> >
>>> > Hi Eliot,
>>> >
>>> > I think that there is a lot of people who would like to get a newer
>>> > stable vm for Pharo 6.1 and 7. The problem is that it is hard to know
>>> > which VM are stable enough to be promoted as stable.
>>> >
>>> > Some weeks ago Esteban tried to promote a VM as stable and he had to
>>> > revert it the same day because a regression occurred in the VM.
>>> >
>>> > If you're able to tell us which vms are stable in those present at
>>> > http://files.pharo.org/vm/pharo-spur32/ and
>>> > http://files.pharo.org/vm/pharo-spur64/ it would be a great help.
>>> >
>>> > Even better would be for the pharo community to have a way to know
>>> which
>>> > vms are stable or not without having to ask you.
>>>
>>> there is no “stable” branch in Cog, and that’s a problem.
>>> “released” versions (the version you can find as stable) are not working
>>> for Pharo :(
>>>
>>> I tried to promote versions from end feb and that crashed.
>>>
>>> next week I will try again, maybe now they are stable enough… one thing
>>> is true: the versions that we consider stable (from oct/17) present
>>> problems that are already solved on latest.
>>>
>>> Esteban
>>>
>>>
>>> >
>>> > Have a nice day.
>>> >
>>> >>
>>> >> --
>>> >> _,,,^..^,,,_
>>> >> best, Eliot
>>> > --
>>> > Cyril Ferlicot
>>> > https://ferlicot.fr
>>> >
>>>
>>>
>> Hi,
>> Several problems are mixed here, let's try and decouple:
>> - 1) there is ongoing development in the core of the VM that may introduce
>> some instability
>> - 2) there is ongoing development in some plugins also
>> - 3) there are infrastructure problems that prevent us from producing artifacts,
>> whatever the intrinsic stability of the VM
>>
>>
>> you are right in all points, but for me this is a problem of process.
>>
>> - we have no defined milestones so nobody knows if they can jump to help.
>> - plugin development happens “on its own” and nobody knows what happens,
>> why it happens and how it happens.
>> - infrastructure is not bad and a lot of effort has been made to make it
>> work. But code sources are scattered around the world and the only thing
>> that reunites them is the hand of the one who generates the C sources.
>>
>> IMHO, it is this “disconnection” that causes most of the problems.
>>
>> cheers,
>> Esteban
>>
>>
> Hi Esteban,
> I see no fatality, and github also provides tools for that.
> Look, there is the project page on github
> https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1
>
> Maybe the Pharo team is willing to collaborate and take active parts in
> definition of milestones?
>
>
>>
>> For 1) development happens in VMMaker, and we have to be relying on
>> experts. Today that is Eliot and Clement.
>> We all want 64bits VM, improved GC, improved become:, write barrier,
>> ephemerons, threaded FFI calls and adaptive optimization.
>> Pharo is relying on these progress, they are vital.
>> IMO, we are reaching a good level of confidence, and I hope to see some
>> VMMaker version blessed as stable pretty soon.
>>
>> Instead of whining, the best we can do for reaching this state is help
>> them by providing accurate bug reports and even better reproducible cases.
>> Thanks to all who are working in this direction.
>>
>> For 2) we had a few problems, but again this is for improving important
>> features (SSL...)
>> Much of the development happens in feature branches already.
>> But since we are targeting so many platforms, and don't have automated
>> tests that scale yet, we still need beta testers.
>> We can discuss about the introduction of such beta features wrt release
>> cycles, that will be a good thing.
>> Ideally we should tend toward continuous integration and have very short
>> cycles, but we're not yet there.
>>
>> For 3)  we had a lot of problems, like stale links, invalid credentials,
>> evolution of the version of tools at automated build site, etc...
>>
>> If we don't build the artifacts, then we can't even have a chance to test
>> the stability of 1) and 2)
>> We have to understand that 3) is absolutely vital.
>>
>> May I remind you that for a very long period last year, the builds were broken
>> due to lack of work on the Pharo side.
>> Fortunately, this has changed in 2018.
>> Fabio has been working REALLY hard to improve 3), and without the help of
>> Esteban, I don't think he could have reached the holy green build status.
>> We will never thank them enough for that. This also shows that
>> cooperation may pay.
>>
>> But this is still very fragile.
>> If we want to make progress, we should ask why it is so.
>> We could analyze the regressions, and decide if the complexity is
>> sustainable, or eventually drop some drag.
>> We are chasing many hares by building the VM for Newspeak/Pharo/Squeak
>> i386/x86_64/ARM Spur/Stack/V3 Sista/lowcode linux/Macosx/Windows ...
>> If it happens that a fix vital for Pharo/Squeak does break Newspeak
>> tests, then it slows down the progress...
>> Maybe we would want to decouple a bit more the problems there too (they
>> may come from some image side weakness).
>>
>> Last two years I've also observed some work exclusively done in the Pharo
>> fork of the opensmalltalk VM.
>> This was counter productive. Work must be produced upstream, or it's
>> wasted.
>>
>>
>> This happened just once or twice. And it was because people were ignorant
>> of “joint” so they continued contributing as before (and people were
>> pointed to the right place when we had the opportunity).
>> And I disagree this was counterproductive because I took the effort to
>> merge the changes into osvm. This worked fine until I stopped doing that
>> job, but well… just one PR got stalled there for months and Alistair
>> integrated it recently.
>>
>> What *did happen*, and I’m still not ready to let it go, is that a lot of the
>> small changes that we presented were rejected (or ignored) without further
>> consideration. But well, let’s keep it positive and not enter into sterile
>> discussions, I just think you are wrong with this argument.
>>
>> No, it's important, we (opensmalltalk-vm team) can't let such bad feeling
> and frustration creep in.
> Every contribution counts, that does not mean that every PR will be
> accepted, but we owe an explanation if not.
> Some are accepted instantly, some are accepted after modification
> requests, some are rejected (I hope with some rationalization).
> What is problematic is that some were ignored for too long time, I regret
> the situation, but there is no deliberate intention to ignore them, just
> lack of manpower IMO.
> For example, the recent work of Alistair shows that there is no fatality
> here, it's just that someone has to do the hard work (kudos!).
>
>
> I’m sorry to have a bitter feeling, but I’m going to give you an example
> so you understand why I’m saying this: I proposed the refactor of the Alien
> package (which is obviously right and simple) at least three times in the last
> three years (by different means). First time I’ve been told “no, we prefer
> it like that”. Second time I’ve been told “we need to think about that”
> (and no answer later). Last time I didn’t even receive a response. So well,
> when Torsten proposed it he first came to me. I told him “talk in vm-dev
> and good luck”. His proposal was accepted (thank god).
> Several situations like this one made me think that I’m a second class
> citizen in this community. And you know what? I’m 46 years old and I do not
> want to be treated as if I’m a child that can play with the toys someone
> lets me or not. So yes, I’m sad and not very “into” this these days. I made a
> lot of effort to come back from the de facto fork we had because I always
> pushed to work together. But I do not see the spirit of collaboration I
> hoped for.
>
> Torsten has brought up the technical merits of doing so, and the cons of
the status quo; it was nothing personal.
I don't know how the topic was brought up on vm-dev at that time. Probably
the technical merit was not perceived then.
For me, you are in the opensmalltalk team, this does not mean 2nd zone (or
the whole Smalltalk community is second zone then).

Also, the question is coupled with stability: having a red status does not
> help, for almost every PR that I accepted, I had to dig into travis console
> reports and compare to status of previous build in order to know if it was
> a regression, or just a long time failing case... This does not scale!
>
>
> it is worse than "does not scale".
> again, let me give you an example: Imagine I want to work on FFI and I need
> to touch both an external file and VMMaker: I cannot do a PR because
> VMMaker is not there and VM building will be broken and I cannot push
> VMMaker because platform sources are not there and VM building will be
> broken too.
> So, the only solution is what happens today: I need to push VMMaker and
> changes at the same time. So I’m forced to work on the development branch
> (we are all forced to work there), instead of how I think it should be: each
> one should be able to work on their branch and contribute changes through
> PRs (PRs that can be validated with a good CI process).
>
> As a consequence, since all contributions go to the development branch… we
> have no stability and we need to wait for the blessing of a VM. That may or
> may not happen.
>
> Meh, whatever… it is obvious that my way of working is different from most of the
> people of this community so I will go back to what I do now: just the
> absolutely necessary.
>
> It's not at all about my way or your way, you don't have to make it
something personal.
It's about the feasibility and sustainability of the different solutions.
You are right, branch development is not compatible with versioning of
generated code because it leads to unsolvable merge conflicts.
So you legitimately raise the question
https://stackoverflow.com/questions/893913/should-i-store-generated-code-in-source-control
Whatever the diverging opinions, the very first condition for automating code
generation is to have reproducible artifacts (generated code).
You know that it's not the case today, even when generating twice from the
very same image.
We can't sweep this problem under the carpet.
Otherwise, two different builds of the same source could lead to different
VM behavior, which would be worse than what we have today.
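
As a rough illustration of what "reproducible" means here, the check could be
sketched as below: generate the sources twice into two directories, then verify
that both trees are byte-identical. This is only a sketch under assumptions;
the function names `tree_digest` and `differing_files` and the directory
layout are hypothetical and are not part of VMMaker or the opensmalltalk-vm
build scripts.

```python
import hashlib
from pathlib import Path


def _relative_files(root):
    """Return the sorted relative POSIX paths of all regular files under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_file())


def tree_digest(root):
    """SHA-256 over the relative paths and contents of every file under root."""
    root = Path(root)
    h = hashlib.sha256()
    for rel in _relative_files(root):
        h.update(rel.encode("utf-8"))
        h.update(b"\0")                      # separate path from content
        h.update((root / rel).read_bytes())
        h.update(b"\0")                      # separate one file from the next
    return h.hexdigest()


def differing_files(a, b):
    """List relative paths that are missing on one side or differ in content."""
    a, b = Path(a), Path(b)
    files_a, files_b = set(_relative_files(a)), set(_relative_files(b))
    diffs = set(files_a ^ files_b)           # present in only one tree
    for rel in files_a & files_b:
        if (a / rel).read_bytes() != (b / rel).read_bytes():
            diffs.add(rel)
    return sorted(diffs)
```

Two generations from the same image would count as reproducible only if
`tree_digest` agrees for both output directories; when it does not,
`differing_files` shows where to look (for example timestamps or ordering
embedded in the generated C).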

Besides, except for Eliot, Clement and Ronaldo, most developers work on plugins.
Developing plugins in branches is still possible, unlike core VM.
There is a reduced probability of conflicts in generated code given the
number of concurrent developers today.
So we commit in trunk more often than strictly necessary IMO.

Finally, the case you are describing is rare and mostly concerns
modification of the core VM.
As said in another thread, the dev cycle of a core VM feature is long anyway (6+
months).
In this context, feature branches are not sustainable.
A good branch is a short branch, everything else is illusory and leads to
merge nightmares.
In such conditions, one has to accept temporary instability and organize
release cycles with a stabilisation phase.
How differently does the Pharo release cycle work?
You have raised good points concerning the organization of those cycles.
My answer does not change: take your place, throw the inferiority complex away
and participate.
You are very capable, and it will be beneficial to the community at large.

And don't take this answer as advocacy for the status quo.
I'm trying to analyze and identify locks.
It does not mean that we can't unlock :)

cheers
Nicolas

cheers,
> Esteban
>
> ps: I will not continue discussing this… I know how things are and I’m
> sorry to put such a negative perspective in this list, but I needed to say
> it.
>
>
> I'm all for more distributed power, and that should come with
> responsibilities, first a cooperative "you break it, you fix it" attitude.
>
> Or maybe do you want clarified decision process?
> For now, people that feel interested by a PR raise their voice.
> I don't know if we need something more formal
> For important design decisions there is vm-dev mailing list to discuss
> about that.
>
> cheers
> Nicolas
>
>
>> cheers,
>> Esteban
>>
>> I once thought that the Pharo fork could be the place for the pharo team
>> to manage official stable versions.
>> But I agree that this is too much duplicated work and would be very happy
>> to see the work happen upstream too.
>>
>> If you have constructive ideas that will help decoupling all these
>> problems, we are all ears.
>>
>> PS: I did not post this answer, to avoid sterile discussion, but since
>> Phil asked...
>>
>>
>
>