[Vm-dev] [Pharo-dev] Image crashing on startup, apparently during GC

Esteban Lorenzano estebanlm at gmail.com
Sat Mar 31 18:36:22 UTC 2018


hi,

> On 31 Mar 2018, at 17:34, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> 
> 
> 
> 2018-03-31 15:03 GMT+02:00 Esteban Lorenzano <estebanlm at gmail.com>:
>  
> 
> 
>> On 30 Mar 2018, at 23:56, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>> 
>> 
>> 
>> 2018-03-24 11:11 GMT+01:00 Esteban Lorenzano <estebanlm at gmail.com>:
>> 
>> hi,
>> 
>> > On 24 Mar 2018, at 09:50, Cyril Ferlicot D. <cyril.ferlicot at gmail.com> wrote:
>> >
>> > Le 23/03/2018 à 21:52, Eliot Miranda a écrit :
>> >> Hi Damien,
>> >>
>> >> Indeed the image is corrupt at start-up.  See below.
>> >>
>> >>
>> >> Right.  This VM is prior to the bug fixes in VMMaker.oscog-eem.2320:
>> >>
>> >> Spur:
>> >> Fix a bad bug in SpurPlanningCompactor.
>> >>  unmarkObjectsFromFirstFreeObject, used when the compactor requires more
>> >> than one pass due to insufficient savedFirstFieldsSpace, expects the
>> >> corpse of a moved object to be unmarked, but
>> >> copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>> >> Unmarking the corpse before the copy unmarks both.  This fixes a crash
>> >> with ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring:
>> >> creates lots of files, enough to push the system into the multi-pass regime.
>> >>
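A toy model of the ordering problem described in the commit message above, offered only as a sketch. `Obj`, `heap` and both function names are hypothetical, and the real code is VMMaker Slang, not Python. It assumes that copying an object carries its mark bit along with the rest of the contents.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    marked: bool
    payload: str


def copy_then_unmark_target(heap, src, dst):
    # Old order: copy the object (mark bit included), then clear the
    # mark on the copy only. The corpse at src stays marked.
    heap[dst] = Obj(heap[src].marked, heap[src].payload)
    heap[dst].marked = False


def unmark_corpse_then_copy(heap, src, dst):
    # Fixed order: clear the mark on the corpse first, so the copy
    # inherits an already cleared mark bit. Both end up unmarked.
    heap[src].marked = False
    heap[dst] = Obj(heap[src].marked, heap[src].payload)


def demo(move):
    # Slot 0 is free space, slot 1 holds a marked (live) object.
    heap = {0: None, 1: Obj(True, "live")}
    move(heap, 1, 0)
    return heap[1].marked, heap[0].marked  # (corpse, target)
```

In this model a later pass that expects corpses to be unmarked would only see that state with the second ordering.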
>> >>
>> >> Pharo urgently needs to upgrade the VM to one more up to date than 2017
>> >> 08 27 (in fact more up-to-date than opensmalltalk/vm commit
>> >> 0fe1e1ea108e53501a0e728736048062c83a66ce, Fri Jan 19 13:17:57 2018
>> >> -0800).  The bug that VMMaker.oscog-eem.2320 fixes can result in image
>> >> corruption in large images, and can occur (as it has here) at start-up,
>> >> causing one's work to be irretrievably lost.
>> >>
>> >
>> > Hi Eliot,
>> >
>> > I think that there are a lot of people who would like to get a newer
>> > stable VM for Pharo 6.1 and 7. The problem is that it is hard to know
>> > which VMs are stable enough to be promoted as stable.
>> >
>> > Some weeks ago Esteban tried to promote a VM as stable and he had to
>> > revert it the same day because a regression occurred in the VM.
>> >
>> > If you're able to tell us which of the VMs present at
>> > http://files.pharo.org/vm/pharo-spur32/ and
>> > http://files.pharo.org/vm/pharo-spur64/ are stable, it would be a great help.
>> >
>> > Even better would be for the pharo community to have a way to know which
>> > vms are stable or not without having to ask you.
>> 
>> there is no “stable” branch in Cog, and that’s a problem.
>> “released” versions (the version you can find as stable) are not working for Pharo :(
>> 
>> I tried to promote versions from end feb and that crashed.
>> 
>> next week I will try again, maybe now they are stable enough… one thing is true: the versions that we consider stable (from oct/17) present problems that are already solved on latest.
>> 
>> Esteban
>> 
>> 
>> >
>> > Have a nice day.
>> >
>> >>
>> >> --
>> >> _,,,^..^,,,_
>> >> best, Eliot
>> > --
>> > Cyril Ferlicot
>> > https://ferlicot.fr
>> >
>> 
>> 
>> Hi,
>> Several problems are mixed here, let's try and decouple:
>> - 1) there is ongoing development in the core of the VM that may introduce some instability
>> - 2) there is ongoing development in some plugins also
>> - 3) there are infrastructure problems preventing us from producing artifacts, whatever the intrinsic stability of the VM
> 
> you are right in all points, but for me this is a problem of process. 
> 
> - we have no defined milestones so nobody knows if they can jump to help.
> - plugin development happens “on its own” and nobody knows what happens, why it happens and how it happens.
> - infrastructure is not bad and a lot of effort has been made to make it work. But the source code is scattered around the world and the only thing that brings it together is the hand of the one who generates the C sources.
> 
> IMHO, it is this “disconnection” that causes most of the problems.
> 
> cheers,
> Esteban
> 
> 
> Hi Esteban,
> I see no fatality, and github also provides tools for that.
> Look, there is the project page on github
> https://github.com/OpenSmalltalk/opensmalltalk-vm/projects/1
> 
> Maybe the Pharo team is willing to collaborate and take an active part in the definition of milestones?
>  
>> 
>> For 1) development happens in VMMaker, and we have to rely on experts. Today that is Eliot and Clement.
>> We all want a 64-bit VM, improved GC, improved become:, write barrier, ephemerons, threaded FFI calls and adaptive optimization.
>> Pharo relies on this progress; it is vital.
>> IMO, we are reaching a good level of confidence, and I hope to see some VMMaker version blessed as stable pretty soon.
>> 
>> Instead of whining, the best we can do for reaching this state is help them by providing accurate bug reports and even better reproducible cases.
>> Thanks to all who are working in this direction.
>> 
>> For 2) we had a few problems, but again this is for improving important features (SSL...)
>> Much of the development happens in feature branches already.
>> But since we are targeting so many platforms, and don't have automated tests that scale yet, we still need beta testers.
>> We can discuss the introduction of such beta features with respect to release cycles; that would be a good thing.
>> Ideally we should move toward continuous integration and have very short cycles, but we're not there yet.
>> 
>> For 3) we had a lot of problems, like stale links, invalid credentials, changes in tool versions at the automated build site, etc...
>> 
>> If we don't build the artifacts, then we can't even have a chance to test the stability of 1) and 2)
>> We have to understand that 3) is absolutely vital.
>> 
>> May I remind you that for a very long period last year, the builds were broken due to lack of work on the Pharo side.
>> Fortunately, this has changed in 2018.
>> Fabio has been working REALLY hard to improve 3), and without the help of Esteban, I don't think he could have reached the holy green build status.
>> We will never thank them enough for that. This also shows that cooperation may pay.
>> 
>> But this is still very fragile.
>> If we want to make progress, we should ask why it is so.
>> We could analyze the regressions, and decide if the complexity is sustainable, or eventually drop some drag.
>> We are chasing many hares by building the VM for Newspeak/Pharo/Squeak i386/x86_64/ARM Spur/Stack/V3 Sista/lowcode linux/Macosx/Windows ...
>> If it happens that a fix vital for Pharo/Squeak does break Newspeak tests, then it slows down the progress...
>> Maybe we would want to decouple a bit more the problems there too (they may come from some image side weakness).
>> 
>> Over the last two years I've also observed some work done exclusively in the Pharo fork of the opensmalltalk VM.
>> This was counterproductive. Work must be produced upstream, or it's wasted.
> 
> This happened just once or twice. And it was because people were ignorant of “joint”, so they continued contributing as before (and people were pointed to the right place when we had the opportunity).
> And I disagree that this was counterproductive, because I took the effort to merge the changes into osvm. This worked fine until I stopped doing that job, but well… just one PR got stalled there for months, and Alistair integrated it recently.
> 
> What *did happen*, and I’m still not ready to let it go, is that a lot of the small changes we presented were rejected (or ignored) without further consideration. But well, let’s keep it positive and not enter into sterile discussions; I just think you are wrong with this argument.
> 
> No, it's important, we (opensmalltalk-vm team) can't let such bad feeling and frustration creep in.
> Every contribution counts, that does not mean that every PR will be accepted, but we owe an explanation if not.
> Some are accepted instantly, some are accepted after modification requests, some are rejected (I hope with some rationalization).
> What is problematic is that some were ignored for too long. I regret the situation, but there is no deliberate intention to ignore them, just a lack of manpower IMO.
> For example, the recent work of Alistair shows that there is no fatality here, it's just that someone has to do the hard work (kudos!).

I’m sorry to sound bitter, but I’m going to give you an example so you understand why I’m saying this: I proposed the refactoring of the Alien package (which is obviously right and simple) at least three times in the last three years (by different means). The first time I was told “no, we prefer it like that”. The second time I was told “we need to think about that” (and got no answer later). The last time I didn’t even receive a response. So when Torsten wanted to propose it, he first came to me. I told him “talk in vm-dev and good luck”. His proposal was accepted (thank god).
Several situations like this one made me think that I’m a second-class citizen in this community. And you know what? I’m 46 years old and I do not want to be treated as if I’m a child who may or may not play with the toys someone lets him have. So yes, I’m sad and not very “into” this these days. I made a lot of effort to come back from the de facto fork we had, because I always pushed to work together. But I do not see the spirit of collaboration I hoped for.

> Also, the question is coupled with stability: having a red status does not help, for almost every PR that I accepted, I had to dig into travis console reports and compare to status of previous build in order to know if it was a regression, or just a long time failing case... This does not scale!

It is worse than "does not scale".
Again, let me give you an example: imagine I want to work on FFI and I need to touch both an external file and VMMaker. I cannot do a PR, because the VMMaker change is not there and the VM build will be broken, and I cannot push VMMaker alone, because the platform sources are not there and the VM build will be broken too.
So the only solution is what happens today: I need to push VMMaker and the platform changes at the same time. So I’m forced to work on the development branch (we are all forced to work there), instead of how I think it should be: everyone should be able to work on their own branch and contribute changes through PRs (PRs that can be validated with a good CI process).

As a consequence, since all contributions go to the development branch… we have no stability and we need to wait for the blessing of a VM, which may or may not happen.
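
To make the coupling concrete, here is a minimal sketch of the situation under a deliberately simplified assumption: the build succeeds only when the VMMaker side and the platform sources agree on whether the change is present. The function `builds` and its rule are hypothetical and are not part of any real build script.

```python
def builds(vmmaker_has_change: bool, platform_has_change: bool) -> bool:
    # Assumed rule: generated VM code and platform sources must match.
    # A change that spans both sides breaks the build if only one side
    # carries it.
    return vmmaker_has_change == platform_has_change


outcomes = {
    "PR with platform sources only": builds(False, True),
    "VMMaker pushed alone": builds(True, False),
    "both pushed together": builds(True, True),
}

for step, ok in outcomes.items():
    print(f"{step}: {'builds' if ok else 'broken'}")
```

Under that assumption neither half can be validated in isolation, which is why the change ends up going straight to the development branch.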

Meh, whatever… it is obvious that my way of working is different from that of most people in this community, so I will go back to what I do now: just the absolutely necessary.

cheers, 
Esteban

ps: I will not continue discussing this… I know how things are and I’m sorry to bring such a negative perspective to this list, but I needed to say it.

> 
> I'm all for more distributed power, and that should come with responsibilities, first a cooperative "you break it, you fix it" attitude.
> 
> Or maybe you want a clarified decision process?
> For now, people who are interested in a PR raise their voice.
> I don't know if we need something more formal.
> For important design decisions there is the vm-dev mailing list to discuss them.
> 
> cheers
> Nicolas
>  
> cheers,
> Esteban
> 
>> I once thought that the Pharo fork could be the place for the pharo team to manage official stable versions.
>> But I agree that this is too much duplicated work and would be very happy to see the work happen upstream too.
>> 
>> If you have constructive ideas that will help decouple all these problems, we are all ears.
>> 
>> PS: I did not post this answer earlier, to avoid sterile discussion, but since Phil asked...
