[squeak-dev] [Vm-dev] Fwd: [Pharo-dev] Random corrupted data when copying from very large byte array

Clément Bera bera.clement at gmail.com
Fri Jan 19 23:19:10 UTC 2018


Oh right! I've just seen your message, Eliot.

Yes it is very likely to be the compactor bug.

On Fri, Jan 19, 2018 at 11:05 PM, Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
>
> ---------- Forwarded message ----------
> From: Eliot Miranda <eliot.miranda at gmail.com>
> Date: Fri, Jan 19, 2018 at 2:04 PM
> Subject: Re: [Pharo-dev] Random corrupted data when copying from very
> large byte array
> To: Pharo Development List <pharo-dev at lists.pharo.org>
>
>
> Hi Alistair, Hi Clément,
>
> On Fri, Jan 19, 2018 at 12:53 PM, Alistair Grant <akgrant0710 at gmail.com>
> wrote:
>
>> Hi Clément,
>>
>> On 19 January 2018 at 17:21, Alistair Grant <akgrant0710 at gmail.com>
>> wrote:
>> > Hi Clément,
>> >
>> > On 19 January 2018 at 17:04, Clément Bera <bera.clement at gmail.com>
>> wrote:
>> >> It does not seem to be related to primitive 105.
>> >>
>> >> I am confused. Does the size of the array have an impact at all?
>> >
>> > Yes, I tried reducing the size of the array by a factor of 10 and
>> > wasn't able to reproduce the problem at all.
>> >
>> > With the full-size array it failed over half the time (32-bit).
>> >
>> > I ran the test about 180 times on 64-bit and didn't get a single failure.
>> >
>> >> It seems the problem shows up from the very first copy of 16k elements.
>> >>
>> >> I can't really reproduce the bug - it happened once but I cannot do it
>> >> again. Does the bug happen with the StackVM/PharoS VM? You can find the
>> >> 32-bit versions here: http://files.pharo.org/vm/pharoS-spur32/. The
>> >> StackVM/PharoS VM is the VM without the JIT; since the bug is unreliable,
>> >> it may be that it happens only in jitted code, so trying that out may be
>> >> worth it.
>> >
>> > I'll try and have a look at this over the weekend.
>>
>> This didn't fail once in 55 runs.
>>
>> OS: Ubuntu 16.04
>> Image: Pharo 6.0   Latest update: #60528
>> VM:
>> 5.0 #1 Wed Oct 12 15:48:53 CEST 2016 gcc 4.6.3 [Production Spur ITHB VM]
>> StackInterpreter VMMaker.oscog-EstebanLorenzano.1881 uuid:
>> ed616067-a57c-409b-bfb6-dab51f058235 Oct 12 2016
>> https://github.com/pharo-project/pharo-vm.git Commit:
>> 01a03276a2e2b243cd4a7d3427ba541f835c07d3 Date: 2016-10-12 14:31:09
>> +0200 By: Esteban Lorenzano <estebanlm at gmail.com> Jenkins build #606
>> Linux pharo-linux 3.2.0-31-generic-pae #50-Ubuntu SMP Fri Sep 7
>> 16:39:45 UTC 2012 i686 i686 i386 GNU/Linux
>> plugin path: /home/alistair/pharo7/Issue20982/bin/ [default:
>> /home/alistair/pharo7/Issue20982/bin/]
>>
>>
>> I then went back and attempted to reproduce the failures in my regular
>> 32-bit image, but only got 1 corruption in 10 runs.  I've been working
>> in this image without restarting for most of the day.
>>
>> Quitting out and restarting the image and then running the corruption
>> check resulted in 11 corruptions from 11 runs.
>>
>>
>> Image: Pharo 7.0 Build information:
>> Pharo-7.0+alpha.build.425.sha.eb0a6fb140ac4a42b1f158ed37717e0650f778b4
>> (32 Bit)
>> VM:
>> 5.0-201801110739  Thursday 11 January  09:30:12 CET 2018 gcc 4.8.5
>> [Production Spur VM]
>> CoInterpreter VMMaker.oscog-eem.2302 uuid:
>> 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
>> StackToRegisterMappingCogit VMMaker.oscog-eem.2302 uuid:
>> 55ec8f63-cdbe-4e79-8f22-48fdea585b88 Jan 11 2018
>> VM: 201801110739
>> alistair at alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $
>> Date: Wed Jan 10 23:39:30 2018 -0800 $
>> Plugins: 201801110739
>> alistair at alistair-xps13:snap/pharo-snap/pharo-vm/opensmalltalk-vm $
>> Linux b07d7880072c 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9
>> 22:00:44 UTC 2018 i686 i686 i686 GNU/Linux
>> plugin path: /snap/core/3748/lib/i386-linux-gnu/ [default:
>> /snap/core/3748/lib/i386-linux-gnu/]
>>
>>
>> So, besides restarting the image before running the test, I'm just
>> wondering whether the gcc compiler version could have an impact?
>>
>
> I suspect that the problem is the same compactor bug I've been trying to
> reproduce all week, and have just fixed.  Could you try to reproduce it with
> a VM built from the latest commit?
>
> Some details:
> The SpurPlanningCompactor works by using the fact that all Spur objects
> have room for a forwarding pointer.  The compactor makes three passes:
>
> - the first pass through memory works out where objects will go, replacing
> their first fields with where they will go, and saving their original first
> fields in a buffer (savedFirstFieldsSpace).
> - the second pass scans all pointer objects, replacing their fields with
> where the objects referenced will go (following the forwarding pointers),
> and also relocates any pointer fields in savedFirstFieldsSpace.
> - the final pass slides objects down, restoring their relocated first
> fields.
>
> The buffer used for savedFirstFieldsSpace determines how many passes are
> needed.  The system will use either eden (which is empty when compaction
> occurs), a large free chunk, or a newly allocated segment, depending on
> whichever yields the largest space.  So in the right circumstances eden will
> be used and more than one pass will be required.
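>
> Roughly, in toy playground form (a minimal sketch of the planning idea only:
> objects are modeled as address -> firstField associations of a fixed 16-byte
> size, mark bits and the pointer-updating second pass are elided, and all
> names are invented; this is not the actual SpurPlanningCompactor code):
>
> | heap live savedFirstFields nextFree |
> "Toy heap: six 16-byte objects at addresses 16, 32, ... 96; only some survive."
> heap := (1 to: 6) collect: [ :i | (i * 16) -> (i * 100) ].
> live := Set withAll: #(16 48 64 96).
>
> "Pass 1: plan destinations, save each live object's first field into
>  savedFirstFields (standing in for savedFirstFieldsSpace: eden, a large free
>  chunk, or a fresh segment, whichever is largest), then overwrite the first
>  field with the forwarding address."
> savedFirstFields := OrderedCollection new.
> nextFree := 16.
> heap do: [ :obj |
>     (live includes: obj key) ifTrue: [
>         savedFirstFields add: obj value.
>         obj value: nextFree.
>         nextFree := nextFree + 16 ] ].
>
> "(Pass 2 would follow the forwarding addresses to rewrite pointer fields,
>  including any pointers held in savedFirstFields.)"
>
> "Pass 3: slide live objects down to their forwarding addresses and restore
>  their saved first fields."
> heap := (heap select: [ :obj | live includes: obj key ])
>     withIndexCollect: [ :obj :i | obj value -> (savedFirstFields at: i) ].
> heap   "=> {16->100. 32->300. 48->400. 64->600}"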
>
> The bug was that when multiple passes are used the compactor forgot to
> unmark the corpse left behind when an object was moved.  Instead of the
> corpse being changed into free space it was retained, but its first field
> would be the forwarding pointer to its new location, not the actual first
> field.  So on 32 bits a ByteArray that should have been collected would
> have its first 4 bytes appear to be invalid, and on 64 bits its first
> 8 bytes.  Because the heap on 64 bits can grow larger, the bug may show
> itself much less frequently than on 32 bits.  When compaction can be
> completed in a single pass all corpses are correctly collected, so most
> of the time the bug is hidden.
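>
> In toy form again (Dictionaries standing in for object headers, #marked for
> the mark bit and #firstField for the first slot; names and values are
> invented, this is not the real copyAndUnmarkObject:to:bytes:firstField:
> code), the essence of the fix is that the corpse is unmarked before the copy:
>
> | corpse target |
> corpse := Dictionary new.
> corpse at: #marked put: true.
> corpse at: #firstField put: 16r504B0304.   "original data, e.g. the zip magic"
> target := Dictionary new.
>
> corpse at: #marked put: false.             "the fix: unmark the corpse BEFORE copying"
> target at: #firstField put: (corpse at: #firstField).
> target at: #marked put: false.             "the old code only did this unmark"
> corpse at: #firstField put: 16rDEADBEEF.   "corpse now holds only the forwarding pointer"
>
> "Without the first unmark, unmarkObjectsFromFirstFreeObject finds a marked
>  corpse, the corpse is retained instead of becoming free space, and the
>  forwarding pointer shows up as 4 (32-bit) or 8 (64-bit) corrupted leading bytes."
> { corpse at: #marked. target at: #firstField }   "=> {false . 1347093252}"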
>
> This is the commit:
> commit 0fe1e1ea108e53501a0e728736048062c83a66ce
> Author: Eliot Miranda <eliot.miranda at gmail.com>
> Date:   Fri Jan 19 13:17:57 2018 -0800
>
>     CogVM source as per VMMaker.oscog-eem.2320
>
>     Spur:
>     Fix a bad bug in SpurPlanningCompactor.  unmarkObjectsFromFirstFreeObject,
>     used when the compactor requires more than one pass due to insufficient
>     savedFirstFieldsSpace, expects the corpse of a moved object to be unmarked,
>     but copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>     Unmarking the corpse before the copy unmarks both.  This fixes a crash with
>     ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring: creates
>     lots of files, enough to push the system into the multi-pass regime.
>
>
>>
>> HTH,
>> Alistair
>>
>>
>>
>> > Cheers,
>> > Alistair
>> >
>> >
>> >
>> >> On Thu, Jan 18, 2018 at 7:12 PM, Clément Bera <bera.clement at gmail.com>
>> >> wrote:
>> >>>
>> >>> I would suspect a bug in primitive 105 on byte objects (it was changed
>> >>> recently in the VM), called by copyFrom: 1 to: readCount. The bug would
>> >>> likely be due to specific alignment in readCount or something like that.
>> >>> (Assuming you're in 32 bits since the 4 bytes are corrupted.)
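>> >>>
>> >>> Roughly, copyFrom: 1 to: readCount on a ByteArray boils down to creating
>> >>> a new ByteArray and filling it with replaceFrom:to:with:startingAt:, which
>> >>> is the primitive 105 call (a simplified sketch, not the exact library source):
>> >>>
>> >>> | buffer readCount copy |
>> >>> buffer := ByteArray new: 16384 withAll: 99.
>> >>> readCount := 16384.
>> >>> copy := (ByteArray new: readCount)
>> >>>     replaceFrom: 1 to: readCount with: buffer startingAt: 1.   "primitive 105"
>> >>> copy = (buffer copyFrom: 1 to: readCount)   "=> true"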
>> >>>
>> >>> When I get better I can have a look (I am currently quite sick).
>> >>>
>> >>> On Thu, Jan 18, 2018 at 4:51 PM, Thierry Goubier
>> >>> <thierry.goubier at gmail.com> wrote:
>> >>>>
>> >>>> Hi Cyril,
>> >>>>
>> >>>> try with the latest VMs available at:
>> >>>>
>> >>>> https://bintray.com/opensmalltalk/vm/cog/
>> >>>>
>> >>>> For example, the latest Ubuntu 64-bit VM is at:
>> >>>>
>> >>>> https://bintray.com/opensmalltalk/vm/cog/201801170946#files
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Thierry
>> >>>>
>> >>>> 2018-01-18 16:42 GMT+01:00 Cyrille Delaunay <cy.delaunay at gmail.com>:
>> >>>>>
>> >>>>> Hi everyone,
>> >>>>>
>> >>>>> I just added a new bug entry for an issue we have been experiencing for
>> >>>>> some time:
>> >>>>>
>> >>>>>
>> >>>>> https://pharo.fogbugz.com/f/cases/20982/Random-corrupted-data-when-copying-from-very-large-byte-array
>> >>>>>
>> >>>>> Here is the description:
>> >>>>>
>> >>>>>
>> >>>>> History:
>> >>>>>
>> >>>>> This issue was spotted after experiencing strange behavior with
>> >>>>> Seaside uploads.
>> >>>>> After uploading a big file from a web browser, the modeled file
>> >>>>> generated within the Pharo image begins with 4 unexpected bytes.
>> >>>>> This issue occurs randomly: sometimes the first 4 bytes are right,
>> >>>>> sometimes the first 4 bytes are wrong.
>> >>>>> This issue only occurs with Pharo 6.
>> >>>>> This issue occurs on all platforms (Mac, Ubuntu, Windows).
>> >>>>>
>> >>>>> Steps to reproduce:
>> >>>>>
>> >>>>> I have been able to set up a small scenario that highlights the issue.
>> >>>>>
>> >>>>> Download Pharo 6.1 (on my Mac, Sierra 10.12.6):
>> >>>>> https://pharo.org/web/download
>> >>>>> Then iterate over this process until the issue shows up:
>> >>>>>
>> >>>>> => Start the Pharo image
>> >>>>> => Execute this piece of code in a playground:
>> >>>>>
>> >>>>> ZnServer startDefaultOn: 1701.
>> >>>>> ZnServer default maximumEntitySize: 80 * 1024 * 1024.
>> >>>>> '/Users/cdelaunay/myzip.zip' asFileReference writeStreamDo: [ :out |
>> >>>>>     out binary; nextPutAll: #[80 75 3 4 10 0 0 0 0 0 125 83 67 73 0 0 0 0 0 0].
>> >>>>>     18202065 timesRepeat: [ out nextPut: 0 ] ].
>> >>>>>
>> >>>>> => Open a web browser page on: http://localhost:1701/form-test-3
>> >>>>> => Upload the zip file previously generated ('myzip.zip').
>> >>>>> => If the web page displays "contents=000000000a00..." (or anything
>> >>>>> unexpected), THIS IS THE ISSUE!
>> >>>>> => If the web page displays "contents=504b03040a00..", the upload
>> >>>>> worked fine. You can close the image (without saving).
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Debugging:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Bob Arning has been able to reproduce the issue with my scenario.
>> >>>>> He dived into the code involved in this process, until reaching some
>> >>>>> "basic" methods where he saw the issue occurring.
>> >>>>>
>> >>>>> Here are the conclusions so far:
>> >>>>> => A corruption occurs while reading an input stream with the method
>> >>>>> ZnUtils class>>readUpToEnd:limit:
>> >>>>> The first 4 bytes may be altered randomly.
>> >>>>> => The first 4 bytes are initially written correctly to an outputStream.
>> >>>>> But the first 4 bytes of this outputStream get altered (corrupted),
>> >>>>> sometimes when the inner byte array grows OR when performing the final
>> >>>>> "outputStream contents".
>> >>>>> => Here is a piece of code that reproduces the issue (still randomly;
>> >>>>> stopping and restarting the image may change the behavior):
>> >>>>>
>> >>>>> test4
>> >>>>>     "self test4"
>> >>>>>     | species bufferSize buffer totalRead outputStream answer inputStream ba byte1 |
>> >>>>>     ba := ByteArray new: 18202085.
>> >>>>>     ba atAllPut: 99.
>> >>>>>     1 to: 20 do: [ :i |
>> >>>>>         ba at: i put: (#[80 75 3 4 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7] at: i) ].
>> >>>>>     inputStream := ba readStream.
>> >>>>>     bufferSize := 16384.
>> >>>>>     species := ByteArray.
>> >>>>>     buffer := species new: bufferSize.
>> >>>>>     totalRead := 0.
>> >>>>>     outputStream := nil.
>> >>>>>     [ inputStream atEnd ] whileFalse: [ | readCount |
>> >>>>>         readCount := inputStream readInto: buffer startingAt: 1 count: bufferSize.
>> >>>>>         totalRead = 0 ifTrue: [ byte1 := buffer first ].
>> >>>>>         totalRead := totalRead + readCount.
>> >>>>>         outputStream ifNil: [
>> >>>>>             inputStream atEnd
>> >>>>>                 ifTrue: [ ^ buffer copyFrom: 1 to: readCount ]
>> >>>>>                 ifFalse: [ outputStream := (species new: bufferSize) writeStream ] ].
>> >>>>>         outputStream next: readCount putAll: buffer startingAt: 1.
>> >>>>>         byte1 = outputStream contents first ifFalse: [ self halt ] ].
>> >>>>>     answer := outputStream
>> >>>>>         ifNil: [ species new ]
>> >>>>>         ifNotNil: [ outputStream contents ].
>> >>>>>     byte1 = answer first ifFalse: [ self halt ].
>> >>>>>     ^ answer
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Suspicions:
>> >>>>>
>> >>>>> This issue appeared with Pharo 6.
>> >>>>>
>> >>>>> Some people suggested that it could be a VM issue, and to try my little
>> >>>>> scenario with the latest available VM.
>> >>>>>
>> >>>>> I am not sure where to find the latest available VM.
>> >>>>>
>> >>>>> I did the test using these elements:
>> >>>>>
>> >>>>> https://files.pharo.org/image/60/latest.zip
>> >>>>>
>> >>>>> https://files.pharo.org/get-files/70/pharo-mac-latest.zip/
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> The issue is still present.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Cyrille Delaunay
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Clément Béra
>> >>> Pharo consortium engineer
>> >>> https://clementbera.wordpress.com/
>> >>> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Clément Béra
>> >> Pharo consortium engineer
>> >> https://clementbera.wordpress.com/
>> >> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
>>
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>


-- 
Clément Béra
Pharo consortium engineer
https://clementbera.wordpress.com/
Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq