[Vm-dev] [Pharo-dev] Random corrupted data when copying from very large byte array

Alistair Grant akgrant0710 at gmail.com
Sat Jan 20 08:19:04 UTC 2018


Hi Eliot,

On 19 January 2018 at 23:04, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Hi Alistair, Hi Clément,
>
> On Fri, Jan 19, 2018 at 12:53 PM, Alistair Grant <akgrant0710 at gmail.com>
> wrote:
>>
>> Hi Clément,
>>
>> On 19 January 2018 at 17:21, Alistair Grant <akgrant0710 at gmail.com> wrote:
>> > Hi Clément,
>> >
>> > On 19 January 2018 at 17:04, Clément Bera <bera.clement at gmail.com>
>> > wrote:
>> >> Does not seem to be related to prim 105.
>> >>
>
>
> I suspect that the problem is the same compactor bug I've been trying to
> reproduce all week, and have just fixed.  Could you try and reproduce with a
> VM built from the latest commit?

Happy to, but I'm out all day today, so it will be tomorrow or Monday.

Cheers,
Alistair
(on the run...)






> Some details:
> The SpurPlanningCompactor works by using the fact that all Spur objects have
> room for a forwarding pointer.  The compactor makes three passes:
>
> - the first pass through memory works out where objects will go, replacing
> their first fields with where they will go, and saving their first fields in
> a buffer (savedFirstFieldsSpace).
> - the second pass scans all pointer objects, replacing their fields with
> where the objects referenced will go (following the forwarding pointers),
> and also relocates any pointer fields in savedFirstFieldsSpace
> - the final pass slides objects down, restoring their relocated first fields
>
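To make the three passes concrete, here is a toy model that can be run in a
Playground.  It is only a sketch under assumptions of mine: the heap is an
Array, nil stands for free space, every field is a pointer (a heap index), and
the names heap, saved, dest and k are hypothetical.  It is not the Slang code
of SpurPlanningCompactor, which also deals with mark bits, non-pointer objects
and a bounded savedFirstFieldsSpace.

```smalltalk
| heap saved dest k |
"Toy heap (assumption): nil is free space; a live object is an Array whose
 fields are the heap indices of the objects it references."
heap := Array new: 6.
heap at: 2 put: (Array with: 5 with: 2).   "A: refers to B and to itself"
heap at: 5 put: (Array with: 2 with: 5).   "B: refers to A and to itself"

"Pass 1: plan destinations, saving each first field and overwriting it
 with the forwarding pointer."
saved := OrderedCollection new.
dest := 0.
heap do: [:obj |
    obj isNil ifFalse: [
        dest := dest + 1.
        saved add: obj first.
        obj at: 1 put: dest]].

"Pass 2: relocate the remaining fields and the saved first fields by
 following the forwarding pointers."
heap do: [:obj |
    obj isNil ifFalse: [
        2 to: obj size do: [:i |
            obj at: i put: (heap at: (obj at: i)) first]]].
saved := saved collect: [:old | (heap at: old) first].

"Pass 3: slide objects down and restore their relocated first fields."
k := 0.
1 to: heap size do: [:i | | obj |
    obj := heap at: i.
    obj isNil ifFalse: [
        k := k + 1.
        heap at: i put: nil.
        obj at: 1 put: (saved at: k).
        heap at: k put: obj]].
```

After the last pass A sits at index 1 as #(2 1) and B at index 2 as #(1 2), so
both objects still refer to each other and to themselves at their new
locations.
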
> The buffer used for savedFirstFieldsSpace determines how many passes are
> used.  The system will either use eden (which is empty when compaction
> occurs) or a large free chunk or allocate a new segment, depending on
> whatever yields the largest space.  So in the right circumstances eden will
> be used and more than one pass required.
>
> The bug was that when multiple passes are used the compactor forgot to
> unmark the corpse left behind when the object was moved.  Instead of the
> corpse being changed into free space it was retained, but its first field
> would be that of the forwarding pointer to its new location, not the actual
> first field.  So on 32-bits a ByteArray that should have been collected
> would have its first 4 bytes appear to be invalid, and on 64-bits its first
> 8 bytes.  Because the heap on 64-bits can grow larger it could be that the
> bug shows itself much less frequently than on 32-bits. When compaction can
> be completed in a single pass all corpses are correctly collected, so most
> of the time the bug is hidden.
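The difference between the two behaviours can be sketched with a toy object.
Everything here is an assumption for illustration: an object is a two-slot
Array (mark bit, first field), and newObject, buggyMove and fixedMove are
hypothetical blocks, not the VM's copyAndUnmarkObject:to:bytes:firstField:.

```smalltalk
| newObject buggyMove fixedMove corpseA targetA corpseB targetB |
"Toy object (assumption): slot 1 is the mark bit, slot 2 the first field.
 After the planning pass the first field holds the forwarding pointer."
newObject := [:forwardingPointer | Array with: true with: forwardingPointer].

"Before the fix: copy, restore the first field, unmark only the target."
buggyMove := [:obj :savedFirstField | | copy |
    copy := obj copy.
    copy at: 1 put: false; at: 2 put: savedFirstField.
    copy].

"After the fix: unmark the corpse first, so the copy is unmarked as well."
fixedMove := [:obj :savedFirstField | | copy |
    obj at: 1 put: false.
    copy := obj copy.
    copy at: 2 put: savedFirstField.
    copy].

"16r1000 is a made-up forwarding pointer, 16r04034B50 a stand-in for the
 real first 4 bytes."
corpseA := newObject value: 16r1000.
targetA := buggyMove value: corpseA value: 16r04034B50.
corpseB := newObject value: 16r1000.
targetB := fixedMove value: corpseB value: 16r04034B50.
```

Both targets end up unmarked with the real first field restored.  corpseA is
still marked and still holds the forwarding pointer in its first field, which
is the retained corpse described above; corpseB is unmarked and can be turned
into free space.
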
>
> This is the commit:
> commit 0fe1e1ea108e53501a0e728736048062c83a66ce
> Author: Eliot Miranda <eliot.miranda at gmail.com>
> Date:   Fri Jan 19 13:17:57 2018 -0800
>
>     CogVM source as per VMMaker.oscog-eem.2320
>
>     Spur:
>     Fix a bad bug in SpurPlanningCompactor.  unmarkObjectsFromFirstFreeObject,
>     used when the compactor requires more than one pass due to insufficient
>     savedFirstFieldsSpace, expects the corpse of a moved object to be unmarked,
>     but copyAndUnmarkObject:to:bytes:firstField: only unmarked the target.
>     Unmarking the corpse before the copy unmarks both.  This fixes a crash with
>     ReleaseBuilder class>>saveAsNewRelease when non-use of cacheDuring: creates
>     lots of files, enough to push the system into the multi-pass regime.
>
>>
>>
>> HTH,
>> Alistair
>>
>>
>>
>> > Cheers,
>> > Alistair
>> >
>> >
>> >
>> >> On Thu, Jan 18, 2018 at 7:12 PM, Clément Bera <bera.clement at gmail.com>
>> >> wrote:
>> >>>
>> >>> I would suspect a bug in primitive 105 on byte objects (it was changed
>> >>> recently in the VM), called by copyFrom: 1 to: readCount. The bug
>> >>> would likely be due to a specific alignment in readCount or something
>> >>> like that.
>> >>> (Assuming you're in 32 bits since the 4 bytes are corrupted).
>> >>>
>> >>> When I get better I can have a look (I am currently quite sick).
>> >>>
>> >>> On Thu, Jan 18, 2018 at 4:51 PM, Thierry Goubier
>> >>> <thierry.goubier at gmail.com> wrote:
>> >>>>
>> >>>> Hi Cyril,
>> >>>>
>> >>>> try with the latest VMs available at:
>> >>>>
>> >>>> https://bintray.com/opensmalltalk/vm/cog/
>> >>>>
>> >>>> For example, the latest Ubuntu 64-bit VM is at:
>> >>>>
>> >>>> https://bintray.com/opensmalltalk/vm/cog/201801170946#files
>> >>>>
>> >>>> Regards,
>> >>>>
>> >>>> Thierry
>> >>>>
>> >>>> 2018-01-18 16:42 GMT+01:00 Cyrille Delaunay <cy.delaunay at gmail.com>:
>> >>>>>
>> >>>>> Hi everyone,
>> >>>>>
>> >>>>> I just added a new bug entry for an issue we have been experiencing
>> >>>>> for some time:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> https://pharo.fogbugz.com/f/cases/20982/Random-corrupted-data-when-copying-from-very-large-byte-array
>> >>>>>
>> >>>>> Here is the description:
>> >>>>>
>> >>>>>
>> >>>>> History:
>> >>>>>
>> >>>>> This issue was spotted after we experienced strange behavior with
>> >>>>> Seaside uploads.
>> >>>>> After uploading a big file from a web browser, the modeled file
>> >>>>> generated within pharo image begins with 4 unexpected bytes.
>> >>>>> This issue occurs randomly: sometimes the first 4 bytes are right.
>> >>>>> Sometimes the first 4 bytes are wrong.
>> >>>>> This issue only occurs with Pharo 6.
>> >>>>> This issue occurs for all platforms (Mac, Ubuntu, Windows)
>> >>>>>
>> >>>>> Steps to reproduce:
>> >>>>>
>> >>>>> I have been able to set up a small scenario that highlights the
>> >>>>> issue.
>> >>>>>
>> >>>>> Download Pharo 6.1 on my Mac (Sierra 10.12.6):
>> >>>>> https://pharo.org/web/download
>> >>>>> Then repeat this process until the issue appears:
>> >>>>>
>> >>>>> => start the pharo image
>> >>>>> => execute this piece of code in a playground
>> >>>>>
>> >>>>> ZnServer startDefaultOn: 1701.
>> >>>>> ZnServer default maximumEntitySize: 80 * 1024 * 1024.
>> >>>>> '/Users/cdelaunay/myzip.zip' asFileReference writeStreamDo: [ :out |
>> >>>>>     out binary; nextPutAll: #[80 75 3 4 10 0 0 0 0 0 125 83 67 73 0 0 0 0 0 0].
>> >>>>>     18202065 timesRepeat: [ out nextPut: 0 ] ].
>> >>>>>
>> >>>>> => Open a web browser page on: http://localhost:1701/form-test-3
>> >>>>> => Upload the zip file previously generated ('myzip.zip').
>> >>>>> => If the web page displays: "contents=000000000a00..." (or anything
>> >>>>> unexpected), THIS IS THE ISSUE !
>> >>>>> => If the web page displays: "contents=504b03040a00..", the upload
>> >>>>> worked fine. You can close the image (without saving)
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Debugging:
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Bob Arning has been able to reproduce the issue with my scenario.
>> >>>>> He dug into the code involved in this process until he reached some
>> >>>>> "basic" methods where he saw the issue occurring.
>> >>>>>
>> >>>>> Here are the conclusions so far:
>> >>>>> => A corruption occurs while reading an input stream with method
>> >>>>> ZnUtils class>>readUpToEnd:limit:
>> >>>>> The first 4 bytes may be altered randomly.
>> >>>>> => The first 4 bytes are initially written correctly to an
>> >>>>> outputStream.
>> >>>>> But the first 4 bytes of this outputStream sometimes get altered
>> >>>>> (corrupted), either when the inner byte array grows OR when performing
>> >>>>> the final "outputStream contents"
>> >>>>> => Here is a piece of code that reproduces the issue (still randomly;
>> >>>>> stopping and restarting the image may change the behavior)
>> >>>>>
>> >>>>> test4
>> >>>>>     "self test4"
>> >>>>>     | species bufferSize buffer totalRead outputStream answer inputStream ba byte1 |
>> >>>>>     ba := ByteArray new: 18202085.
>> >>>>>     ba atAllPut: 99.
>> >>>>>     1 to: 20 do: [ :i |
>> >>>>>         ba at: i put: (#[80 75 3 4 10 7 7 7 7 7 125 83 67 73 7 7 7 7 7 7] at: i) ].
>> >>>>>     inputStream := ba readStream.
>> >>>>>     bufferSize := 16384.
>> >>>>>     species := ByteArray.
>> >>>>>     buffer := species new: bufferSize.
>> >>>>>     totalRead := 0.
>> >>>>>     outputStream := nil.
>> >>>>>     [ inputStream atEnd ] whileFalse: [ | readCount |
>> >>>>>         readCount := inputStream readInto: buffer startingAt: 1 count: bufferSize.
>> >>>>>         totalRead = 0 ifTrue: [ byte1 := buffer first ].
>> >>>>>         totalRead := totalRead + readCount.
>> >>>>>         outputStream ifNil: [
>> >>>>>             inputStream atEnd
>> >>>>>                 ifTrue: [ ^ buffer copyFrom: 1 to: readCount ]
>> >>>>>                 ifFalse: [ outputStream := (species new: bufferSize) writeStream ] ].
>> >>>>>         outputStream next: readCount putAll: buffer startingAt: 1.
>> >>>>>         byte1 = outputStream contents first ifFalse: [ self halt ] ].
>> >>>>>     answer := outputStream ifNil: [ species new ] ifNotNil: [ outputStream contents ].
>> >>>>>     byte1 = answer first ifFalse: [ self halt ].
>> >>>>>     ^ answer
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> Suspicions:
>> >>>>>
>> >>>>> This issue appeared with Pharo 6.
>> >>>>>
>> >>>>> Some people suggested that it could be a VM issue, and that I try my
>> >>>>> little scenario with the latest available VM.
>> >>>>>
>> >>>>> I am not sure where to find the latest available VM.
>> >>>>>
>> >>>>> I did the test using these elements:
>> >>>>>
>> >>>>> https://files.pharo.org/image/60/latest.zip
>> >>>>>
>> >>>>> https://files.pharo.org/get-files/70/pharo-mac-latest.zip/
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> The issue is still present
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Cyrille Delaunay
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Clément Béra
>> >>> Pharo consortium engineer
>> >>> https://clementbera.wordpress.com/
>> >>> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Clément Béra
>> >> Pharo consortium engineer
>> >> https://clementbera.wordpress.com/
>> >> Bâtiment B 40, avenue Halley 59650 Villeneuve d'Ascq
>>
>
>
>
> --
> _,,,^..^,,,_
> best, Eliot

