Hi All,
We have tried to produce the same (bit-identical) image file from two consecutive snapshots. We start from a base image, fileIn several files into it, and finally just evaluate SmalltalkImage current snapshot: true andQuit: true. We need this so that a third party can run a script and verify the image file it generates against a checksum. After trying several approaches (even scripting the fileIn process and the snapshot), we found that, besides the timestamp differences, the image files have thousands of other differences, and sometimes the snapshots also differ in size. We suppose these issues are due to GC activity. Do they come from the way the GC dynamically changes the bytes in memory? Is there a way to inhibit this activity? Attached are the scripts we use to produce the image files.
Many thanks in advance, Martin Troielli
Not to mention anything that records TimeStamps or clock values...
-----Original Message----- From: squeak-dev-bounces@lists.squeakfoundation.org [mailto:squeak-dev-bounces@lists.squeakfoundation.org] On Behalf Of Martin Troielli Sent: 30 July 2007 4:32 pm To: squeak-dev@lists.squeakfoundation.org Subject: How to generate identically image file after snapshots
cp image.im twinbrother.im
;)
-Boris
Hi Martin,
there are a lot of objects (for example, subinstances of ContextPart) that are allocated and deallocated without you having much control over them.
One place to start is to enumerate (in two sister .images) all the objects you want to deploy. If that fails to produce comparable objects (for any reason, for example if you cannot order/compare object identities other than by identity hash, and the latter is assigned by the VM and not by you) then, hrm, it fails.
But if it works, you could trace out all the objects you want (thereby discarding all the unwanted ones), and the resulting (two sister) .image files then have the same contents byte for byte, because you fix each object's position in the file. I've done that with other images and non-Smalltalk interpreters.
Having said that, your project doesn't look easy.
/Klaus
Hi Klaus,
Thanks for the information. We modified the VM to reduce GC activity, inhibiting it until the fileIn processes are done, but had no luck: the produced files still differed, just with fewer differences. We think we have to follow an approach similar to yours. We are thinking of generating a serialized file with all the CompiledMethods we use, without changing the base image, and merging them in only when Squeak starts up. We hope this process does not take too much time, since we also have a lot of resources to bring up at that point :S
Regards, Martin
Maybe I'm alone in being unclear on this, but what is the root goal here? Maybe there's a simpler way to achieve it.
Avi
Well, writing the image out means doing a full GC and some cleanup; then we write out some header bytes and do

    bytesWritten = sqImageFileWrite(pointerForOop(memory), sizeof (unsigned char), imageBytes, f);

which, depending on the platform, is

    #define sqImageFileWrite(ptr, sz, count, f) fwrite(ptr, sz, count, f)

or

    sqInt sqImageFileWrite(void *ptr, size_t elementSize, size_t count, sqImageFile f)
    {
        if (f != 0)
            return fwrite(ptr, elementSize, count, f);
        return 0;
    }

or

    size_t sqImageFileWrite(void *ptr, size_t sz, size_t count, sqImageFile h)
    {
        DWORD dwReallyWritten;
        WriteFile((HANDLE)(h-1), (LPVOID) ptr, count*sz, &dwReallyWritten, NULL);
        return (size_t) (dwReallyWritten / sz);
    }

So after we've shoved the entire oops memory space out to whatever the file handle points to, we start running the VM again, which instantly changes the bytes in memory because objects are created/destroyed as a result of executing bytecodes.
If you have some desire to make duplicate images look at
primitiveSnapshot
and consider cloning that to perform the writeImageFile() twice using different image names.
--
John M. McIntosh johnmci@smalltalkconsulting.com
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
On Tuesday 31 July 2007 7:32 am, John M McIntosh wrote:
If you have some desire to make duplicate images look at
primitiveSnapshot
and consider cloning that to perform the writeImageFile() twice using different image names.
It would be simpler to copy image files after they are written. But I don't think the issue was to copy image files locally. The original poster wanted to update third-party images by shipping fileIns to a reference image instead of the whole image itself. The poser, then, is how to verify that the resulting image is the same as intended.
I would simply use xdelta (see xdelta.org) for situations like this. E.g.
    xdelta delta ref.image thirdparty.image thirdparty.xd
and ship thirdparty.xd
    xdelta patch thirdparty.xd ref.image thirdparty.image
The downside is that xdelta is a memory-hungry utility. How big is the image?
Regards .. Subbu
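For the verification step itself, here is a minimal sketch of a checksum check on the receiving side. It assumes a POSIX shell with `sha256sum` (or `shasum` as a fallback) installed; the function names `sha256_of` and `verify_image` and the choice of SHA-256 are illustrative assumptions, not something specified in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are assumptions, not from the thread).

# Print the SHA-256 hex digest of a file.
# Assumes sha256sum (GNU coreutils) or shasum (common on macOS) is installed.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d ' ' -f 1
    else
        shasum -a 256 "$1" | cut -d ' ' -f 1
    fi
}

# Usage: verify_image FILE EXPECTED_HEX
# Returns 0 when the file's digest equals EXPECTED_HEX, 1 otherwise.
verify_image() {
    actual=$(sha256_of "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
        return 0
    fi
    echo "MISMATCH: $1" >&2
    return 1
}
```

After the `xdelta patch` step one might run `verify_image thirdparty.image "$expected"`, where `$expected` holds a digest shipped alongside the delta (how that digest is distributed is left open here).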
Mmm, I wonder how well this would work. When you load an image, we first figure out how big it is, then allocate memory for it, load it, and then swizzle all the memory references by +/- an offset. That offset is calculated from the base address in use when the image was saved versus the address of the memory just allocated.
Now, some operating systems might give you the same virtual memory address when you use the same VM on the same operating system. In this case we don't have to swizzle the references. Current (I believe), and certainly past, versions of OSX would do this.
However, in cases where the operating system does not give the same memory address (and I'll note the operating system might give you a random address each time on purpose, for security reasons), all the memory references become different at swizzle time. Of course, if this is the case, then on your next save all your memory reference values will be different from the last save. Needless to say, this would greatly affect whether xdelta thinks your images are the same or different.
--
John M. McIntosh johnmci@smalltalkconsulting.com
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
Hi Avi,
The main goal is to certify a software development. The certifier must check that a set of source files produces a given binary output.
We give them:
1 - The final image and VM
2 - The Smalltalk source files (fileOuts of our development)
3 - The VM C source files
4 - The base image
5 - A make script that compiles the VM, files in the Smalltalk source files on the base image, and produces a final image and VM.
They need to check that the two images, the one we give them (1) and the one generated by our script (5), are the same. They check for differences by doing a binary diff plus a hash over the files. They can only allow differences related to timestamps. They don't know anything about Smalltalk...
Best regards, Martin
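The certifier's check described above (a binary diff plus a size comparison) can be sketched in shell. This is only a minimal sketch assuming POSIX `cmp`, `wc`, and `tr`; the function name `image_diff_summary` is hypothetical, and it merely counts differing bytes rather than deciding which of them are timestamp-related.

```shell
#!/bin/sh
# Hypothetical helper: summarize how two image files differ at the byte level.
# Prints both sizes and the number of differing bytes within the common prefix.
# Usage: image_diff_summary A B   (returns 0 only if the files are identical)
image_diff_summary() {
    size_a=$(wc -c < "$1" | tr -d ' ')
    size_b=$(wc -c < "$2" | tr -d ' ')
    # cmp -l prints one line per differing byte; its EOF notice goes to stderr.
    diffs=$({ cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' ')
    echo "size_a=$size_a size_b=$size_b differing_bytes=$diffs"
    [ "$size_a" -eq "$size_b" ] && [ "$diffs" -eq 0 ]
}
```

A count of a handful of bytes would be consistent with timestamp-only changes; thousands of differing bytes or a size mismatch, as reported at the start of this thread, would not.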
12 years back I had a client like this. Let's see if I remember...
you could try doing
    | m decompiled |
    m := OrderedCollection new.
    SystemNavigation default allBehaviorsDo: [ :behavior |
        behavior selectors do: [ :sel |
            decompiled := Decompiler new decompile: sel in: behavior.
            m add: decompiled]].
    ^m
where you sort the behaviors by class name, then sort the selectors, and, instead of collecting the decompiled value, write its print string out to a stream. This should give you all the source code for the image in sorted order, which you can then compare as text files.
Think of it as decompiling the binary to see if the assembly instructions are the same.
What's missing is the globals and class variable values, but you might not need those... ?
Perhaps even a file out of all the methods in the image after the build might help?
--
John M. McIntosh johnmci@smalltalkconsulting.com
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
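Once each image has written its sorted source dump to a text file, the comparison suggested above is an ordinary text diff. A minimal sketch, assuming POSIX `diff`; the function name `compare_dumps` and the idea that each build produces one dump file are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compare two sorted source dumps line by line.
# Usage: compare_dumps OURS THEIRS
# Prints "identical" and returns 0 when the dumps match; otherwise prints
# the diff (lines prefixed with < or >) and returns 1.
compare_dumps() {
    if diff "$1" "$2"; then
        echo "identical"
        return 0
    fi
    return 1
}
```

Because the dumps are plain text, a certifier who knows nothing about Smalltalk can still read exactly which methods differ.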
Hi John,
Yes, I think that way we can show that the "source code" extracted from different image files is exactly the same. I hope the certifiers can understand this...
Thanks anyway, Martin
You have to cp the image. Once the engine interacts with it, it is never the "same": lots of objects get created and destroyed at image start-up, the garbage collector runs, and anything using a clock runs. So if it is a script, it has to be a shell script. Any deployment-specific stuff should go in some config or text file that is read on start-up.
Sean