On 17.08.2009, at 05:34, K. K. Subramaniam wrote:
On Monday 17 Aug 2009 7:50:35 am Randal L. Schwartz wrote:
On Monday 17 Aug 2009 12:17:25 am Randal L. Schwartz wrote:
http://lists.squeakfoundation.org/pipermail/squeak-dev/2007-May/116683.html
That explanation is about .sources and .changes, not fileOuts.
The class fileout (from browser into *.st files) uses the same format (sequence of data chunks) as *.sources and *.changes files. Did you have some other fileOuts in mind?
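For reference, the chunk format itself is small enough to sketch. This is a minimal Python illustration that assumes the classic ST-80 rules: a chunk ends at a single "!", and any "!" inside the chunk is written doubled as "!!". The function names are hypothetical. The newline between chunks is an assumption (Squeak traditionally writes CR). The real Squeak reader does considerably more than this.

```python
def write_chunk(text: str) -> str:
    """Encode one chunk: double each embedded '!' and end with a single '!'.

    A newline separator follows the terminator (assumption; Squeak uses CR).
    """
    return text.replace("!", "!!") + "!\n"


def read_chunks(data: str) -> list[str]:
    """Split chunk-format text back into chunks, undoing the '!!' escaping."""
    chunks: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "!":
            if data[i + 1 : i + 2] == "!":
                current.append("!")  # '!!' is an escaped '!' inside the chunk
                i += 2
                continue
            # A single '!' terminates the chunk; skip leading separators.
            chunks.append("".join(current).lstrip())
            current = []
        else:
            current.append(ch)
        i += 1
    # Unterminated trailing text (here just the final separator) is ignored.
    return chunks


encoded = "".join(write_chunk(c) for c in ["Object subclass: #Foo", "bar ^ 'Hi!'"])
print(read_chunks(encoded))  # ['Object subclass: #Foo', "bar ^ 'Hi!'"]
```

The separator matters. Without it, a chunk ending in "!" followed by a chunk starting with "!" would be ambiguous.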
Are you sure?
Yes. fileOuts are binary streams, not ASCII (cf. Class>>fileOut and FileStream>>writeSourceCodeFrom:.....).
Err, it's still text though.
It looks like the fileout I just got is classic ST-80 format, with "!" delimiting Smalltalk code. There's no "binary" data in here... it's all human-readable text (chunks of smalltalk code). That's different from the .sources and .changes, because they have some binary data in them (I thought).
I made the same mistake a few years back. Just because a byteArray contains readable text does not mean that it becomes a string. Text is not portable across platforms because of different line-ending conventions. If line-ending conversions are applied to a fileOut before it is filed in, then any string literals containing newlines will get corrupted.
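To make the line-ending point concrete, here is a hedged Python sketch. A method's string literal deliberately contains a CR LF pair, and it is passed through a hypothetical "normalize line endings" step of the kind a text-mode transfer might apply. The helper names are illustrative only. The regex finds just the first literal and ignores Smalltalk's doubled-quote escape.

```python
import re


def normalize_line_endings(text: str) -> str:
    # Hypothetical transfer step: map CR LF and lone CR to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def string_literal(method_source: str) -> str:
    # Return the contents of the first single-quoted literal ('' escapes not handled).
    match = re.search(r"'([^']*)'", method_source)
    if match is None:
        raise ValueError("no string literal found")
    return match.group(1)


source = "banner\r\t^ 'first line\r\nsecond line'"
before = string_literal(source)
after = string_literal(normalize_line_endings(source))
print(repr(before))  # 'first line\r\nsecond line'
print(repr(after))   # 'first line\nsecond line'
```

The line breaks *between* statements are harmless to convert. The one *inside* the literal is data, and after conversion the method answers a different string.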
You guys are confusing a couple of issues.
One is that the *.sources and *.changes files are not simply text files, because the image stores offsets into those files to find the source code for a specific method. Hence you must not alter these files manually as you normally would edit source code, because that might invalidate the offsets. That's what I meant by "a database of text chunks" in the message linked above.
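The offset problem can be illustrated with a toy model in Python. For illustration only, it assumes the "image" is a dictionary from selector to the character offset of that method's chunk in a changes-like string. Real Squeak keeps file positions per method and deals with escaping, preambles and more. Hand-editing an earlier method shifts everything after it, so the stored offset no longer points at the right source.

```python
def build_changes(methods: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Append each method as a chunk and remember where it starts."""
    changes = ""
    offsets: dict[str, int] = {}
    for selector, source in methods.items():
        offsets[selector] = len(changes)  # what the "image" remembers
        changes += source + "!\n"
    return changes, offsets


def source_at(changes: str, offset: int) -> str:
    """Read one chunk starting at a stored offset (no '!!' escaping, for brevity)."""
    end = changes.index("!", offset)
    return changes[offset:end]


changes, offsets = build_changes({"foo": "foo ^ 1", "bar": "bar ^ 2"})
print(source_at(changes, offsets["bar"]))  # bar ^ 2

# A "harmless" manual edit to an earlier method makes it 4 characters longer...
edited = changes.replace("foo ^ 1", "foo ^ 1 + 1")
# ...and the offset stored for #bar now lands in the middle of #foo.
print(repr(source_at(edited, offsets["bar"])))  # ' 1'
```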
File-outs (*.st) and change-sets (*.cs) are a different matter. Here, no file offsets are stored anywhere so it is perfectly okay to edit them manually as text files.
The "binary data" I was talking about does not appear in regular file-outs or change-sets. But the file-in process is so flexible that it can even be used to read binary data. That's because filing-in actually executes code found in the file. The first part of the file can define a reader that reads the later part of the file.
This makes it actually an "object-oriented" file format: the file itself is an object that defines how it is to be read. (According to Alan this idea goes back to the 1950s B5000 tapes.)
But it also makes it impossible to know in advance what kind of data might be included when filing in something. So one must not apply automatic transformations (such as line-ending conversion) to the file contents, since they might break that data.
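Here is a rough Python sketch of both points. A toy file-in executes chunks as commands, and an earlier chunk tells the reader how to consume the bytes that follow it. The sketch then shows what a well-meant CR LF to LF conversion does to that payload. The `binary N` command is invented purely for illustration (it is not a Squeak feature), and this reader skips "!!" escaping for brevity.

```python
def file_in(data: bytes) -> list[object]:
    """Toy file-in: each '!'-terminated chunk is treated as a command.

    Hypothetical command 'binary N': the N bytes right after this chunk's
    terminator are raw data, read verbatim instead of being parsed as chunks.
    """
    results: list[object] = []
    pos = 0
    while pos < len(data):
        end = data.index(b"!", pos)
        chunk = data[pos:end].strip()
        pos = end + 1
        if chunk.startswith(b"binary "):
            size = int(chunk.split()[1])
            results.append(data[pos:pos + size])  # the file told us how to read this
            pos += size
        elif chunk:
            results.append(chunk.decode("ascii"))  # ordinary source chunk
    return results


payload = bytes([0x00, 0x0D, 0x0A, 0xFF])  # happens to contain CR LF byte values
data = b"Transcript show: 'hello'!\nbinary 4!" + payload + b"\nTranscript cr!"
print(file_in(data))

# A text-mode "fix" applied to the whole file silently damages the payload.
converted = data.replace(b"\r\n", b"\n")
print(file_in(converted)[1])  # b'\x00\n\xff\n'
```

Only the reader defined inside the file knows which bytes are text. An outside tool that converts line endings cannot tell the difference.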
However, as I wrote, if the file is indeed named *.st or *.cs, then no such arbitrary data is expected. These files contain only Smalltalk source code - it's not enforced, but it would be very atypical if they contained something else.
- Bert -