[squeak-dev] Why is source code always in files only?

Tue Jan 20 00:29:31 UTC 2015

On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:

> Hi Tobias,
> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
> This is to work around the readOnlyCopy used for thread safety, which is the
> main killer of performance...
>

IMO this is a bug.  We should simply have a single read-only copy of each
sources file and modify the debugger to either save and restore the state
of a read-only copy around accessing source, or use its own read-only copy
(except that the latter approach breaks when one debugs the debugger).  The
difference in performance between using CurrentReadOnlySourceFiles
cacheDuring: [...] and not in anything that accesses source is huge.  And
CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to
type in doits, and a sign that something is wrong.
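
To make the comparison concrete, here is a minimal sketch of the two forms,
assuming a stock Squeak image. It uses only CompiledMethod>>getSource and
CurrentReadOnlySourceFiles cacheDuring:, both mentioned in this thread, plus
timeToRun; the variable names are just for illustration and timings will
differ from machine to machine.

```smalltalk
"Sketch only: time fetching every method's source, without and with the cache."
| methods uncached cached |
methods := CompiledMethod allInstances.

"Uncached: every getSource goes through the readOnlyCopy path described above."
uncached := [methods do: [:m | m getSource]] timeToRun.

"Cached: the read-only copies are reused for the duration of the block."
cached := [CurrentReadOnlySourceFiles cacheDuring: [
	methods do: [:m | m getSource]]] timeToRun.

"Both values are in milliseconds; no particular numbers are claimed here."
{uncached. cached}
```

Evaluating this as a doit in a workspace should show the gap between the two
forms on your own image.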

>
> 2015-01-19 22:10 GMT+01:00 Tobias Pape <Das.Linux at gmx.de>:
>
>>
>> On 19.01.2015, at 21:51, Levente Uzonyi <leves at elte.hu> wrote:
>>
>> > On Mon, 19 Jan 2015, Tobias Pape wrote:
>> >
>> >>
>> >> On 19.01.2015, at 18:34, Chris Muller <asqueaker at gmail.com> wrote:
>> >>
>> >>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <Das.Linux at gmx.de> wrote:
>> >>> Hi all,
>> >>>
>> >>>
>> >>> We store method source _solely_ in files (.sources/.changes).
>> >>> Why? We have means to attach it to compiled methods, in fact more than one:
>> >>>
>> >>>
>> >>> CompiledMethod allInstances size. "57766."
>> >>> CompiledMethod allInstances count: [:m | m properties includesKey: #source].  "0."
>> >>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>> >>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>> >>>
>> >>>
>> >>> " also interesting "
>> >>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>> >>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>> >>>
>> >>>
>> >>> When doing some analysis on source code, it is a pain to _either_
>> >>> always go to disk for the source _or_ cache the code myself (which may
>> >>> get out of sync soon).
>> >>>
>> >>> If you're sending messages instead of viewing private innards, why is it a pain?
>> >>
>> >> What do you mean?
>> >>
>> >> Calling getSource on a CM goes 300km to disk instead of 1m to memory
>> >> (metaphorically speaking), and when I do analysis on source code I
>> >> typically do stuff like that a lot.
>> >> And as a developer I really dislike that I have to choose between either
>> >>
>> >> a) bad performance due to excessive IO (yes, I want to access the source a lot), or
>> >> b) caching things myself when two ways of storing them are already available.
>> >
>> > On today's machines you don't have to. Once you read the data from the
>> > disk, it'll be cached in memory. It would be faster to access the sources
>> > if they were stored in a trailer, but that would bump the image size by
>> > about 15 MB (uncompressed), or 9 MB (compressed):
>> >
>>
>> I understand. But for a development image, I'd take that burden.
>>
>> > | size compressedSize |
>> > size := compressedSize := 0.
>> > CurrentReadOnlySourceFiles cacheDuring: [
>> >       SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
>> >               | string compressed |
>> >               string := method getSource asString.
>> >               compressed := string squeakToUtf8 zipped.
>> >               size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
>> >               compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
>> > { size. compressedSize }.
>> >
>> > "==> #(15003880 9057408)"
>>
>>
>> What I am actually wondering about is this:
>> there are two completely different ways to _access_ source stored in the image,
>> but no way to actually _store_ it there.
>>
>> >
>> >>
>> >>>
>> >>>  Can't we just save the source code either via trailer or properties
>> >>> on first access?
>> >>>
>> >>> -1.  Why do I want all of those Strings in my image?
>> >>
>> >> To do stuff to them.
>> >> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>> >> Analysis stuff.
>> >> Currently, I have a separate structure that holds onto the code once retrieved
>> >> from disk. But once a method changes (e.g., through recompilation) I first have
>> >> to detect that it happened, and second flush and refill this cache. I find this
>> >> tiresome.
>> >
>> > Do you flush your cache selectively?
>>
>> No, I can't for reasons :)
>>
>> >
>> > Scanning all source code for a given pattern takes less than a second
>> > (~800 ms) on my machine. What's your performance goal?
>>
>> I have ~15,000 methods that I have to compare line by line against each other.
>> Doing that by going to the filesystem just kills it.
>>
>>
>> Best
>>         -Tobias
>
> --
best,
Eliot