[squeak-dev] Why is source code always in files only?

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Jan 20 10:28:12 UTC 2015


2015-01-20 1:29 GMT+01:00 Eliot Miranda <eliot.miranda at gmail.com>:

>
>
> On Mon, Jan 19, 2015 at 1:31 PM, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>> Hi Tobias,
>> are you aware of CurrentReadOnlySourceFiles cacheDuring: [...]
>> This is to work around the readOnlyCopy used for thread safety, which is
>> the main killer of performance...
>>
>
> IMO this is a bug.  We should simply have a single read-only copy of each
> sources file and modify the debugger to either save and restore the state
> of a read-only copy around accessing source, or use its own read-only copy
> (except that the latter approach breaks when one debugs the debugger).  The
> difference in performance between using CurrentReadOnlySourceFiles
> cacheDuring: [...] and not in anything that accesses source is huge.  And
> CurrentReadOnlySourceFiles cacheDuring: [...] is a /lot/ of verbiage to
> type in doits, and a sign that something is wrong.
>

Yes, it smells... Callers should not have to deal with this; an encapsulation is missing.
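
For illustration, here is a minimal sketch of what such an encapsulation might look like in a workspace. It assumes only the calls already used in this thread (CurrentReadOnlySourceFiles cacheDuring:, allSelectorsAndMethodsDo:, getSource). The block name withCachedSources is hypothetical and not something that exists in the image:

```smalltalk
"Sketch only: withCachedSources is a hypothetical helper, not part of Squeak.
 It hides the read-only source file cache from the caller."
| withCachedSources dotCount |
withCachedSources := [ :aBlock |
	CurrentReadOnlySourceFiles cacheDuring: [
		SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
			"Hand each method and its source text to the caller's block."
			aBlock value: method value: method getSource asString ] ] ].

"Usage: count the periods in all method sources without mentioning the cache."
dotCount := 0.
withCachedSources value: [ :method :source |
	dotCount := dotCount + (source occurrencesOf: $.) ].
dotCount
```

The point is only that the caching decision lives in one place; a real fix would more likely belong in the source-access code itself, as Eliot suggests.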


>
>
>>
>> 2015-01-19 22:10 GMT+01:00 Tobias Pape <Das.Linux at gmx.de>:
>>
>>>
>>> On 19.01.2015, at 21:51, Levente Uzonyi <leves at elte.hu> wrote:
>>>
>>> > On Mon, 19 Jan 2015, Tobias Pape wrote:
>>> >
>>> >>
>>> >> On 19.01.2015, at 18:34, Chris Muller <asqueaker at gmail.com> wrote:
>>> >>
>>> >>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <Das.Linux at gmx.de>
>>> wrote:
>>> >>> Hi all,
>>> >>>
>>> >>>
>>> >>> We store method source _solely_ in files (.sources/.changes).
>>> >>> Why? We have means to attach it to Compiled methods, in fact, more
>>> than one:
>>> >>>
>>> >>>
>>> >>> CompiledMethod allInstances size. "57766."
>>> >>> CompiledMethod allInstances count: [:m | m properties includesKey: #source].  "0."
>>> >>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>> >>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>> >>>
>>> >>>
>>> >>> " also interesting "
>>> >>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>> >>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>> >>>
>>> >>>
>>> >>> When doing some analysis on source code, it is a pain to _either_
>>> >>> always go to disk for the source _or_ cache the code myself (which
>>> may
>>> >>> get out of sync soon).
>>> >>>
>>> >>> If you're sending messages instead of viewing private innards, why
>>> is it a pain?
>>> >>
>>> >> What do you mean?
>>> >>
>>> >> Calling getSource on a CM goes 300km to disk instead of 1m to memory
>>> (metaphorically speaking)
>>> >> and when I do analysis on source code I typically do stuff like that
>>> a lot.
>>> >> And as a developer I really dislike that I have to choose between either
>>> >>
>>> >> a) bad performance due to excessive IO (yes I want to access the
>>> source a lot)
>>> >> b) caching things myself when already two ways of storing them are
>>> available.
>>> >
>>> > On today's machines you don't have to. Once you read the data from the
>>> disk, it'll be cached in memory. It would be faster to access the sources,
>>> if they were stored in a trailer, but that would bump the image size by
>>> about 15 MB (uncompressed), or 9 MB (compressed):
>>> >
>>>
>>> I understand. But for a development image, I'd take that burden.
>>>
>>> > | size compressedSize |
>>> > size := compressedSize := 0.
>>> > CurrentReadOnlySourceFiles cacheDuring: [
>>> >       SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
>>> >               | string compressed |
>>> >               string := method getSource asString.
>>> >               compressed := string squeakToUtf8 zipped.
>>> >               size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
>>> >               compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
>>> > { size. compressedSize }.
>>> >
>>> > "==> #(15003880 9057408)"
>>>
>>>
>>> What I am actually wondering about,
>>> there are two completely different ways to _access_ source stored in the
>>> image
>>> but no way to actually _store_ it there.
>>>
>>> >
>>> >>
>>> >>>
>>> >>>  Can't we just save the source code either via trailer or properties
>>> >>> on first access?
>>> >>>
>>> >>> -1.  Why do I want all of those Strings in my image?
>>> >>
>>> >> To do stuff to them.
>>> >> Like, analysing how many dots are in them, or how often someone
>>> crafts a Symbol.
>>> >> Analysis stuff.
>>> >> Currently, I have a separate structure that holds onto the code once
>>> retrieved
>>> >> from disk. But once the method changes (e.g., recompilation) I have to
>>> first detect
>>> >> that it happened, and second flush and refill this cache. I find this
>>> tiresome.
>>> >
>>> > Do you flush your cache selectively?
>>>
>>> No, I can't for reasons :)
>>>
>>> >
>>> > Scanning all source code for a given pattern takes less than a second
>>> (~800 ms) on my machine. What's your performance goal?
>>>
>>> I have ~15,000 methods that I have to compare line by line against each
>>> other.
>>> Doing that by going to the filesystem just kills it.
>>>
>>>
>>> Best
>>>         -Tobias
>>
>> --
> best,
> Eliot
>
>
>
>