[squeak-dev] Why is source code always in files only?

Levente Uzonyi leves at elte.hu
Mon Jan 19 21:56:06 UTC 2015


On Mon, 19 Jan 2015, Tobias Pape wrote:

>
> On 19.01.2015, at 21:51, Levente Uzonyi <leves at elte.hu> wrote:
>
>> On Mon, 19 Jan 2015, Tobias Pape wrote:
>>
>>>
>>> On 19.01.2015, at 18:34, Chris Muller <asqueaker at gmail.com> wrote:
>>>
>>>> On Mon, Jan 19, 2015 at 6:45 AM, Tobias Pape <Das.Linux at gmx.de> wrote:
>>>> Hi all,
>>>>
>>>>
>>>> We store method source _solely_ in files (.sources/.changes).
>>>> Why? We have means to attach it to Compiled methods, in fact, more than one:
>>>>
>>>>
>>>> CompiledMethod allInstances size. "57766."
>>>> CompiledMethod allInstances count: [:m | m properties includesKey: #source].  "0."
>>>> CompiledMethod allInstances count: [:m | m trailer sourceCode notNil]. "0."
>>>> CompiledMethod allInstances count: [:m | m trailer hasSourcePointer]. "57700."
>>>>
>>>>
>>>> " also interesting "
>>>> (CompiledMethod allInstances collect: [:m | m trailer kind] as: Bag) sortedCounts
>>>> {57701->#SourcePointer . 65->#NoTrailer . 14->#TempsNamesQCompress . 2->#TempsNamesZip}
>>>>
>>>>
>>>> When doing some analysis on source code, it is a pain to _either_
>>>> always go to disk for the source _or_ cache the code myself (which may
>>>> get out of sync soon).
>>>>
>>>> If you're sending messages instead of viewing private innards, why is it a pain?
>>>
>>> What do you mean?
>>>
>>> Calling getSource on a CM goes 300km to disk instead of 1m to memory (metaphorically speaking)
>>> and when I do analysis on source code I typically do stuff like that a lot.
>>> And as developer I really dislike that I have to choose between either
>>>
>>> a) bad performance due to excessive IO (yes I want to access the source a lot)
>>> b) caching things myself when already two ways of storing them are available.
>>
>> On today's machines you don't have to. Once you read the data from the disk, it'll be cached in memory. It would be faster to access the sources if they were stored in a trailer, but that would bump the image size by about 15 MB (uncompressed) or 9 MB (compressed):
>>
>
> I understand. But for a development image, I'd take that burden.
>
>> | size compressedSize |
>> size := compressedSize := 0.
>> CurrentReadOnlySourceFiles cacheDuring: [
>> 	SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
>> 		| string compressed |
>> 		string := method getSource asString.
>> 		compressed := string squeakToUtf8 zipped.
>> 		size := size + string byteSize + ((string size > 255) asBit + 1 * 4).
>> 		compressedSize := compressedSize + compressed byteSize + ((compressed size > 255) asBit + 1 * 4) ] ].
>> { size. compressedSize }.
>>
>> "==> #(15003880 9057408)"
>
>
> What I am actually wondering about,
> there are two completely different ways to _access_ source stored in the image
> but no way to actually _store_ it there.

You can use #dropSourcePointer to embed the source of a method in the 
image. For 15k methods you'd better swap the methods with custom code that 
converts them in a single batch.
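
A minimal sketch of the single-method case, to be tried in a scratch 
image only. Integer>>#factorial is just an example target chosen for 
illustration; the check reuses the trailer test from the top of this thread.

```smalltalk
"Sketch: embed the source of one method in the image.
 Integer>>#factorial is only an example target, not a recommendation."
| selector embedded |
selector := #factorial.
(Integer >> selector) dropSourcePointer.
"Look the method up again instead of relying on an older reference."
embedded := (Integer >> selector) trailer sourceCode notNil.
embedded
```

If the source was embedded, the last expression should answer true. The 
method then no longer points into the .sources/.changes files.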

>
>>
>>>
>>>>
>>>>  Can't we just save the source code either via trailer or properties
>>>> on first access?
>>>>
>>>> -1.  Why do I want all of those Strings in my image?
>>>
>>> To do stuff to them.
>>> Like, analysing how many dots are in them, or how often someone crafts a Symbol.
>>> Analysis stuff.
>>> Currently, I have a separate structure that holds onto the code once retrieved
>>> from disk. But once a method changes (e.g., through recompilation), I first have
>>> to detect that it happened, and then flush and refill this cache. I find this tiresome.
>>
>> Do you flush your cache selectively?
>
> No, I can't for reasons :)
>
>>
>> Scanning all source code for a given pattern takes less than a second (~800 ms) on my machine. What's your performance goal?
>
> I have ~15,000 methods that I have to compare line by line against each other.
> Doing that by going to the filesystem just kills it.

It's hard to tell much without knowing the exact problem. If you want to 
take a method and compare it with all previously processed methods line by 
line, then you can create a dictionary which maps lines to methods (or 
method-line number pairs).
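
A rough sketch of such a dictionary, assuming plain string equality is 
the right notion of "same line" for your problem. The name lineIndex and 
the { class. selector. line number } triples are only illustrative, and it 
walks the whole image rather than your 15k methods.

```smalltalk
"Sketch: map every non-empty source line to the places where it occurs."
| lineIndex |
lineIndex := Dictionary new.
CurrentReadOnlySourceFiles cacheDuring: [
	SystemNavigation default allSelectorsAndMethodsDo: [ :behavior :selector :method |
		| lineNumber |
		lineNumber := 0.
		method getSource asString linesDo: [ :line |
			lineNumber := lineNumber + 1.
			"Skip empty lines, they would match almost every method."
			line isEmpty ifFalse: [
				(lineIndex at: line ifAbsentPut: [ OrderedCollection new ])
					add: { behavior. selector. lineNumber } ] ] ] ].
"Keep only the lines that occur in more than one place."
lineIndex select: [ :locations | locations size > 1 ]
```

The sources are read once, and each later comparison becomes a dictionary 
lookup per line instead of a pairwise scan over all methods. Lines that 
differ only in indentation count as different here.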

Levente

>
>
> Best
> 	-Tobias
>
>
>

