[squeak-dev] Inboard Squeak Sources

Eliot Miranda eliot.miranda at gmail.com
Sat Dec 24 21:03:15 UTC 2022


Hi Tim,

> On Dec 22, 2022, at 1:11 PM, tim Rowledge <tim at rowledge.org> wrote:
> 
> Looking briefly at this issue again, we might want to consider merging the CompiledMethodTrailer & AdditionalMethodState concepts. It seems a touch daft to have two mechanisms overlapping so much.
> 
> Of course, pretty much all the capabilities provided by the two could be done more cleanly by splitting methods up similarly to the version we did at Interval in '97/98; have a normal object with ivars for the literals, pragma, source info, bytecodes, etc. Yes, it would cost some memory. It would potentially save some time in GC. It would be simpler to extend things. Yada-yada.

No, no, no and no! This works for JIT-only implementations but it is awful for interpreted implementations. One has to traverse two objects to get to the bytecode, not one.

By all means explore the AdditionalMethodState approach. With simple bytecode compiler extensions, inst vars in AMS can be used syntactically as inst vars in CompiledCode, compiling to message sends, e.g. as in Andreas’ Tweak compiler, which does the same for a Tweak object’s properties.
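
To make the idea concrete, here is a rough model in Python rather than Smalltalk. It is only a sketch under stated assumptions: the class names mirror the Squeak ones, but `extra`, `state`, and the forwarding hooks are hypothetical and are not Squeak's or Tweak's actual API. The bytecodes stay directly in the method object, while any other "inst var" is forwarded to the state object, roughly the way a compiler extension would turn an inst var reference into a message send.

```python
class AdditionalMethodState:
    """Hypothetical stand-in for the per-method properties object."""

    def __init__(self, method, selector):
        self.method = method      # back pointer to the owning method
        self.selector = selector
        self.extra = {}           # extension slots, e.g. source info


class CompiledCode:
    """Hypothetical stand-in: bytecodes live directly in the method."""

    def __init__(self, selector, bytecodes):
        object.__setattr__(self, "bytecodes", bytes(bytecodes))
        object.__setattr__(self, "state", AdditionalMethodState(self, selector))

    def __getattr__(self, name):
        # Only called when normal lookup fails: treat the name as a slot
        # that lives in the state object (the "message send" path).
        extra = self.state.extra
        if name in extra:
            return extra[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in ("bytecodes", "state"):
            object.__setattr__(self, name, value)
        else:
            # Everything else is stored in the state object, not the method.
            self.state.extra[name] = value


m = CompiledCode("foo", [0x10, 0x7C])
m.source = "foo ^1"          # reads like an inst var, stored in the state
print(m.source, len(m.bytecodes))
```

The interpreter-facing part (the bytecodes) is still reached through one object; only the rarely used extras take the indirect route.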

> 
> Aside from that pipedream, it would probably be simplest to go for dropping the trailer bytes, and putting the info in the properties object. So that would mean every method would
> replace the penultimate literal with a pointer to an AdditionalMethodState object, which would contain the usurped selector and some source access object. They also already include a pointer back to the method and then indexed values for pragmas etc. So that's 32 bytes for a bare case, or in a Squeak 6 image a megabyte & a half, since 1500 methods already have an AdditionalMethodState and we would 'save' the trailer bytes space. I'm not entirely sure why the back-pointer to the method is required after looking at the usage of it. Being able to dump that would save another ~half-MB.

It’s a lot of space to give up when the trailer implementation works.

The back pointer is necessary whenever access to AdditionalMethodState would change the AMS’s method.  Look at senders. You’ll see some use cases.

> 
> What would this buy us? Simpler is almost always easier to understand and maintain, which is good. Storing the source would become a bit simpler since we could make appropriate classes for remote, in-image, in-image compressed, in-database, ask-another-image, etc.
> 
> A quick test of the size the system sources would be if kept in-image-compressed suggests ~11MB, which is rather better than the 52MB of the plain text file. 2MB of that is the 360 methods with more than 1024 bytes of zipped source, things like the sound of coffee cups clinking & car motors, plus some font stuff.
> 
> A trivial hack to keep the sources in AdditionalMethodStates for every method, zipping as added, shows it can function. 
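
For reference, the kind of trivial hack described above can be modelled in a few lines of Python. This is a sketch only: `SourceHoldingState` and its methods are hypothetical names, and zlib stands in for whatever compression the image would actually use. Each method's state object zips the source when it is added and unzips it on demand.

```python
import zlib


class SourceHoldingState:
    """Hypothetical per-method state that keeps its source zipped in memory."""

    def __init__(self):
        self._zipped = None

    def set_source(self, text):
        # Compress on the way in, as in "zipping as added".
        self._zipped = zlib.compress(text.encode("utf-8"), 9)

    def source(self):
        if self._zipped is None:
            return None
        return zlib.decompress(self._zipped).decode("utf-8")

    def zipped_size(self):
        return 0 if self._zipped is None else len(self._zipped)


state = SourceHoldingState()
text = "printOn: aStream\n\taStream nextPutAll: self name\n" * 20
state.set_source(text)
print(len(text.encode("utf-8")), state.zipped_size())
```

Very short methods may not shrink at all, since the compressed form carries a fixed overhead; the savings come mostly from the longer ones.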

That’s not the thing to prototype. Extending the ClassBuilder and bytecode compiler so that one can create subclasses of CompiledMethod whose inst vars live in AMS subclasses is the thing to prototype.

> 
>> On 2022-12-04, at 6:56 PM, tim Rowledge <tim at rowledge.org> wrote:
>> 
>> Some recent debugging handed me a reminder that we have several varieties of 'inboard source' in the system already, just not presently used and probably not complete.
>> 
>> Take a look at CompiledMethod>>#getSourceFor:in:
>> - it looks at the method properties, checking for a Dictionary with a #source entry
>> - it checks the method trailer for tempNames, and if found, decompresses the names string and then decompiles the method and inserts those tempNames
>> - it checks the trailer for sourceCode that may be contained in one of four different ways, two of them being compressed strings in the trailer and I think the code in CompiledMethodTrailer>>#sourceCode fails to decompress them? The other two methods rely on methods that no longer exist in the image.
>> - it checks if the trailer has a source pointer, and if not, decompiles the method with no assistance from any temp names stuff
>> - if there is a source pointer, that value is used to fetch the source from the file(s), and as a backup for the files not being there it repeats the bare decompile code.
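
The lookup order in the list above can be sketched as a single fallback chain. This is a Python model, not the Squeak code: the dictionary keys, `get_source`, and the `decompile` callback are all hypothetical, and the decompression of the temp names string is left out for brevity.

```python
def get_source(method, source_files, decompile):
    """Return source text for `method`, trying each location in order.

    method       -- dict with optional "properties" and "trailer" entries
    source_files -- dict mapping source pointers to source text
    decompile    -- callback(method, temp_names_or_None) returning text
    """
    # 1. A properties Dictionary with a #source entry.
    props = method.get("properties")
    if isinstance(props, dict) and "source" in props:
        return props["source"]

    trailer = method.get("trailer", {})

    # 2. Temp names in the trailer: decompile and reinsert the names.
    if "temp_names" in trailer:
        return decompile(method, trailer["temp_names"])

    # 3. Source code embedded in the trailer itself.
    if "source_code" in trailer:
        return trailer["source_code"]

    # 4. No source pointer: bare decompile with no temp names.
    pointer = trailer.get("source_pointer")
    if pointer is None:
        return decompile(method, None)

    # 5. Fetch from the source file(s); fall back to a bare decompile
    #    if the files are missing.
    text = source_files.get(pointer)
    return text if text is not None else decompile(method, None)


def fake_decompile(method, temp_names):
    return "decompiled(%s)" % (temp_names,)


print(get_source({"trailer": {"source_pointer": 42}}, {42: "foo ^1"}, fake_decompile))
```

Written this way, the bare decompile in steps 4 and 5 is visibly the same fallback, which is one place the real method could be tidied up.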
>> 
>> I think we might be able to clean that up a bit.
>> 
>> tim
>> --
>> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
>> Useful Latin Phrases:- Illiud Latine dici non potest = You can't say that in Latin.
>> 
>> 
>> 
>> 
> 
> 
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> Useful Latin Phrases:- Illiud Latine dici non potest = You can't say that in Latin.
> 
> 
> 

