[squeak-dev] Can we extract type information from the VM?

Wed Sep 25 07:56:39 UTC 2013

Hi all,

once my PhD is finished, we will release its results (the Path tools framework) to the 
Squeak community. Among other things, we have implemented a "type harvester" that collects 
type information from running (passing) test cases. This information is shown in a browser 
extension (as a label; see screenshot). 
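
To give a rough idea of what harvesting types from passing tests means, here is a minimal 
sketch in Python. It is an illustration only, not the Path tools implementation: the names 
harvest_types, observed, area and test_area are made up for this example, and it records 
argument types only (no return values or instance variables).

```python
import sys
from collections import defaultdict

# Maps (function name, parameter name) -> set of observed type names.
observed = defaultdict(set)


def _harvest(frame, event, arg):
    # Profile hook: on every Python-level call, record each argument's type.
    if event == "call":
        code = frame.f_code
        nargs = code.co_argcount + code.co_kwonlyargcount
        for name in code.co_varnames[:nargs]:
            if name in frame.f_locals:
                value = frame.f_locals[name]
                observed[(code.co_name, name)].add(type(value).__name__)


def harvest_types(test):
    """Run a zero-argument test callable while recording argument types."""
    observed.clear()
    sys.setprofile(_harvest)
    try:
        test()
    finally:
        sys.setprofile(None)
    return dict(observed)


# Hypothetical code under test and its (passing) test case.
def area(width, height):
    return width * height


def test_area():
    assert area(2, 3) == 6
    assert area(2.5, 2) == 5.0


types = harvest_types(test_area)
```

Here types maps ("area", "width") to the type names seen for that parameter during the test 
run, which is the kind of data a browser could display next to a method.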

For more information, have a look at the following papers: 
Type Harvester: http://michaelperscheid.de/publications/papers/HauptPerscheidHirschfeld_2011_TypeHarvestingAPracticalApproachToObtainingTypingInformationInDynamicProgrammingLanguages_AcmDL.pdf 
Path Tools: http://michaelperscheid.de/publications/papers/PerscheidHauptHirschfeldMasuhara_2012b_TestDrivenFaultNavigationForDebuggingReproducibleFailures_JSSST.pdf

Stay tuned :-)

Best,
Michael

On 15.09.2013, at 21:21, Eliot Miranda <eliot.miranda at gmail.com> wrote:

> 
> On Sun, Sep 15, 2013 at 10:17 AM, Bob Arning <arning315 at comcast.net> wrote:
> I'm not clear on what you are suggesting. Dispatching a message does require the VM knowing the class of the receiver, but how and where the VM might collect that information is not clear. Perhaps an example would help.
> 
> The JIT uses inline caches at send sites to optimize sends.  These tell you
> - whether a send has been executed; if a send has never been executed the send site will be unlinked with no cache data.
> - whether the send has been sent to a single class of receiver, and what that class is; if so, a send site will be linked to a method and have one class entry in the inline cache.
> - whether the send has been sent to a small number of classes of receiver (in Cog up to 6), and what these are; if so the send site will be linked to a "closed" Polymorphic Inline Cache with up to 6 class entries.
> - whether the send has been sent to more than 6 classes; if so the site will be linked to an "open" polymorphic inline cache, which is a first-level method lookup cache probe with no classes cached.
> 
> So the VM, in optimizing sends, collects type data on send sites, classifying each as untaken, monomorphic, polymorphic or megamorphic.  This is the basis of adaptive optimization in VMs such as HotSpot and V8.  After Spur, this is the next target for Cog.
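
The four send-site states described above can be modelled in a few lines. This is a minimal 
sketch in Python, not Cog's actual data structures: the SendSite class and its method names 
are hypothetical, and only the limit of 6 entries for a closed PIC is taken from the text.

```python
MAX_PIC_ENTRIES = 6  # closed PIC limit quoted above for Cog


class SendSite:
    """Toy model of one send site and the receiver classes it has cached."""

    def __init__(self, selector):
        self.selector = selector
        self.classes = []   # cached receiver classes, in order first seen
        self.open = False   # True once the site is linked to an open PIC

    def send(self, receiver):
        # Record the receiver's class the way an inline cache would.
        if self.open:
            return  # open PIC: no per-site classes are cached any more
        cls = type(receiver)
        if cls in self.classes:
            return  # cache hit, nothing new to learn
        if len(self.classes) < MAX_PIC_ENTRIES:
            self.classes.append(cls)
        else:
            # One class too many: drop the entries, fall back to an open PIC.
            self.classes = []
            self.open = True

    def state(self):
        if self.open:
            return "megamorphic"
        if not self.classes:
            return "untaken"
        if len(self.classes) == 1:
            return "monomorphic"
        return "polymorphic"


site = SendSite("printString")
states = [site.state()]
for receiver in (1, 2, "a", 1.5, [], {}, (), set()):
    site.send(receiver)
    states.append(site.state())
```

Reading site.classes back out while a site is still monomorphic or polymorphic is the kind 
of type data the VM already holds; once a site goes megamorphic, that per-site data is gone.
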
> 
> See e.g. build me a jit for gory details.
> 
> 
> Cheers,
> Bob
> 
> HTH
>  
> 
> On 9/15/13 1:06 PM, Frank Shearar wrote:
>> On 15 September 2013 17:38, Florin Mateoc <florin.mateoc at gmail.com>
>>  wrote:
>> 
>>> On 9/15/2013 11:47 AM, Frank Shearar wrote:
>>> 
>>>> On 15 Sep 2013, at 14:57, Florin Mateoc <florin.mateoc at gmail.com>
>>>>  wrote:
>>>> 
>>>> 
>>>>> On 9/15/2013 5:54 AM, Frank Shearar wrote:
>>>>> 
>>>>>> I was rereading Phlip's "what's wrong with our IDEs" post -
>>>>>> 
>>>>>> http://www.oreillynet.com/onlamp/blog/2008/05/dynamic_languages_vs_editors.html
>>>>>> 
>>>>>> - and realised that he's just verbalised something I've only
>>>>>> half-thought.
>>>>>> 
>>>>>> When we run our tests (because of course we're using TDD) we know the
>>>>>> precise types/expected classes of everything, because the VM
>>>>>> automatically collects (or can collect) this information.
>>>>>> 
>>>>>> But how do we get that information out of the VM?
>>>>>> 
>>>>>> frank
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> You don't need to extract it from the VM, you can have a type profiler that collects it for you in the image.
>>>>> 
>>>> Doesn't that just mean twice as much work? The VM of necessity has already typed the call sites (even if the typing is only eventually correct). Why could a mirror not expose the typing thus far?
>>>> 
>>>> frank
>>>> 
>>>> 
>>>>> Florin
>>>>> 
>>>>> 
>>> Doing it in the image means you do it in Smalltalk. Extracting it from the VM means you are doing it in C/assembly.
>>> And I definitely do not understand the "twice as much work" argument. Work for whom? For the computer? Well, that's
>>> its job. As the developer, you only do it once, regardless of which option you choose. I prefer doing it in Smalltalk.
>>> 
>> Well, someone has to write the code to collect and extract the information.
>> 
>> Unless I've completely misunderstood you, you're saying I should build
>> an interpreter within which to run my tests, and that collects this
>> type information. I'm saying that the VM has to do this _already_ and
>> exposing this information to the image (through a mirror or similar)
>> means that (a) you get accurate type information and (b) you don't
>> have to write an interpreter.
>> 
>> How would a type profiler collect information at least as accurately
>> as the VM already does?
>> 
>> frank
>> 
>> 
>> 
> 
> -- 
> best,
> Eliot
> 

---
Michael Perscheid
michaelperscheid at googlemail.com

http://www.michaelperscheid.de/
