[Vm-dev] PICs (was: RISC-V J Extension)

Wed Jul 25 06:12:53 UTC 2018

Hi,

*One of the papers in that list is the 1997 technical report "The Space
Overhead of Customization". One of the reasons that Java won over Self
was that its simple interpreter ran on 8MB machines that most of Sun's
customers had while Self needed 24MB workstations which were rare (but
would be very common just two years later). Part of that was due to
compiling a new version of native code for every different type of
receiver even if the different versions didn't really help.*

Note that JavaScript people had the same issue and built many
workarounds, such as fine-tuning transitions between hidden classes/maps or
sharing code between hidden classes/maps.

In addition, the Self VM featured a non-optimizing and an optimizing JIT
but no interpreter. The interpreter is what saves memory here (rarely used
code can be kept in the form of bytecode, which is usually ~5x more compact
than native code, and unused compiled native code can be GC'ed).
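
To make the memory argument concrete, here is a minimal sketch in Python
(not Self or Cog code). The names Method, CodeCache, hot_threshold and sweep
are made up for illustration, and the 5x expansion factor is simply the rough
figure quoted above, not a measurement. It shows bytecode always being kept
while native code is produced only for hot methods and dropped again once it
goes unused:

```python
NATIVE_EXPANSION = 5  # assumption: the rough ~5x figure, not a measurement


class Method:
    def __init__(self, name, bytecode):
        self.name = name
        self.bytecode = bytecode   # compact form, always kept
        self.native = None         # compiled form, may be discarded
        self.calls = 0             # calls since the last sweep


class CodeCache:
    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold
        self.methods = []

    def add(self, method):
        self.methods.append(method)
        return method

    def call(self, method):
        """Return which tier would run the method on this call."""
        method.calls += 1
        if method.native is None and method.calls >= self.hot_threshold:
            # Stand-in for the JIT: only the size of the output matters here.
            method.native = bytes(len(method.bytecode) * NATIVE_EXPANSION)
        return "native" if method.native is not None else "interpreted"

    def sweep(self):
        """Drop native code of methods not called since the previous sweep."""
        for method in self.methods:
            if method.calls == 0:
                method.native = None
            method.calls = 0

    def footprint(self):
        """Total bytes held: bytecode plus whatever native code survives."""
        return sum(len(m.bytecode) + len(m.native or b"")
                   for m in self.methods)
```

With only a JIT and no interpreter, every method that has ever run pays the
larger native footprint; with an interpreter, only the hot ones do.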

So there's a lot to argue there. I don't think Java won for technical
reasons but for political ones; the Self VM could definitely have been
patched to fix this issue.

*My idea of allowing PICs to be optionally shared was that this would allow
customization to be limited in certain cases to save memory. It would cause
a loss of information about types seen at a call site, but that doesn't
always have a great impact on performance.*

We do that in Cog for openPICs (polymorphism of 6+ cases). In any case
the optimizing JIT rarely optimizes more than 2 cases (though in some VMs
such as JSCore it can optimize up to 8 cases), so sharing openPICs makes
sense.

For closedPICs, i.e. the jump tables, I don't think they take up a lot of
memory, and their type information is indeed relevant, so I'm not sure about
sharing those.
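
The difference between the two kinds of PIC can be sketched roughly as
follows. This is a Python illustration under stated assumptions, not Cog's
actual implementation: the class names ClosedPIC, OpenPIC and SendSite, the
limit of 6 cases, and the use of getattr as the "full lookup" are all
stand-ins. A closedPIC belongs to one send site and records the classes seen
there; an openPIC is shared by every send site using the same selector and
just probes a global cache, so no per-site type information remains:

```python
class ClosedPIC:
    """Per-send-site jump table: a short list of (class, method) cases."""

    def __init__(self, limit=6):
        self.limit = limit
        self.cases = []            # doubles as type feedback for this site

    def lookup(self, cls):
        for seen, method in self.cases:
            if seen is cls:
                return method
        return None

    def add(self, cls, method):
        if len(self.cases) >= self.limit:
            return False           # too polymorphic for a jump table
        self.cases.append((cls, method))
        return True


class OpenPIC:
    """One per selector, shared by all sites: probes a global cache."""

    def __init__(self, selector, cache):
        self.selector = selector
        self.cache = cache         # global dict keyed by (class, selector)

    def lookup(self, cls):
        key = (cls, self.selector)
        method = self.cache.get(key)
        if method is None:
            method = getattr(cls, self.selector)   # stand-in for full lookup
            self.cache[key] = method
        return method


class SendSite:
    def __init__(self, selector, shared_open_pics, method_cache):
        self.selector = selector
        self.closed = ClosedPIC()
        self.open = None
        self._shared = shared_open_pics    # dict: selector -> OpenPIC
        self._cache = method_cache

    def send(self, receiver, *args):
        cls = type(receiver)
        if self.open is not None:
            return self.open.lookup(cls)(receiver, *args)
        method = self.closed.lookup(cls)
        if method is None:
            method = getattr(cls, self.selector)   # stand-in for full lookup
            if not self.closed.add(cls, method):
                # Switch this site to the openPIC shared by the selector.
                self.open = self._shared.setdefault(
                    self.selector, OpenPIC(self.selector, self._cache))
                self.closed = None
        return method(receiver, *args)

    def types_seen(self):
        """Classes recorded at this site, or None once it went megamorphic."""
        if self.open is not None:
            return None
        return [cls for cls, _ in self.closed.cases]
```

In this sketch a site that sees two classes keeps both in types_seen(),
while two sites that each see seven classes end up pointing at the very same
OpenPIC object and report None.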

On Wed, Jul 25, 2018 at 1:18 AM, Jecel Assumpcao Jr. <jecel at merlintec.com>
wrote:

>
> While it is bad form to move a private discussion to (or back to) a
> public forum, some of these links might be interesting to people here
> and I have been unable to send emails to Tobias after my initial reply.
> An attempt on Wednesday and on Friday made mcrelay.correio.biz complain
> that mx00.emig.gmx.net[ refused to talk to it and an attempt from my old
> 1991 email account on Monday complained about the email address though
> it was ok as far as I can tell.
>
> Tobias wrote:
> > Jecel wrote:
> > > [new direction: emulate bytecodes and RISC-V]
> >
> > That's an interesting take.
> >
> > I can only watch from afar, but it's all interesting. (for example that
> > guy who does RISC-V cpu in TTL chips:
> > https://www.youtube.com/channel/UCBcljXmuXPok9kT_VGA3adg )
>
> It is an interesting project. I was annoyed by his claim to have the
> first homebrew TTL 32 bit processor since in the late 1990s a group of
> students at the MIT processor design course implemented the Beta
> processor in TTLs instead of using FPGAs like all other groups (before
> or since). Sadly, all information about this has been eliminated from
> the web and can't even be found in archive.org.
>
> I tried to get the local universities to teach RISC-V to their students
> instead of their own educational RISC processors but they are too
> emotionally attached to their designs.
>
> > Sounds reasonable. Let's have them know dynamic languages are also still
> > there ;)
> > (I mean, you're very familiar with both Smalltalk and Self...)
>
> Mario Wolczko has been involved in Java since the late 1990s but was
> part of the Self group before that and had created the Mushroom
> Smalltalk computer before that.
>
> http://www.wolczko.com/
>
> Boris Shingarov is currently involved with Java but has given a lot of
> talks about Smalltalk VMs and was involved in Squeak back in the OS/2
> days.
>
> http://shingarov.com/
>
> With me, that was 3 out of 6 people at the meeting representing the
> Smalltalk viewpoint. We shall see if that will have any practical
> effect.
>
> > The TLB is somewhat maintained by the CPU to manage the translation of
> > virtual addresses to physical ones.
> >
> > I can imagine something similar, like a branch, that upon return,
> > updates a field in a PIC buffer, such that the next time the branch is
> > only taken if a register (eg, class of the object) is different or so.
>
> Ok, Mario actually mentioned that with today's advanced branch
> prediction hardware we might want to re-evaluate PICs. In this case you
> wouldn't be using the TLBs but the BTB (Branch Target Buffer) hardware.
>
> https://www.slideshare.net/lerruby/like-2014214
>
> Mario might have actually been thinking about Urs Hölzle's ECOOP 95
> paper, which was a slightly different subject.
>
> http://hoelzle.org/publications/ecoop95-dispatch.pdf
>
> They were looking at the different kinds of software implementation of
> method dispatch (not only PICs) and the effects of processors executing
> more and more instructions per clock cycle. That might make a scheme
> that is bad for a simple RISC (due to many tests, for example) actually
> work well on an advanced out-of-order processor (due to the test being
> "free" since they execute in parallel with the main code). They didn't
> look at branch prediction hardware, but it certainly would have a huge
> impact. Several of the later papers focused on branch prediction:
>
> http://hoelzle.org/publications.html
>
> > > For SiliconSqueak I actually had two different PIC instructions. They
> > > modified how the instruction cache works. Normally the instruction
> > > cache is accessed by hashing the 32 bit value of the PC except for the
> > > lowest bits which select a byte in the cache line, but after a PIC
> > > instruction the hash used a 64 bit value that combined the PC (all
> > > bits) and the pointer to the receiver's class. The resulting cache line
> > > was fetched and instructions executed in sequence even though the PC
> > > didn't change. Any branch or call instruction would restart normal
> > > execution at the new PC.
> >
> > Sounds neat!
> >
> > > So a PIC entry takes up exactly one cache line. A PIC can have as many
> > > entries as needed and the instruction takes the same time to execute no
> > > matter how many entries there are (not taking into account cache
> > > misses).
> >
> > Wow that's incredible.
> >
> > > The second PIC instruction works exactly like the first but it supplies
> > > a different value to be used in place of the current PC. That allows
> > > different call sites to share PIC entries if needed, though that might
> > > be more complicated than it is worth.
> >
> > Maybe. What I like about PICs per send site is that you can essentially
> > use them as data source for dynamic feedback (what "types" were actually
> > seen at this send site?) and one probably would need some instructions
> > to fetch those infos from the PIC.
>
> One of the papers in that list is the 1997 technical report "The Space
> Overhead of Customization". One of the reasons that Java won over Self
> was that its simple interpreter ran on 8MB machines that most of Sun's
> customers had while Self needed 24MB workstations which were rare (but
> would be very common just two years later). Part of that was due to
> compiling a new version of native code for every different type of
> receiver even if the different versions didn't really help.
>
> My idea of allowing PICs to be optionally shared was that this would
> allow customization to be limited in certain cases to save memory. It
> would cause a loss of information about types seen at a call site, but
> that doesn't always have a great impact on performance.
>
> -- Jecel
>

-- 
Clément Béra
https://clementbera.github.io/
https://clementbera.wordpress.com/