<html><head></head><body>
<p>Your link does not work. <span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><a href="https://ponylang.zulipchat.com/#narrow/search/lzip">https://ponylang.zulipchat.com/#narrow/search/lzip</a></span></p>
<div class="moz-cite-prefix">On 4/21/20 2:05 AM, Shaping wrote:<br/>
</div>
<blockquote type="cite" cite="mid:05fa01d617a2$e3a328f0$aae97ad0$@uurda.org">
<div class="WordSection1">
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
Pony compiler and runtime need to be studied.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in">What better way
than to bring the Pony compiler into Squeak? Build a Pony
runtime inside Squeak, with the vm simulator. Build a VM. Then
people will learn Pony and it would be great!<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yes,
that is one way. Then we can simulate the new collector
with Smalltalk in the usual way, whilst also integrating
ref-caps and dynamic types (the main challenge). We already
know that Orca works in Pony (in high-performance
production—not an experiment or toy). Still there will be
bugs and perhaps room for improvements. Smalltalk
simulation would help greatly there. The simulated
Pony-Orca (the term used in the Orca paper) or simulated
Smalltalk-Orca, if we can tag classes with ref-caps and keep
Orca working, will run even more slowly in simulation-mode
with all that message-passing added to the mix.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in">The cost of
message passing drops when using the CogVM JIT. It is
indeed somewhat slower when running in the simulator. I think
the objective should be to run the Pony bytecodes<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Pony
is a language, compiler and runtime. The compiler converts
Pony source to machine code.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> on the jitting
CogVM. This VM allows you to install your own
BytecodeEncoderSet. Note that I was definitely promoting a
solution of running Pony on the CogVM, not Orca.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Pony
is not a VM, either--no bytecodes. We would be studying
Orca structure in the Pony C/C++, how that fits with the
ref-caps, and then determine how to write something similar
in the VM or work Smalltalk dynamic types into the existing
Pony C/C++ (not nearly as fun, probably).<o:p></o:p></span></p>
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"> <span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I’m
starting to study the Pharo VM. Can someone suggest what
to read. I see what appears to be outdated VM-related
material. I’m not sure what to study (besides the source
code) and what to ignore. I’m especially interested to
know <u>what not to read</u>.</span><o:p></o:p></p>
</blockquote>
<p style="margin-left:.5in">I would suggest sticking to Squeak,
instead of Pharo, as that is where the VM is designed &amp;
developed. <o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Okay.<o:p></o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">How
do Pharo’s and Squeak’s VMs differ? I thought
OpenSmalltalkVM was the common VM. I also read something
recently from Eliot that seemed to indicate a fork. <o:p></o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
thought Pharo had the new tools, like GT, but I’m not sure.
I don’t follow Squeak anymore. <o:p></o:p></span></p>
<p style="margin-left:1.0in">Regarding VM documentation, here
are a couple of interesting blogs covering the CogVM [1][2].<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
<u>problem</u> is easy to understand. It reduces to
stop-the-world (StW) GC in one large heap, and how to make instead
many small, well-managed heaps, one per actor. Orca does that already
and demonstrates very high performance. That’s what the
Orca paper is about.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">The CogVM has a
single heap divided into what I believe are called "segments",
which lets it grow dynamically to gain new heap space.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yeah—no,
it won’t work. Sympathies. Empathies.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><a href="https://ponylang.zulipchat.com/#narrow/search/lzip" moz-do-not-send="true">https://ponylang.zulipchat.com/#narrow/search/lzip</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Read
the thread above and watch the video to sharpen your
imagination and mental model, somewhat, for <u>how real
object-oriented programs work <b>at run-time</b>.</u>
The video details are fuzzy, but you can get a good feel for
message flow. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">This
should have happened first in Smalltalk. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> The performance
of the GC in the CogVM is demonstrated with this profiling
result running all Cryptography tests. Load Cryptography with
this script, open the Test Runner, select the Cryptography tests,
and click 'Run Profiled':<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in">Installer ss<br/>
project: 'Cryptography';<br/>
install: 'ProCrypto-1-1-1';<br/>
install: 'ProCryptoTests-1-1-1'.<o:p></o:p></p>
</blockquote>
<p style="margin-left:.5in">Here are the profiling results.<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"> - 12467
tallies, 12696 msec.<br/>
<br/>
**Leaves**<br/>
13.8% {1752ms} RGSixtyFourBitRegister64>>loadFrom:<br/>
8.7% {1099ms} RGSixtyFourBitRegister64>>bitXor:<br/>
7.2% {911ms} RGSixtyFourBitRegister64>>+=<br/>
6.0% {763ms} SHA256Inlined64>>processBuffer<br/>
5.9% {751ms} RGThirtyTwoBitRegister64>>loadFrom:<br/>
4.2% {535ms} RGThirtyTwoBitRegister64>>+=<br/>
3.9% {496ms} Random>>nextBytes:into:startingAt:<br/>
3.5% {450ms} RGThirtyTwoBitRegister64>>bitXor:<br/>
3.4% {429ms} LargePositiveInteger(Integer)>>bitShift:<br/>
3.3% {413ms} []
SystemProgressMorph(Morph)>>updateDropShadowCache<br/>
3.0% {382ms} RGSixtyFourBitRegister64>>leftRotateBy:<br/>
2.2% {280ms} RGThirtyTwoBitRegister64>>leftRotateBy:<br/>
1.6% {201ms} Random>>generateStates<br/>
1.5% {188ms} SHA512p256(SHA512)>>processBuffer<br/>
1.5% {184ms} SHA256Test(TestCase)>>timeout:after:<br/>
1.4% {179ms} SHA1Inlined64>>processBuffer<br/>
1.4% {173ms} RGSixtyFourBitRegister64>>bitAnd:<br/>
<br/>
**Memory**<br/>
old -16,777,216 bytes<br/>
young +18,039,800 bytes<br/>
used +1,262,584 bytes<br/>
free -18,039,800 bytes<br/>
<br/>
**GCs**<br/>
full 1 totalling 86 ms (0.68% uptime), avg 86
ms<br/>
incr 307 totalling 81 ms (0.6% uptime), avg
0.3 ms<br/>
tenures 7,249 (avg 0 GCs/tenure)<br/>
root table 0 overflows<o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">As shown, 1 full
GC occurred in 86 ms<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Not
acceptable. Too long. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> and 307
incremental GCs occurred for a total of 81 ms. All of this GC
activity occurred within a profile run lasting 12.7 seconds.
The total GC time is just 1.31% of the total time. Very fast.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Not
acceptable. Too long. And, worse, it won’t scale. The
problem is not the percentage; it’s the <u>big delays
amidst other domain-specific computation.</u> These times
must be much smaller and spread out across many pauses
during domain-specific computations. No serious real-time
apps can be made in this case.<o:p></o:p></span></p>
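<p class="MsoNormal">A rough sketch of the arithmetic behind both readings of this profile, using only the figures quoted above. The function name and dictionary keys are made up for illustration. It separates the overall GC share of the run from the length of an individual pause, which are the two numbers being argued over here.</p>
<pre>
```python
# Figures copied from the profile output quoted above.
TOTAL_MS = 12696          # wall-clock time of the profiled run
FULL_GC = (1, 86)         # (count, total ms) for full collections
INCR_GC = (307, 81)       # (count, total ms) for incremental collections


def gc_summary(total_ms, full, incr):
    """Return overall GC share (percent) and average pause per GC kind (ms)."""
    gc_ms = full[1] + incr[1]
    return {
        "gc_share_pct": 100.0 * gc_ms / total_ms,
        "avg_full_pause_ms": full[1] / full[0],
        "avg_incr_pause_ms": incr[1] / incr[0],
    }


summary = gc_summary(TOTAL_MS, FULL_GC, INCR_GC)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```
</pre>
<p class="MsoNormal">The share comes out at roughly 1.3% of the run, while the one full collection is a single 86 ms stop. Which of the two matters depends on whether the application cares about throughput or about worst-case latency.</p>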
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
suggest studying the Pony and Orca material, if the video
and accompanying explanation don’t clarify Pony-Orca speed
and scale. <o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> The
<u>solution</u> for Smalltalk is more complicated, and
will involve a concurrent collector. The best one I can
find now is Orca. If you know a better one, please share
your facts.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in">As different
event loops on different cores will use the same <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="color:#4472C4">externalizing remote interface</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">This
idea is not clear. Is there a description of it?</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">So I gather that
the Orca/Pony solution does not treat inter-actor messages
within the same process as remote calls? <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Why
would the idea of ‘remote’ enter here? The execution scope
is an OS process. Pony actors run on their respective
threads in one OS process. Message passing is zero-copy;
all “passing” is done by reference. No data is actually
copied. The scheduler interleaves all threads needing to
share a core if there are more actors than cores. Switching
time for actor threads, in that case, is 5 to 15 ns. This
was mentioned before. Opportunistic work stealing happens.
That means that all the cores stay as busy as possible if
there is any work at all left to do. All of this happens by
design without intervention or thought from the programmer.
You can read about this in the links given earlier. I
suggest we copy the design for Smalltalk.<o:p></o:p></span></p>
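<p class="MsoNormal">A toy sketch of the mechanics described here, assuming a cooperative loop in place of real threads: each actor owns a mailbox, a send enqueues the very same object (no copy), and an idle worker queue steals a runnable actor from the busiest one. Actor, run and the two worker queues are hypothetical names for illustration, not Pony's runtime API, and the real scheduler is far more refined.</p>
<pre>
```python
from collections import deque


class Actor:
    """Toy actor: private state plus a mailbox of pending messages."""

    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.received = []

    def send(self, payload):
        # The payload object itself is enqueued; nothing is copied.
        self.mailbox.append(payload)

    def run_one(self):
        # Process exactly one message, like one turn of a behaviour.
        self.received.append(self.mailbox.popleft())


def run(workers):
    """Drain every queue; an idle worker steals from the busiest queue."""
    steals = 0
    while any(workers):
        for queue in workers:
            if not queue:
                victim = max(workers, key=len)
                if len(victim) > 1:
                    queue.append(victim.pop())
                    steals += 1
            if not queue:
                continue
            actor = queue.popleft()
            actor.run_one()
            if actor.mailbox:
                queue.append(actor)  # still has mail, stays runnable
    return steals


payload = {"bytes": bytearray(b"frame")}
actors = [Actor(f"a{i}") for i in range(4)]
for actor in actors:
    actor.send(payload)
    actor.send(payload)
workers = [deque(actors), deque()]  # the second worker starts idle
steals = run(workers)
print("steals:", steals)
```
</pre>
<p class="MsoNormal">Every actor ends up holding references to the one payload object, and the idle queue picks up work without the program asking for it. Safe sharing of that payload is exactly what the ref-caps are for; this sketch does not model them.</p>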
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in">If each core has a
separate thread and thus a separate event loop, it makes sense
to have references to actors in other event loops as a remote
actor. Thus the parallelism is well defined.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal" style="margin-left:.5in">to reach other
event loops, we do not need a runtime that can run on all of
those cores. We just need to start the minimal image on the
CogVM with remote capabilities<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Pony
doesn’t yet have machine-node remoteness. The networked
version is being planned, but is a ways off still. By <i>remote</i>,
do you mean: another machine or another OS/CogVM process
on the same machine?</span><o:p></o:p></p>
</blockquote>
<p style="margin-left:.5in">Yes, I mean both. I also mean
between two event loops within the same process, different
threads.<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
I think the Pony runtime is still creating by default just
one OS process per app and as many threads as needed, with
each actor having only one thread of execution by
definition of what an actor is (single-threaded, very
simple, very small). A scheduler keeps all cores busy,
running and interleaving all the current actor threads.
Message tracing maintains ref counts. A cycle-detector
keeps things tidy. Do Squeak and Pharo have those
abilities?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in">to share
workload.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">With
Pony-Orca, sharing of the workload doesn’t need to be
managed by the programmer.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">When I said
sharing of workload is a primary challenge, I did not mean
explicitly managing concurrency; the event loop ensures
concurrency safety. I meant that the design of a parallelized
application into concurrent actors is the challenge,<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">If
you can write a state-machine with actors that each do one
very simple, preferably reusable thing in response to
received async messages, then it’s not a challenge. We do
have to learn how to do it. It’s not what most of us are
used to. Pony is a good tool for practicing, even if the
syntax is not interesting. Still, as mentioned, we should
make tools to help with that state-machine construction.
That comes later, but it must happen.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Pony
has Actors. It also has Classes. The actors have
behaviours. Think of these as async methods. <u>Smalltalk
would need new syntax for Actors, behaviours, and the
ref-caps that type the objects.</u> Doing this last bit
well is the task that concerns me most. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> that exists for
Smalltalk capabilities and Pony capabilities. In fact, instead
of talking about actors, concurrency &amp; parallel
applications, I prefer to speak of a capabilities model,
inherently on an event loop, which is the focal point for safe
concurrency.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
suggest a study of the Pony scheduler. There are actors,
mailboxes, message queues, and the scheduler, mainly. You
don’t need to be concerned about safety. It’s been handled
for you by the runtime and ref-caps.<o:p></o:p></span></p>
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
That’s one of basic reasons for the existence of
Pony-Orca. The Pony-Orca dev writes his actors, and they
run automatically in load-balance, via the actor-thread
scheduler and work-stealing, when possible, on all the
cores. Making Smalltalk work with Orca is, at this early
stage, about understanding how Orca works (study the C++
and program in Pony) and how to implement it, if possible,
in a Smalltalk simulator. Concerning Orca in particular,
if you look at the end of the paper, they tested Orca
against the Erlang VM, C4, and G1, and it performed much
better than all.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">I suppose it
should be measured against the CogVM, to know for sure whether the
single large heap is a performance bottleneck compared to
Pony/Orca performance with tiny per-actor heaps.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
don’t have time for Pony programming these days--I can’t
even read about it these days. Go ahead if you wish.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Your
time is better spent in other ways, though. The speed and
scale advantages of Orca over the big-heap approach have
been demonstrated. That was done some time ago. Read the
paper by Clebsch and friends for details. Read Wallaroo
Labs’ field experience whilst preparing to use Pony. Or
better, learn to write a Pony program. If your resources
don’t allow that, chat with Rocco Bowling (link above).
Everyone on Pony Zulip is very helpful and
super-enthusiastic about Pony—and it doesn’t even have its
own debugger the last time I checked. The tooling is poor,
and people still love this thing. Odd.<o:p></o:p></span></p>
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in">The biggest
challenge, I think you would agree, is the system/application
design that provides the opportunities to take advantage of
parallelism. It kinda fits the microservices arch. So, we
would run 64 instances of squeak to take the multicore to
town.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">No,
that’s much slower. Squeak/Pharo still has the basic
threading handicap: a single large heap.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">In my proposal,
with 64 separate squeak processes running across 64 cores,
there will be 64 heaps,<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">That
would be too few actors, in general. We are not thinking on
the same scale for speed and actor-count. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Expect
actor counts to scale into the thousands or tens of
thousands. There are about 100 in the app above. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"> 1 per process. There will be a finite
number of Capability actors in each event loop. This finite
set of actors within one event loop will be GC-able by the
global collector, full &amp; incremental. As all inter-event
loop interaction occurs through remote message passing, the
differences between inter-vat (a vat is the event loop)
communication within one process (create two local Vats),
inter-vat communication between event-loops in different
processes on the same machine and inter-vat communication
between event-loops in different processes on different
machines are all modeled exactly the same: remote event loops.
<br/>
<br/>
<o:p></o:p></p>
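<p class="MsoNormal">A minimal sketch of the uniform-reference idea, assuming the caller only ever sees a send on a reference and never learns where the vat lives. Vat, LocalRef and SerializingRef are hypothetical names, not classes from the Squeak capabilities package, and the JSON round trip merely stands in for a socket to another process or machine.</p>
<pre>
```python
import json
from collections import deque


class Vat:
    """Toy event loop: one queue of pending deliveries, drained in order."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.log = []

    def run(self):
        while self.queue:
            target, message = self.queue.popleft()
            self.log.append((target, message))


class LocalRef:
    """Reference to an object in a vat inside the same process."""

    def __init__(self, vat, target):
        self.vat = vat
        self.target = target

    def send(self, message):
        self.vat.queue.append((self.target, message))


class SerializingRef(LocalRef):
    """Stand-in for a cross-process or cross-machine reference.

    A real one would write to a socket; here the message only makes a
    round trip through JSON to show that the caller's code is unchanged.
    """

    def send(self, message):
        wire = json.dumps(message)
        super().send(json.loads(wire))


vat = Vat("v1")
for ref in (LocalRef(vat, "account"), SerializingRef(vat, "account")):
    ref.send({"op": "deposit", "amount": 5})
vat.run()
print(vat.log)
```
</pre>
<p class="MsoNormal">Both sends look the same to the caller and arrive as equal messages; only the second one pays a serialization cost, which is the overhead a by-reference, in-process send avoids.</p>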
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Here’s
the gist of the problem again: <b><u>the big heap will
not work and must go away</u></b>, if we are to have
extreme speed and a generalized multithreading programming
solution. </span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">I am not convinced
of this.<br/>
<br/>
<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">You
must read of others’ <u>measurements</u>, or write your own
programs, and do the tests to get those measurements. Read
about the measurements made in the academic paper I cited.
That’s the easy way. You can also read the one from
Sebastian Blessing from 2013: <a href="https://www.ponylang.io/media/papers/a_string_of_ponies.pdf" moz-do-not-send="true">https://www.ponylang.io/media/papers/a_string_of_ponies.pdf</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> My
current understanding is that Pony-Orca (or
Smalltalk-Orca) starts one OS process, and then spawns
threads, as new actors begin working. You don’t need to
do anything special as a programmer to make that happen.
You just write the actors, keep them small, use the
ref-caps correctly so that the program compiles (the
ref-caps must also be applied to Smalltalk classes), and
organize your synchronous code into classes, as usual.
Functions run synchronous code. Behaviours run
asynchronous code.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">My point was
"writing the actors" and "organizing your synchronous code
into classes" are challenging in the sense of choosing what is
asynchronous and what is synchronous.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yup,
but only for a while. Then you get used to it, and can’t
imagine anything different, like not having a big heap.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">
<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> The parallel
design space holds primacy.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">No,
strictly, the state-machine design does. The
parallelization is done for you. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">You’re
not parallelizing anything. That’s not your job. (What a
relief, yes?) You’re an application programmer. You’re
writing a state-machine for your app, and distributing its
work across specialized actors, which you code and whose
async messages to each other change object data slots
(wherever they happen to live—which need not concern you),
and thus change the state of the state-machine you
designed. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">You
can’t use the multicore hardware you already own or the
goodness in the Orca and ref-cap design if you can’t write a
state-machine, and use actors, or don’t have a tool to help
you do that. Most of us will want to use such a tool even
if we are fluent at state-machine design. This doesn’t even
exist in Pony. It’s very raw over there, but you get used
to the patterns, as with any new strategy. Still I want a
tool. Don’t you?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Two
tasks: 1) build tools to help us make state-machines in a
reliable, pleasant way, so that we feel compelled and happy
to do it; and 2) make Pony-style scheduling, ref-caps,
and Orca memory management work in Smalltalk. <o:p></o:p></span></p>
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> The
issue is not whether to use Pony. I don’t like Pony,
the language; it’s okay, even very good, but it’s not
Smalltalk. I like Smalltalk, whose concurrency model is
painfully lame. </span><o:p></o:p></p>
</div>
</blockquote>
<p style="margin-left:.5in">Squeak concurrency model.<o:p></o:p></p>
<p style="margin-left:.5in">Installer ss<br/>
project: 'Cryptography';<br/>
install: 'CapabilitiesLocal'<o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">What
abilities does the above install give Squeak?</span><o:p></o:p></p>
</blockquote>
<p>This installs a local-only (no remote capabilities)
capabilities model that attempts to implement the E-Rights
capabilities model in Squeak. [3] This also ensures
inter-actor concurrency safety.<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in">So your use of
Pony is purely to access the Orca vm?<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Orca
is not a VM; it’s a garbage collection protocol for
actor-based systems. </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
suggest using Pony-Orca to learn how Orca works, and then
replace the Pony part of Pony-Orca with Smalltalk (dynamic
typing), keeping the ref-caps (because they provide the
guarantees). I realize that this is a big undertaking.
Or: write a new implementation of Orca in Smalltalk for
the VM. This is currently second choice, but that could
change.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in">I think you will
find the CogVM quite interesting and performant. <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">--Not
with its current architecture.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">If
the CogVM is not able to:</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">1)
dynamically schedule unlimited actor-threads on all cores</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal">Why not separate actor event-loop processes
on each core, communicating remotely? [4][5]<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">--Because
it will continue the current Smalltalk-concurrency
lameness. It’s a patch. And still it will not allow the
system to scale. The concurrency problem has been solved
nearly optimally and at high resolution in the current
Pony-Orca. There’s room for improvement, but it’s already
in a completely different performance league compared to any
big-heap Smalltalk. If I’m to work hard on an
implementation of this design for Smalltalk, I need a much
greater speed-up and scaling ability than what these patches
give. </span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">2)
automatically load-balance</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">Use of mobility
with actors would allow for automated rebalancing.<br/>
<br/>
<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Speed
hit.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Too
slow/wasteful. Moving an actor isn’t needed if each
has its own heap.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">3)
support actor-based programs innately</span><o:p></o:p></p>
</blockquote>
<p style="margin-left:.5in">With this code, asynchronous
computation of "number eventual * 100" occurs in an event loop
and resolves the promise <o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p style="margin-left:.5in">[:number | number eventual * 100]
value: 0.03 "returning an unresolved promise until the async
computation completes and resolves the promise"<o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
</blockquote>
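<p class="MsoNormal">For readers unfamiliar with eventual sends, the same pattern can be sketched in Python. This is only an analogy, assuming asyncio as a stand-in for the event loop; the names times_hundred and main are illustrative and belong to neither Squeak nor Pony.</p>
<pre>
```python
import asyncio


async def times_hundred(number):
    # Yield to the event loop once, so the result is produced asynchronously.
    await asyncio.sleep(0)
    return number * 100


async def main():
    # create_task returns at once with a pending task (the "unresolved promise").
    promise = asyncio.create_task(times_hundred(0.03))
    was_pending = not promise.done()
    # Awaiting suspends this coroutine until the event loop resolves the promise.
    value = await promise
    return was_pending, value


was_pending, value = asyncio.run(main())
print(was_pending, value)
```
</pre>
<p class="MsoNormal">The caller holds the promise before any multiplication has happened; the value only becomes available once the event loop has run the computation.</p>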
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Promises
and notifications are fine. Both happen in Pony-Orca. But
the promises don’t fix the big performance problems.<o:p></o:p></span></p>
<p style="margin-left:.5in">Am I wrong to state that this model
provides innate support for actors? Or were you stating
that the VM itself would need innate support? Why does the VM have to
know?<o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">It’s
not enough. We still have the big pauses from GCs in a
large heap.<o:p></o:p></span></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">4)
guarantee no data-races</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">The issue to
watch is whether long-running computations livelock
the event loop and keep it from handling other activations. This is a
shared issue, as Pony/Orca are also susceptible to it.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yes,
and a dedicated cycle-detecting actor watches for this in
Pony-Orca. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> E-Rights event
loops ensure no data races, as long as actor objects are not
accessible from more than one event loop.<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Speed
hit.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">No
blocking and no write barriers exist in Pony-Orca. You
can’t wait. If you need to “wait,” you set a timer and
respond to the event when the timer fires. <o:p></o:p></span></p>
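<p class="MsoNormal">The "set a timer and respond when it fires" style can be sketched as follows. This is a Python asyncio analogy, not Pony code; run_with_timer, on_timer and other_work are hypothetical names chosen for illustration.</p>
<pre>
```python
import asyncio


def run_with_timer(delay_seconds):
    events = []
    loop = asyncio.new_event_loop()

    def on_timer():
        # Runs only when the timer fires; nothing blocked in the meantime.
        events.append("timer fired")
        loop.stop()

    def other_work():
        events.append("other work handled")

    # Register the timer first, then more work: the loop stays free to run it.
    loop.call_later(delay_seconds, on_timer)
    loop.call_soon(other_work)
    try:
        loop.run_forever()
    finally:
        loop.close()
    return events


print(run_with_timer(0.01))
```
</pre>
<p class="MsoNormal">Although the timer is registered first, the other callback is handled before it fires, because the loop never sits waiting on the timer.</p>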
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in">Imagine a
cloud-based compute engine that uses inter-machine actors to
process events from a massively parallel Cassandra database.
Inter-thread communication is not sufficient, as there are
hundreds of separate nodes.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Yes;
I didn’t claim otherwise. The networked version is coming.
See above. My point is that the ‘remote’ characterization
is not needed. It’s not helping us describe and understand.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"> Design-wise, it
makes sense to treat inter-thread, inter-process and
inter-machine concurrency as the same remote interface.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">No
new design is needed for concurrency and interfacing. There
is much to implement, however.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
design is already done, modulo the not-yet-present network
extension. Interfacing between actors is always by async
messaging. Messaging will work as transparently as possible
in the networked version across machine nodes. <o:p></o:p></span></p>
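<p class="MsoNormal">As a rough illustration of "interfacing between actors is always by async messaging", here is a toy actor in Python. It is a sketch under stated assumptions: asyncio stands in for the scheduler, the Counter class and its None stop message are hypothetical, and nothing here reflects how the Pony runtime is implemented.</p>
<pre>
```python
import asyncio


class Counter:
    """A toy actor: private state, touched only by its own message loop."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._total = 0  # never shared; other actors can only send messages

    def send(self, amount, reply_to=None):
        # Asynchronous send: enqueue and return without waiting for a result.
        self._mailbox.put_nowait((amount, reply_to))

    async def run(self):
        while True:
            amount, reply_to = await self._mailbox.get()
            if amount is None:  # hypothetical stop message
                return
            self._total += amount
            if reply_to is not None:
                reply_to.set_result(self._total)


async def main():
    counter = Counter()
    worker = asyncio.create_task(counter.run())
    counter.send(2)
    reply = asyncio.get_running_loop().create_future()
    counter.send(3, reply_to=reply)
    total = await reply
    counter.send(None)
    await worker
    return total


total = asyncio.run(main())
print(total)
```
</pre>
<p class="MsoNormal">Senders never read the actor's state directly; they either fire and forget or pass a future to be resolved later, so the same calling style could in principle survive a move to a networked transport.</p>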
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">The
issue is how most efficiently to use Orca, which
happens to be working in Pony. Pony is in production
in two internal, speed-demanding, banking apps and in
Wallaroo Labs’ high-rate streaming product. Pony is a
convenient way to study and use a working
implementation of Orca. Ergo, use Pony, even if we
only study it as a good example of how to use Orca.
Some tweaks (probably a lot of them) could allow use
of dynamic types. We could roll our own
implementation of Orca for the current Pharo VM, but
that seems like more work than tweaking a working Pony
compiler and runtime. I’m not sure about that. You
know the VM better than I. (I was beginning my study
of the Pharo/OpenSmalltalkVM when I found Pony.)</span><o:p></o:p></p>
</div>
</blockquote>
<p style="margin-left:.5in">Sounds like you might regret your
choice and think you took the wrong path. <o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
don’t see how you form that conclusion. I’ve not chosen
yet.</span><o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-left:.5in">You stated you are
not thrilled with using Pony.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I
don’t like the Pony language syntax. I don’t like anything
that looks like Algol-60. Pony is a language, compiler and
runtime implementing Orca. The other stuff is good. And
I’ve not had much time to use it; I suspect I could like it
more.<o:p></o:p></span></p>
<p class="MsoNormal"><br/>
<br/>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">[…]<o:p></o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">If
most of what Squeak/Pharo offers is pleasant/productive VM
simulation, much work still remains to achieve even a
basic actor system and collector, but the writing of VM
code in Smalltalk and compiling it to C may be much more
productive than writing C++. The C++ for the Pony
compiler and runtime, however, already compiles and works
well. Thus, starting the work in C++ is somewhat
tempting. <span style="color:#4472C4">Can someone
explain the limits of how the VM simulator can be used?
How much VM core C is not a part of what can be compiled
from Smalltalk? Can all VM C code be compiled from
Smalltalk?</span><o:p></o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
</blockquote>
<pre><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Can someone answer the above question?<o:p></o:p></span></pre>
<p>[1] Cog Blog - <a href="http://www.mirandabanda.org/cogblog/" moz-do-not-send="true">http://www.mirandabanda.org/cogblog/</a><br/>
[2] Smalltalk, Tips 'n Tricks - <a href="https://clementbera.wordpress.com/" moz-do-not-send="true">https://clementbera.wordpress.com/</a><br/>
[3] Capability Computation - <a href="http://erights.org/elib/capability/index.html" moz-do-not-send="true">http://erights.org/elib/capability/index.html</a><br/>
[4] Concurrency (Event Loops) - <a href="http://erights.org/elib/concurrency/index.html" moz-do-not-send="true">http://erights.org/elib/concurrency/index.html</a><br/>
[5] Distributed Programming - <a href="http://erights.org/elib/distrib/index.html" moz-do-not-send="true">http://erights.org/elib/distrib/index.html</a><o:p></o:p></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Shaping<o:p></o:p></span></p>
<p><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"><o:p> </o:p></span></p>
</div>
</blockquote>
<pre class="moz-signature" cols="72">--
Kindly,
Robert</pre>
</body></html>