[Vm-dev] Pony for the Pharo VM

Ronie Salgado roniesalg at gmail.com
Thu Apr 9 09:59:22 UTC 2020


Dear Shaping,

You seem to misunderstand the problem. This is not a technical problem, but
a political one. The language seems very cool, but it still has to be
proven in production by a large company that shows that it is actually
being used, that it is maintained, and that it will keep being maintained
for decades. That can be said about C/C++ (many companies) and Rust
(Mozilla); Go (Google) is out of the question because of its mandatory
garbage collector. All of the people already working on the VM have their
own political and technical agendas, and they will continue working on
those. Mathematical and safety proofs are not enough. The usage of Slang
is also bad in this regard, but with the difference that there is already
an existing open source VM that works (and is used by several companies),
and making a new VM is a high-risk endeavor from a technical point of view
(and from a political and business one as well).

This means that if you really want to use Pony for making a Pharo or Squeak
VM, then you are going to have to make it yourself, or pay someone else
willing to do it. You will not be able to convince the existing people to
adopt a new language unless you have something running in that language
that actually proves that the change is worth it.

> How complete is the documentation on the current Pharo VM?
>
Very incomplete. The best documentation is the code itself; then you have
old articles in the Squeak wikis, and some technical articles on Eliot
Miranda's and Clément Bera's blogs. There are some additional ongoing
efforts to document the VM from the Pharo people:
https://github.com/SquareBracketAssociates/Booklet-PharoVirtualMachine and
here:
https://github.com/pharo-open-documentation/pharo-wiki/tree/master/PharoVirtualMachine

I also wrote a very simplistic VM in pure C that can load a 64-bit Pharo
image and interpret the bytecodes, here: https://github.com/ronsaldo/crankvm .
This cannot be used for actually running Pharo because most of the
primitives are not yet implemented. I wrote this mostly to teach myself
how things work, and to get into that I had to read several parts of the
existing VM sources and the Blue Book (
http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf ).

> We are talking about different, but related things:  you can write a Pony
> program whose domain-level state-machine is wrong or unfinished and
> therefore not working correctly.  This is called *livelocking*.  It’s a
> domain-level problem, not a system problem (for Pony).  Livelocking is the
> developer’s problem, not Pony’s.  Compiled Pony code cannot
> deadlock/data-race.  It’s not possible.  Dealing with livelocking
> (programming a state machine thoroughly and correctly so that you get the
> behavior you want and describe in your code) is the subject of a special
> grammar and tool I’m working on for state-machine based programming.  This
> would supplement or replace the system browser.  Right now my approach to
> state-machine creating is a grammatical discipline that works very well for
> me.  I use it in Smalltalk, increasingly, and tend not to code without it
> anymore.  I don’t want to be limited to green threads, however, and
> definitely don’t want explicit concurrency management.  I don’t have that
> much time to waste, and hope everyone reading this has been burnt badly
> enough by concurrency bugs to have a similar view.
>
>
>
> You even need to model network and power failures in your state machine.
> So the programming language may help you a lot with your concurrent,
> distributed and fault tolerant system programming, but they are not a
> silver bullet that guarantees that your system is actually going to be
> correct.
>
>
>
> See above concerning the difference between livelock and deadlock.
>

Thanks for the correct terminology. But from the point of view of the user
they are the same. And a livelock seems to be far more dangerous than a
traditional deadlock, for which you already have some debugging tools that
are relatively easy to use. You can create a pthread_mutex with deadlock
detection for detecting bugs, or run your program through valgrind for
detecting race conditions and deadlocks. The point of this is that the
language safety net is not enough, and in fact it could actually be
dangerous because it can instill a false sense of safety. That false sense
of safety can be very dangerous in real-world projects with deadlines that
have to be met, and with non-technical managers who, relying on it, are
going to underestimate the deadlines even more. And since in the real
world deadlines have to be met, bugs and incomplete solutions are simply
shipped.
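
As a minimal sketch of the "pthread_mutex with deadlock detection"
mentioned above, assuming a POSIX system: a mutex created with the
PTHREAD_MUTEX_ERRORCHECK type reports EDEADLK when the owning thread tries
to lock it again, instead of hanging. The function name relock_error_code
is hypothetical. Note that this only catches a thread deadlocking on itself,
not a cycle between several threads.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Locks an error-checking mutex twice from the same thread and returns the
// error code of the second attempt. With a default mutex the second lock
// could simply block forever.
int relock_error_code() {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);

    pthread_mutex_t mutex;
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when asserts are disabled
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);                     // first lock succeeds
    const int second = pthread_mutex_lock(&mutex);  // reported, not blocked

    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

For cycles between threads, the valgrind side of the same advice would be
something like running the program under "valgrind --tool=helgrind".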


> Can Sysmel manage many threads on many cores (running actor threads in
> parallel, not just green-thread-concurrently on one core), whilst
> guaranteeing no data-races, and switch automatically between actor-threads,
> without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that
> now.
>
>
>
> If Sysmel cannot do those operations, or cannot do them as fast, can you
> add the abilities cost-effectively?  They already work in Pony, and a very
> active team drives the effort.
>

I am developing the language, and if I want or need to, I can create a
minimal VM suitable for my purpose. (I am not even trying to convince
existing people to use it for now. I will when I have the next version of
Woden running on it, and they will be able to just use a Pharo subset for
scripting.) I can adapt it to my purposes and needs, which are currently
only being fulfilled by C and C++ (and perhaps Rust, but I found it too
complicated). I would just use C++ if it had runtime reflection and easy,
reliable and robust live programming support, but it does not have these
crucial features. You can implement live programming in C++ by serializing
your program state, swapping the DLL that contains your code, and then
deserializing your program state. But program state serialization is not
easy.

Real multi-threading in Sysmel is already supported when you use it without
garbage collection (the language is strongly modeled after C++11 when used
in this way). When you enable the GC for Smalltalk semantics, it is
currently using the Boehm conservative GC (I am using it for getting things
running), so the concurrency will be limited by the stop-the-world
semantics of the GC. When I implement a proper GC, I will be able to
dissociate threads that only use the native runtime from the GC. But for
"many threads" and high parallelism, what I use is the GPU, where Sysmel
can also be used as a shading language by generating SPIR-V for Vulkan
consumption (for this the LLVM backend is not used; it does not expose all
of the required semantics, unless they changed it recently). As for the
actor model and non-blocking semantics by default, I just do not care about
them (for now) because I know they bring their own problems, which can have
an impact on other parts of the system (e.g. non-blocking IO everywhere and
by default, on operating systems that do not support it very well). Instead
I prefer explicit semantics, and if the user wants actors, they can be
implemented as just another library in the system.

BTW: how do you implement an actor? With a message queue. And how do you
implement a message queue? With a mutex and a condition variable. What
about deadlock? You are only taking a single mutex on the queue, so you
cannot have a deadlock, since for a deadlock you need at least two mutexes
that are taken in a different order. If you have N mutexes, to avoid a
deadlock you only have to take these N mutexes in the same order. That
sounds easy, but in practice it is not: you would have to sort the actual
addresses of the mutexes, and in many cases you will realize that you need
to take an additional mutex after having already taken other mutexes (I
know it from practice). The mutex of the queue is always the one taken
last, and that is why it is safe to use the queue for message passing.
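
A minimal C++ sketch of the two ideas in this paragraph. The names Mailbox
and OrderedLockPair are hypothetical and not taken from any existing
runtime. The mailbox is a queue guarded by one mutex and one condition
variable. The lock pair takes two mutexes in address order, so two threads
can never hold them in opposite orders.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Actor mailbox: the only lock ever held here is the queue's own mutex.
template <typename Message>
class Mailbox {
public:
    void push(Message message) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            messages_.push_back(std::move(message));
        }
        ready_.notify_one();  // wake one receiver, outside the lock
    }

    // Blocks until a message is available, then returns the oldest one.
    Message pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !messages_.empty(); });
        Message message = std::move(messages_.front());
        messages_.pop_front();
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Message> messages_;
};

// Locks two distinct mutexes, always lowest address first, and unlocks
// them in reverse order on destruction.
class OrderedLockPair {
public:
    OrderedLockPair(std::mutex& a, std::mutex& b)
        : first_(std::less<std::mutex*>{}(&a, &b) ? a : b),
          second_(std::less<std::mutex*>{}(&a, &b) ? b : a) {
        assert(&a != &b);
        first_.lock();
        second_.lock();
    }
    ~OrderedLockPair() {
        second_.unlock();
        first_.unlock();
    }
    OrderedLockPair(const OrderedLockPair&) = delete;
    OrderedLockPair& operator=(const OrderedLockPair&) = delete;

    // Address of the mutex that was taken first (for inspection only).
    const std::mutex* first() const { return &first_; }

private:
    std::mutex& first_;
    std::mutex& second_;
};
```

The ordering helper only works when all the mutexes are known up front. It
does nothing for the case described above, where the need for one more
mutex is discovered after others are already held.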

So, what is so special about these actor languages? They use an M:N
threading model in which green threads multiplex many cooperative threads
onto the different cores of the CPU. They only need to create N native
threads that are pinned to the different cores of the CPU
(sched_setaffinity on Linux). The green threads are created by allocating
stack memory and having a simple trampoline that stores all of the
callee-saved registers on the stack, switches the stack pointer, restores
these same registers and returns. BTW, the operating-system-level context
switching machinery is implemented in the same way, but it may have some
additional instructions for returning into unprivileged code.
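
The stack switch can be sketched without writing assembly by using the
POSIX ucontext functions (getcontext, makecontext, swapcontext). This is
only an illustration under that assumption: the names in the green
namespace are hypothetical, the globals make it single-threaded, and a real
runtime would use its own trampoline, since swapcontext typically also
saves the signal mask and is therefore slower.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <ucontext.h>
#include <vector>

namespace green {

constexpr std::size_t kStackSize = 64 * 1024;

ucontext_t scheduler_ctx;  // registers and stack pointer of the native thread
ucontext_t task_ctx;       // registers and stack pointer of the green thread
std::string trace;         // records who ran, in order

// Cooperative yield: save the task's context, resume the scheduler's.
void yield_to_scheduler() { swapcontext(&task_ctx, &scheduler_ctx); }

void task_body() {
    trace += "a";
    yield_to_scheduler();
    trace += "b";
    yield_to_scheduler();
    trace += "c";
    // Returning from here resumes uc_link, i.e. the scheduler.
}

// Runs one green thread to completion, resuming it three times.
std::string run_demo() {
    trace.clear();
    std::vector<char> stack(kStackSize);  // the green thread's own stack

    const int rc = getcontext(&task_ctx);
    assert(rc == 0);
    (void)rc;
    task_ctx.uc_stack.ss_sp = stack.data();
    task_ctx.uc_stack.ss_size = stack.size();
    task_ctx.uc_link = &scheduler_ctx;
    makecontext(&task_ctx, task_body, 0);

    for (int i = 0; i < 3; ++i) {
        trace += "S";                            // scheduler picks the task
        swapcontext(&scheduler_ctx, &task_ctx);  // switch stacks
    }
    return trace;
}

}  // namespace green
```

Calling green::run_demo() interleaves the scheduler ("S") with the three
steps of the task, all on one native thread.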

These 5 to 15 ns figures typically come from switching the stack. And with
64-bit addresses the simplest way of allocating the stack memory is to
just reserve a large uncommitted memory region for each stack, or to use
some other fancier scheme (see here:
https://blog.cloudflare.com/how-stacks-are-handled-in-go/ ). You would also
need a task queue (or process queue in OS terminology) for not blocking the
different actors when they are waiting for messages. There are several
papers that discuss how to implement this task queue, which tends to be the
main bottleneck of these systems. In the case of userspace, since the OS is
not aware of your green threads, you also need asynchronous IO for
everything so that an IO operation does not block all of the tasks that
are running on a single core (I believe that I read about a different,
more flexible mechanism in a paper about Go, but I forgot the details).
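
A minimal sketch of the "large uncommitted memory" approach to stacks,
assuming Linux (mmap with MAP_ANONYMOUS and MAP_NORESERVE). The names
StackReservation, reserve_stack, stack_top and release_stack are
hypothetical. The kernel only backs the pages that are actually touched, so
reserving a generous range per green thread costs address space rather
than memory.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

struct StackReservation {
    void* base;        // lowest address of the mapping (the guard page)
    std::size_t size;  // total bytes reserved, guard page included
};

// Stacks grow downwards, so the initial stack pointer is the top address.
char* stack_top(const StackReservation& stack) {
    return static_cast<char*>(stack.base) + stack.size;
}

// Reserves `size` bytes of address space for one stack. Returns a
// reservation with a null base on failure.
StackReservation reserve_stack(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    assert(size % page == 0 && size >= 2 * page);

    void* base = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        return {nullptr, 0};
    }
    // Make the lowest page inaccessible so an overflow faults instead of
    // silently corrupting whatever is mapped below the stack.
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, size);
        return {nullptr, 0};
    }
    return {base, size};
}

void release_stack(const StackReservation& stack) {
    if (stack.base != nullptr) {
        munmap(stack.base, stack.size);
    }
}
```

For example, reserve_stack(64 * 1024 * 1024) reserves 64 MiB, but writing
only near stack_top() commits just the few pages at the top.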

If you do not actually need a coroutine, and you model your actors in terms
of a single function application that receives a single message and
processes it, then you only need a single stack (fetch message from queue,
apply the function, done), and you do not even have that 5-15 ns overhead
for stack switching (you may have another overhead from pushing the
pending actor onto the ready queue, which could outweigh the performance
advantage). This is similar to a traditional asynchronous job system, for
which you only need a thread pool. And this can also be implemented in
C/C++.
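
The "fetch message, apply the function, done" model can be sketched as
follows. Actor and Scheduler are hypothetical names, messages are plain
ints, and the loop is single-threaded to keep the example short. A real job
system would run the same loop on each thread of a pool, with a concurrent
ready queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// An actor is a mailbox plus one function applied to each message.
// No per-actor stack is needed.
struct Actor {
    std::function<void(int)> behavior;
    std::deque<int> mailbox;
    bool scheduled = false;  // true while the actor is owned by the scheduler
};

class Scheduler {
public:
    // Enqueues a message and makes sure the actor is on the ready queue.
    void send(Actor& actor, int message) {
        actor.mailbox.push_back(message);
        if (!actor.scheduled) {
            actor.scheduled = true;
            ready_.push_back(&actor);
        }
    }

    // Processes one message per turn until every mailbox is empty.
    // Returns the number of messages processed.
    int run() {
        int processed = 0;
        while (!ready_.empty()) {
            Actor* actor = ready_.front();
            ready_.pop_front();
            assert(!actor->mailbox.empty());

            const int message = actor->mailbox.front();
            actor->mailbox.pop_front();
            actor->behavior(message);  // runs to completion; may call send()
            ++processed;

            if (actor->mailbox.empty()) {
                actor->scheduled = false;
            } else {
                ready_.push_back(actor);  // give other actors a turn first
            }
        }
        return processed;
    }

private:
    std::deque<Actor*> ready_;
};
```

Each turn handles one message and then pushes the actor back onto the
ready queue, which is where the extra overhead mentioned above comes from.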

Best regards,
Ronie

On Thu, Apr 9, 2020 at 7:43, Shaping (<shaping at uurda.org>) wrote:

>
>
> I have to admit that that language looks quite interesting, but it is not
> appropriate for writing an actual VM.
>
>
>
> The point of the Pony language and concurrency model is to be able to
> write *any program*, even a VM, at scale and use all of the cores, with
> minimal programming effort (compared to what the developer must endure when
> explicitly managing synchronization issues).
>
>
>
> One thing is having the actor object model for writing an application that
> scales where it can be a very desirable property, but another thing is
> writing a  virtual machine where you have time constraints for the just in
> time compiler in a single CPU to reduce initialization time.
>
>
>
> A VM is a program.  You can write any program you want with Pony.  The
> heaps are already actor-centric and usually very tiny, just the way we need
> them for high-performance, real-time apps:
>
>
>
> Orca: GC and Type System Co-Design for Actor Languages:
>
>
>
>
> https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf
>
>
>
> Can you be more specific about which program construct cannot be written
> in Pony or cannot be written to run fast enough?
>
>
>
> You can write as much asynchronous and synchronous code as you wish, where
> and when you needed it.  The balance point is yours, as the developer, to
> choose.
>
>
>
> In fact, most of the methods are actually interpreted and they are only
> compiled after being executed several times for reducing this startup time.
>
>
>
> Of course.  But this is not a problem peculiar to Pony (or C or Rust
> or…).  It’s just another programming task needed in the context of VM
> design.   Note the comment about using both JIT and AOT, as desired, during
> development.  In any case, if you use Pony, use a state-machine (which we
> all should do anyway).  If the developer cannot or will not develop the
> discipline needed to build state-machines, systematically, Pony or any
> highly concurrent, actor-based programming model will only thwart and
> frustrate him.  We can’t efficiently use all these cores without such a
> programming model.
>
>
>
> I think I would start by using the Pony compiler from a Smalltalk browser
> (mod the browser to accommodate the actors too).  Source code in files,
> searching, and scrolling are too inefficient.
>
>
>
> In this problem domain, the actor model and the thread safeness guaranteed
> by it does not help you at all, and deadlocks can always be produced.
>
>
>
> This statement is incorrect in the case of Pony (but we have a terminology
> problem; see below), and this was the main reason for the post.  Here is
> the gist again:  *Deadlocks and data-races are not possible in a Pony
> program that compiles.*  This is mathematically guaranteed.  You can
> glean this fact from the videos or study the details in Pony papers.
>
>  I did have deadlocks problems with synchronous messages in Erlang
>
>
>
> Pony is not Erlang.  It is like Erlang in many ways, but vastly improved.
> My qualified (“in the round, for starters”) comparison was a mistake.  I
> should have omitted the whole comment.  The Erlang/Pony comparison never
> goes well.  Pony is a different animal.  And it’s seriously powerful.
> Forget about Erlang.  Really.  Just forget about it.  Use what was learnt
> there, and implement it in Pony.
>
>
>
> which forced me to go into using asynchronous messages, but if you do not
> model your domain state machine you could also end with a deadlock by using
> asynchronous messages, or even worse, an inconsistent state such as an
> incomplete credit card transaction in a highly distributed system!
>
>
>
> We are talking about different, but related things:  you can write a Pony
> program whose domain-level state-machine is wrong or unfinished and
> therefore not working correctly.  This is called *livelocking*.  It’s a
> domain-level problem, not a system problem (for Pony).  Livelocking is the
> developer’s problem, not Pony’s.  Compiled Pony code cannot
> deadlock/data-race.  It’s not possible.  Dealing with livelocking
> (programming a state machine thoroughly and correctly so that you get the
> behavior you want and describe in your code) is the subject of a special
> grammar and tool I’m working on for state-machine based programming.  This
> would supplement or replace the system browser.  Right now my approach to
> state-machine creating is a grammatical discipline that works very well for
> me.  I use it in Smalltalk, increasingly, and tend not to code without it
> anymore.  I don’t want to be limited to green threads, however, and
> definitely don’t want explicit concurrency management.  I don’t have that
> much time to waste, and hope everyone reading this has been burnt badly
> enough by concurrency bugs to have a similar view.
>
>
>
> You even need to model network and power failures in your state machine.
> So the programming language may help you a lot with your concurrent,
> distributed and fault tolerant system programming, but they are not a
> silver bullet that guarantees that your system is actually going to be
> correct.
>
>
>
> See above concerning the difference between livelock and deadlock.
>
>
>
> Going back to the task of developing a VM, you also need to be able to
> perform dangerous memory accessing operations for at least the following
> three tasks:
>
> 1. Implementing the garbage collector.
>
>
>
> Actually no; there is nothing dangerous here.  In Pony, a separate heap
> exists for each actor.  These are generally tiny, and come and go quickly.
>  If they are not tiny or at least very simple/uniform in structure, they
> should be made so by refactoring actor scope, until they are.  Smallness
> and clarity of purpose are the main criteria for determining whether you’ve
> written an actor well.  Those two properties also greatly ease debugging of
> the actor.  If you have a big actor not factored as a network of smaller
> ones, you’ve done something wrong, or you’ve just started your state
> machine, and have some factoring yet to do.  You still have classes, but
> these are an organizational tool for synchronous code used by Actors and
> their asynchronous behaviors.
>
>
>
> 2. Direct access to object slots for implementing the bytecode interpreter.
>
>
>
> Not a problem.  It’s just a program feature.   So we write it as it needs
> to be.
>
>
>
> 3. Copying compiled machine code into executable memory and performing
> position dependent relocations.
>
>
>
> Doable.  The Pony FFI works well even at this early stage.
>
>
>
> The machine code generation and installation can be separated in two stage
> (the current VM just generates the code directly into the executable
> memory), and in fact having these two separated stages for compilation and
> installation is a requirement for operating systems that enforce W^X page
> level permissions, specially if you want a concurrent VM.
>
>
>
> Then do that.
>
>
>
> You need to install the executable code in an atomic way, so you need to
> suspend the threads while changing the executable permission into the
> writable permission. You may get away of this restriction if you are
> allowed to map the same physical memory into different virtual addresses
> ranges with different permissions (one writeable, and one executable).
>
>
>
> …as needed.
>
>
>
> Pony is changing quickly:  https://ponylang.zulipchat.com/.
>
>
>
> It’s being improved weekly.  The Pony group are also working on a security
> model (to deal with attacks via FFI and other sources), but this will be
> some time coming.  Feel free to contribute.  The language is highly
> moldable, especially at this stage.  I think the version is 0.33.  If you
> need a feature or convenience not present, request it.  The group is very
> responsive, and eager to improve the tool.
>
> If there were no Smalltalk, I would certainly use Pony before C, Rust, or
> Go, even at this early pre-1.0 stage.
>
>
>
> For these reasons, you need an unsafe language such as C/C++ for at least
> these tasks, or a language that allows you to turn off the type and memory
> safeness net. I heard that Rust has an unsafe pointer that you could also
> use for these purposes.
>
>
>
> The Pony FFI will work here.  C libs are sometimes needed, and operations
> in C code are of course not guaranteed to be safe, in any case.
>
>
>
> The Pharo team is going with the existing virtual machine for quite a
> while. You cannot just replace something that is not well documented by
> something new, specially when you do not have that many resources for
> making a new vm.
>
>
>
> One of the main drivers in the choice of Pony is to reduce the resources
> needed to create a highly performant, simple VM.  One notable,
> Pharo/Smalltalk-related problem for high-speed apps is stop-the-world GCing
> in the one large system heap.  This won’t work for high-performance, highly
> concurrent, highly scalable apps, especially not for real-time ones,
> and *must go away*, if latencies approaching deterministic are to be
> achieved.  Pony has already solved this problem with per-actor heaps.  That
> design feature is very interesting because it represents much hard work
> that need not be done.
>
>
>
>
>
> You first need to document completely the existing one, the semantics of
> the bytecodes, how they are implemented, and also the same with the
> primitives.
>
>
>
> Agreed.  I’m not claiming that VM development is easy or trivial,
> generally or via Pony.  VMs are arguably one of the most complicated things
> that humans create (*not a compliment*).  But Pony solves more
> concurrency-related problems at compile-time than any other available tool.
>
>
>
> How complete is the documentation on the current Pharo VM?
>
>
>
> As for myself, I am putting my bets on another language that I am
> developing (Sysmel: https://github.com/ronsaldo/sysmel ), and in full
> ahead-of-time compilation, but my problem domain is video game programming,
> low-level operating system, driver development and embedded programming
> where I actually want to have control of the machine.
>
>
>
> I’ve read some on Sysmel.  It appears to be an outstanding tool.
>
>
>
> Does Sysmel implement multicore concurrency, and guarantee no
> deadlocks/data-races on compile?
>
> Pony, the language and concurrency model, will not stop us from doing
> anything that can be done with the machine and OS.  The Pony language is
> not as interesting to me as its concurrency model.  Don’t like Pony syntax
> (and I don’t)?  Then fork and change it.  You have all the source.  (I’d
> prefer to see keyword selectors everywhere, even without the usual
> attendant polymorphism.  Seriously.)
>
>
>
> Since I have written my compiler in Pharo, I can just reuse the Opal
> Compiler for doing AST to AST translation and just compile Pharo (with some
> limitations, for example no thisContext) into my runtime environment. If I
> want a more dynamic environment, I can also serialize a Pharo
> CompiledMethod, send it through a socket and then interpret it on the
> Sysmel side:
> https://github.com/ronsaldo/sysmel/blob/master/module-sources/Sysmel.Core/Smalltalk-Bootstrap/InterpretedMethod.sysmel
> by just reusing the existing language semantics. Currently although I am
> just supporting Linux, and Windows support is coming in a couple of weeks
> after getting a proper module system working for reducing compilation
> times. For the backend I am using LLVM, wasm is not yet supported because I
> am generating some IR that are not supported by the wasm backend
>
>
>
> Wasm and WASI will be good when they are ready.
>
>
>
> (vtable layouts, and some intrinsics that I am using for non-local
> returns), but they should not be that complicated to fix.
>
>
>
> Can Sysmel manage many threads on many cores (running actor threads in
> parallel, not just green-thread-concurrently on one core), whilst
> guaranteeing no data-races, and switch automatically between actor-threads,
> without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that
> now.
>
>
>
> If Sysmel cannot do those operations, or cannot do them as fast, can you
> add the abilities cost-effectively?  They already work in Pony, and a very
> active team drives the effort.
>
>
>
>
>
> Best,
>
>
>
> Shaping
>
>
>
>
>
>
>
>
>
> On Mon, Apr 6, 2020 at 6:05, Shaping (<shaping at uurda.org>) wrote:
>
> I should have initially posted this to the Pharo-dev list, as well.
>
>
>
> *From:* Pharo-users [mailto:pharo-users-bounces at lists.pharo.org] *On
> Behalf Of *Shaping
> *Sent:* Friday, 3 April, 2020 14:05
> *To:* 'Any question about pharo is welcome' <pharo-users at lists.pharo.org>
> *Subject:* Re: [Pharo-users] Latest PharoJS Success Story; Wasm/WASI;
> very keen on Pony for the Pharo VM
>
>
>
> All:
>
> > Brain Treats got stuck during launch on my LG.
>
> >
>
> Which android version are you using ?
>
>
>
> The phone is old and this is likely the problem.
>
>
>
> Android version:  4.4.2
>
> Kernel version:  3.4.0
>
>
>
> > Is there a plan to move PharoJS to Wasm/WASI?
>
> >
>
> Dave and I talked about it a long time ago. This sounds like a good idea.
>
> Actually, Dave has a very ambitious idea: turn PharoJS into Pharo* where *
> can be different targets.
>
> But, there's a lot to do before reaching this goal. So, don't expect it
> any time soon.
>
>
>
> Not to change the topic too much, but the following is related and I often
> think of it…
>
>
>
> Consider writing the pharo VM in Wasm or, better, with *Pony* (which can
> emit Wasm, as needed).  Pony’s reference-capability-based (ref-cap)
> concurrency-model guarantees provably that no data-races or deadlocks can
> happen if the code compiles; this solves a very large class of extremely
> ugly concurrency problems that no one ever wants to face.
>
>
>
> Pony gives high-performance concurrency (5 to 15 ns actor-thread switching
> time, depending on platform), and solves the most difficult class of
> synchronization problems at compile time.  It runs as fast as C.  It runs
> faster than C, as concurrency scales.  You can’t scale a highly concurrent
> app efficiently in C, and really shouldn’t try if you wish to remain happy
> and mentally healthy.
>
>
>
> Pony is still pre-1.0, but the group is very active and competent.  I
> think we should consider using it to build the VM.  Have a look.  Some
> videos for your amusement and information:
>
>
>
>
>
> https://www.youtube.com/watch?v=ODBd9S1jV2s
>
> https://www.youtube.com/watch?v=u1JfYa413fY
>
> https://www.youtube.com/watch?v=fNdnr1MUXp8
>
> https://www.ponylang.io/
>
>
>
> There are many others.  I mentioned the Pony concurrency architecture
> around the holidays, but there was no interest from the list—not a good
> time perhaps.
>
>
>
> The tentative plan is to do what Google does with Flutter:  have the JIT
> in support of the usual dynamicity a Smalltalker needs for rapid
> development; and have AOT, fully optimized compiling for production or
> speed-related reality checks, presumably needed less often during
> development.  There are other possibilities.
>
>
>
> Anyone interested?
>
>
>
> I have some ideas for simplifying use of the six ref caps in the context
> of Pharo/Smalltalk.  If this path is chosen, one must commit to strict
> state-machine-based algorithm development, without exception.  This should
> have happened anyway by now, broadly in the programming space, but didn’t.
> I’m working on a programming graphical tool and associated grammar (in VW)
> that make state-machine development easy and attractive.  This, besides
> efficient use of machine resources, is the other reason for pushing in this
> direction.
>
>
>
> A Pony program is built from a net of asynchronously communicating
> actors.   You change the state of your program with asynchronous messaging
> between actors.  There is no blocking--no mutexes or semaphores—and
> therefore no wasted CPU cycles or mushrooming program complexity, as you
> try to use mutexes in a fine-grained way (a very bad idea).  And as
> mentioned, there are never deadlocks or data-races.  All cores on all CPUs
> stay busy, always, until the program goes idle or exits.  The Pony group is
> also working on extending the model to the network level, so that all
> machine nodes in the network stay busy.  In the round, as a start, think of
> Pony as Erlang/OTP, but much faster, with no legacy bugs, and provably
> no-deadlocking on compile.
>
>
>
> The asynchronous actor model is the programming pattern that Kay had in
> mind when he said “object-oriented.”  It’s the one I want to implement in
> Pharo.  The green threads are light, but don’t efficiently use the cores,
> and a net of VMs with their respective images still communicate too slowly.
>
>
>
> If your time permits, please study Pony for a bit, before rejecting the
> idea as too big a change in direction or too complicated.  Using Pony looks
> like the ideal VM simplification strategy, if our aim is efficient use of
> networks of machines, each with at least one CPU (often more), each, in
> turn, with many cores (whose numbers are still increasing).  This pattern
> in hardware probably won’t be changing much, now that speeds are topping
> out.  Winning the performance game is therefore about efficiently using
> many cores at once, *without burdening the programmer*.  I don’t see a
> better way to do this now than with Pony.
>
>
>
> Thoughts and suggestions are welcome.
>
>
>
>
>
> Shaping
>
>
>
>
>
>
>
>
>
> > -----Original Message-----
>
> > From: Pharo-users [mailto:pharo-users-bounces at lists.pharo.org
> <pharo-users-bounces at lists.pharo.org>] On
>
> > Behalf Of N. Bouraqadi
>
> > Sent: Tuesday, 28 January, 2020 12:18
>
> > To: Any question about pharo is welcome <pharo-users at lists.pharo.org>
>
> > Subject: [Pharo-users] Latest PharoJS Success Story
>
> >
>
> > The latest PharoJS-powered smartphone app is now live.
>
> > Development has been made using Pharo.
>
> > Then, javascript code is generated using PharoJS.
>
> > Last, the app is built to target both iOS and Android thanks to Apache
>
> > Cordova.
>
> >
>
> > Learn more and Download at
>
> > https://nootrix.com/projects/brain-treats-app/
>
> >
>
> > Noury
>
> >
>
> >
>
>
>
>
>
>