[Vm-dev] Pony for the Pharo VM

Thu Apr 9 05:42:16 UTC 2020

I have to admit that that language looks quite interesting, but it is not appropriate for writing an actual VM.

The point of the Pony language and concurrency model is to be able to write any program, even a VM, at scale and use all of the cores, with minimal programming effort (compared to what the developer must endure when explicitly managing synchronization issues).

Having the actor object model for writing an application that scales, where it can be a very desirable property, is one thing; writing a virtual machine, where the just-in-time compiler has time constraints on a single CPU to reduce initialization time, is another.

A VM is a program.  You can write any program you want with Pony.  The heaps are already actor-centric and usually very tiny, just the way we need them for high-performance, real-time apps:

Orca: GC and Type System Co-Design for Actor Languages:

https://www.ponylang.io/media/papers/orca_gc_and_type_system_co-design_for_actor_languages.pdf

Can you be more specific about which program construct cannot be written in Pony or cannot be written to run fast enough?

You can write as much asynchronous and synchronous code as you wish, where and when you need it.  The balance point is yours, as the developer, to choose.

In fact, most of the methods are actually interpreted and they are only compiled after being executed several times for reducing this startup time.

Of course.  But this is not a problem peculiar to Pony (or C or Rust or…).  It’s just another programming task needed in the context of VM design.   Note the comment about using both JIT and AOT, as desired, during development.  In any case, if you use Pony, use a state-machine (which we all should do anyway).  If the developer cannot or will not develop the discipline needed to build state-machines, systematically, Pony or any highly concurrent, actor-based programming model will only thwart and frustrate him.  We can’t efficiently use all these cores without such a programming model.  
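To make the tiering in the quoted remark concrete, here is a minimal Python sketch of "interpret first, compile once hot." It is an illustration only, not Pharo VM or Pony code: `Method`, `COMPILE_THRESHOLD`, and the two private run paths are hypothetical names, and the "compiled" path merely stands in for real JIT output.

```python
from typing import Callable, List

# Hypothetical threshold: interpreted runs allowed before promotion.
COMPILE_THRESHOLD = 3


class Method:
    """A method that starts out interpreted and is promoted once it is hot."""

    def __init__(self, name: str, body: Callable[[int], int]) -> None:
        self.name = name
        self.body = body
        self.call_count = 0
        self.compiled = False
        self.tiers: List[str] = []  # records which tier served each call

    def _interpret(self, arg: int) -> int:
        # Stand-in for the bytecode interpreter.
        self.tiers.append("interp")
        return self.body(arg)

    def _run_compiled(self, arg: int) -> int:
        # Stand-in for jumping into generated machine code.
        self.tiers.append("jit")
        return self.body(arg)

    def invoke(self, arg: int) -> int:
        if self.compiled:
            return self._run_compiled(arg)
        self.call_count += 1
        result = self._interpret(arg)
        if self.call_count >= COMPILE_THRESHOLD:
            # A real VM would run the JIT here (or queue the method for it).
            self.compiled = True
        return result


if __name__ == "__main__":
    double = Method("double", lambda x: x * 2)
    for i in range(5):
        double.invoke(i)
    print(double.tiers)
```

The point of the counter is that cold methods never pay the compilation cost, which is what keeps startup time low; nothing in this policy depends on the implementation language.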

I think I would start by using the Pony compiler from a Smalltalk browser (mod the browser to accommodate the actors too).  Source code in files, searching, and scrolling are too inefficient.  

In this problem domain, the actor model and the thread safety it guarantees do not help you at all, and deadlocks can always be produced.

This statement is incorrect in the case of Pony (but we have a terminology problem; see below), and this was the main reason for the post.  Here is the gist again:  Deadlocks and data-races are not possible in a Pony program that compiles.  This is mathematically guaranteed.  You can glean this fact from the videos or study the details in Pony papers.

I did have deadlock problems with synchronous messages in Erlang

Pony is not Erlang.  It is like Erlang in many ways, but vastly improved.  My qualified (“in the round, for starters”) comparison was a mistake.  I should have omitted the whole comment.  The Erlang/Pony comparison never goes well.  Pony is a different animal.  And it’s seriously powerful.  Forget about Erlang.  Really.  Just forget about it.  Use what was learnt there, and implement it in Pony.

which forced me to go into using asynchronous messages, but if you do not model your domain state machine you could also end with a deadlock by using asynchronous messages, or even worse, an inconsistent state such as an incomplete credit card transaction in a highly distributed system!

We are talking about different, but related things:  you can write a Pony program whose domain-level state-machine is wrong or unfinished and therefore not working correctly.  This is called livelocking.  It’s a domain-level problem, not a system problem (for Pony).  Livelocking is the developer’s problem, not Pony’s.  Compiled Pony code cannot deadlock/data-race.  It’s not possible.  Dealing with livelocking (programming a state machine thoroughly and correctly so that you get the behavior you want and describe in your code) is the subject of a special grammar and tool I’m working on for state-machine based programming.  This would supplement or replace the system browser.  Right now my approach to state-machine creation is a grammatical discipline that works very well for me.  I use it in Smalltalk, increasingly, and tend not to code without it anymore.  I don’t want to be limited to green threads, however, and definitely don’t want explicit concurrency management.  I don’t have that much time to waste, and hope everyone reading this has been burnt badly enough by concurrency bugs to have a similar view.
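As a rough illustration of what an "unfinished" domain-level state machine looks like, and how a tool can flag it, here is a small table-driven sketch in Python. Everything in it is hypothetical (the payment states, the events, and the `unhandled_pairs` check); it is not the grammar or tool mentioned above, only an assumption about the kind of completeness check such a discipline implies.

```python
from enum import Enum, auto
from typing import Dict, List, Tuple


class State(Enum):
    IDLE = auto()
    AUTHORIZING = auto()
    CAPTURED = auto()
    FAILED = auto()


class Event(Enum):
    SUBMIT = auto()
    APPROVED = auto()
    DECLINED = auto()
    TIMEOUT = auto()


# Terminal states need no outgoing transitions.
TERMINAL = {State.CAPTURED, State.FAILED}

Table = Dict[Tuple[State, Event], State]

# Every (state, event) pair must be listed, even when the event is ignored.
TRANSITIONS: Table = {
    (State.IDLE, Event.SUBMIT): State.AUTHORIZING,
    (State.IDLE, Event.APPROVED): State.IDLE,      # stray reply: ignore
    (State.IDLE, Event.DECLINED): State.IDLE,      # stray reply: ignore
    (State.IDLE, Event.TIMEOUT): State.IDLE,       # nothing pending: ignore
    (State.AUTHORIZING, Event.SUBMIT): State.AUTHORIZING,  # duplicate submit
    (State.AUTHORIZING, Event.APPROVED): State.CAPTURED,
    (State.AUTHORIZING, Event.DECLINED): State.FAILED,
    # (AUTHORIZING, TIMEOUT) is deliberately missing: the machine is unfinished.
}


def unhandled_pairs(table: Table) -> List[Tuple[State, Event]]:
    """Return every non-terminal (state, event) pair with no transition."""
    return [
        (state, event)
        for state in State
        if state not in TERMINAL
        for event in Event
        if (state, event) not in table
    ]


def step(table: Table, state: State, event: Event) -> State:
    """Advance the machine, failing loudly on an unmodelled situation."""
    if (state, event) not in table:
        raise ValueError(f"no transition for {event.name} in {state.name}")
    return table[(state, event)]


if __name__ == "__main__":
    print(unhandled_pairs(TRANSITIONS))
```

Here the check reports the missing timeout case for the authorizing state, which is exactly the kind of gap (a network failure mid-transaction) that leaves a system stuck or inconsistent even though no lock is ever held.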

You even need to model network and power failures in your state machine. So the programming language may help you a lot with your concurrent, distributed and fault-tolerant system programming, but it is not a silver bullet that guarantees that your system is actually going to be correct.

See above concerning the difference between livelock and deadlock.  

Going back to the task of developing a VM, you also need to be able to perform dangerous memory accessing operations for at least the following three tasks:

1. Implementing the garbage collector.

Actually no; there is nothing dangerous here.  In Pony, a separate heap exists for each actor.  These are generally tiny, and come and go quickly.   If they are not tiny or at least very simple/uniform in structure, they should be made so by refactoring actor scope, until they are.  Smallness and clarity of purpose are the main criteria for determining whether you’ve written an actor well.  Those two properties also greatly ease debugging of the actor.  If you have a big actor not factored as a network of smaller ones, you’ve done something wrong, or you’ve just started your state machine, and have some factoring yet to do.  You still have classes, but these are an organizational tool for synchronous code used by Actors and their asynchronous behaviors.

2. Direct access to object slots for implementing the bytecode interpreter.

Not a problem.  It’s just a program feature.   So we write it as it needs to be.   

3. Copying compiled machine code into executable memory and performing position dependent relocations.

Doable.  The Pony FFI works well even at this early stage.

The machine code generation and installation can be separated into two stages (the current VM just generates the code directly into the executable memory), and in fact having these two separate stages for compilation and installation is a requirement for operating systems that enforce W^X page-level permissions, especially if you want a concurrent VM.

Then do that.  

You need to install the executable code in an atomic way, so you need to suspend the threads while changing the executable permission into the writable permission. You may get around this restriction if you are allowed to map the same physical memory into different virtual address ranges with different permissions (one writable, and one executable).

…as needed.
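The two-stage idea and the double mapping discussed above can be sketched as follows. This is a Unix-only Python illustration under stated assumptions, not VM code: `generate_code`, `make_dual_mapping`, and `install` are hypothetical names, the "machine code" is placeholder bytes that are never executed, and the second view is mapped read-only where a real VM would request execute permission (and would also have to deal with atomic publication and instruction-cache coherence, which this sketch ignores).

```python
import mmap
import os
import tempfile
from typing import Tuple


def generate_code() -> bytes:
    """Stage 1: build the code in an ordinary buffer, away from live pages.

    Placeholder bytes only (x86 NOPs followed by RET); never executed here.
    """
    return bytes([0x90] * 15 + [0xC3])


def make_dual_mapping(size: int) -> Tuple[mmap.mmap, mmap.mmap]:
    """Map the same backing pages twice with different permissions."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)          # anonymous once the descriptor is closed
        os.ftruncate(fd, size)   # give the backing object its size
        # View 1: the writable alias used only by the installer.
        writable = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                             prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # View 2: stand-in for the executable alias (a real VM would add
        # PROT_EXEC; read-only is used here so nothing is ever executed).
        readonly = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                             prot=mmap.PROT_READ)
    finally:
        os.close(fd)
    return writable, readonly


def install(writable: mmap.mmap, offset: int, code: bytes) -> None:
    """Stage 2: copy finished code through the writable alias."""
    writable[offset:offset + len(code)] = code


if __name__ == "__main__":
    w, r = make_dual_mapping(mmap.PAGESIZE)
    code = generate_code()
    install(w, 0, code)
    # The bytes written through one view are visible through the other.
    print(r[:len(code)] == code)
    w.close()
    r.close()
```

Because no page is ever writable and executable through the same address range, the permission flip (and the thread suspension it implies) is avoided; the remaining problem is publishing the entry point only after the copy is complete.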

Pony is changing quickly:  https://ponylang.zulipchat.com/.

It’s being improved weekly.  The Pony group are also working on a security model (to deal with attacks via FFI and other sources), but this will be some time coming.  Feel free to contribute.  The language is highly moldable, especially at this stage.  I think the version is 0.33.  If you need a feature or convenience not present, request it.  The group is very responsive, and eager to improve the tool.

If there were no Smalltalk, I would certainly use Pony before C, Rust, or Go, even at this early pre-1.0 stage. 

For these reasons, you need an unsafe language such as C/C++ for at least these tasks, or a language that allows you to turn off the type and memory safety net. I heard that Rust has an unsafe pointer that you could also use for these purposes.

The Pony FFI will work here.  C libs are sometimes needed, and operations in C code are of course not guaranteed to be safe, in any case.  

The Pharo team is going with the existing virtual machine for quite a while. You cannot just replace something that is not well documented with something new, especially when you do not have that many resources for making a new VM.

One of the main drivers in the choice of Pony is to reduce the resources needed to create a highly performant, simple VM.  One notable, Pharo/Smalltalk-related problem for high-speed apps is stop-the-world GCing in the one large system heap.  This won’t work for high-performance, highly concurrent, highly scalable apps, especially not for real-time ones, and must go away, if latencies approaching deterministic are to be achieved.  Pony has already solved this problem with per-actor heaps.  That design feature is very interesting because it represents much hard work that need not be done.

You first need to document completely the existing one, the semantics of the bytecodes, how they are implemented, and also the same with the primitives.

Agreed.  I’m not claiming that VM development is easy or trivial, generally or via Pony.  VMs are arguably one of the most complicated things that humans create (not a compliment).  But Pony solves more concurrency-related problems at compile-time than any other available tool.

How complete is the documentation on the current Pharo VM?  

As for myself, I am putting my bets on another language that I am developing (Sysmel: https://github.com/ronsaldo/sysmel ), and on full ahead-of-time compilation, but my problem domain is video game programming, low-level operating system and driver development, and embedded programming, where I actually want to have control of the machine.

I’ve read some on Sysmel.  It appears to be an outstanding tool.

Does Sysmel implement multicore concurrency, and guarantee no deadlocks/data-races on compile?

Pony, the language and concurrency model, will not stop us from doing anything that can be done with the machine and OS.  The Pony language is not as interesting to me as its concurrency model.  Don’t like Pony syntax (and I don’t)?  Then fork and change it.  You have all the source.  (I’d prefer to see keyword selectors everywhere, even without the usual attendant polymorphism.  Seriously.)  

Since I have written my compiler in Pharo, I can just reuse the Opal Compiler for doing AST to AST translation and just compile Pharo (with some limitations, for example no thisContext) into my runtime environment. If I want a more dynamic environment, I can also serialize a Pharo CompiledMethod, send it through a socket and then interpret it on the Sysmel side: https://github.com/ronsaldo/sysmel/blob/master/module-sources/Sysmel.Core/Smalltalk-Bootstrap/InterpretedMethod.sysmel by just reusing the existing language semantics. Currently, though, I am supporting only Linux; Windows support is coming in a couple of weeks, after I get a proper module system working to reduce compilation times. For the backend I am using LLVM. Wasm is not yet supported because I am generating some IR constructs that are not supported by the wasm backend

Wasm and WASI will be good when they are ready.

(vtable layouts, and some intrinsics that I am using for non-local returns), but they should not be that complicated to fix.

Can Sysmel manage many threads on many cores (running actor threads in parallel, not just green-thread-concurrently on one core), whilst guaranteeing no data-races, and switch automatically between actor-threads, without blocking (no wasted CPU cycles), in 5 to 15 ns?  Pony can do that now.

If Sysmel cannot do those operations, or cannot do them as fast, can you add the abilities cost-effectively?  They already work in Pony, and a very active team drives the effort.

Best,

Shaping

On Mon, 6 Apr 2020 at 6:05, Shaping (<shaping at uurda.org>) wrote:

I should have initially posted this to the Pharo-dev list, as well.

From: Pharo-users [mailto:pharo-users-bounces at lists.pharo.org] On Behalf Of Shaping
Sent: Friday, 3 April, 2020 14:05
To: 'Any question about pharo is welcome' <pharo-users at lists.pharo.org>
Subject: Re: [Pharo-users] Latest PharoJS Success Story; Wasm/WASI; very keen on Pony for the Pharo VM

All:

> Brain Treats got stuck during launch on my LG.

> 

Which android version are you using ?

The phone is old and this is likely the problem.

Android version:  4.4.2

Kernel version:  3.4.0

> Is there a plan to move PharoJS to Wasm/WASI?

> 

Dave and I talked about it a long time ago. This sounds like a good idea.

Actually, Dave has a very ambitious idea: turn PharoJS into Pharo* where * can be different targets.

But, there's a lot to do before reaching this goal. So, don't expect it any time soon.

Not to change the topic too much, but the following is related and I often think of it…

Consider writing the Pharo VM in Wasm or, better, with Pony (which can emit Wasm, as needed).  Pony’s reference-capability-based (ref-cap) concurrency model provably guarantees that no data-races or deadlocks can happen if the code compiles; this solves a very large class of extremely ugly concurrency problems that no one ever wants to face.

Pony gives high-performance concurrency (5 to 15 ns actor-thread switching time, depending on platform), and solves the most difficult class of synchronization problems at compile time.  It runs as fast as C.  It runs faster than C, as concurrency scales.  You can’t scale a highly concurrent app efficiently in C, and really shouldn’t try if you wish to remain happy and mentally healthy.

Pony is still pre-1.0, but the group is very active and competent.  I think we should consider using it to build the VM.  Have a look.  Some videos for your amusement and information:

https://www.youtube.com/watch?v=ODBd9S1jV2s

https://www.youtube.com/watch?v=u1JfYa413fY

https://www.youtube.com/watch?v=fNdnr1MUXp8

https://www.ponylang.io/

There are many others.  I mentioned the Pony concurrency architecture around the holidays, but there was no interest from the list—not a good time perhaps.

The tentative plan is to do what Google does with Flutter:  have the JIT in support of the usual dynamicity a Smalltalker needs for rapid development; and have AOT, fully optimized compiling for production or speed-related reality checks, presumably needed less often during development.  There are other possibilities.  

Anyone interested?   

I have some ideas for simplifying use of the six ref caps in the context of Pharo/Smalltalk.  If this path is chosen, one must commit to strict state-machine-based algorithm development, without exception.  This should have happened anyway by now, broadly in the programming space, but didn’t.  I’m working on a graphical programming tool and associated grammar (in VW) that make state-machine development easy and attractive.  This, besides efficient use of machine resources, is the other reason for pushing in this direction.

A Pony program is built from a net of asynchronously communicating actors.  You change the state of your program with asynchronous messaging between actors.  There is no blocking (no mutexes or semaphores) and therefore no wasted CPU cycles or mushrooming program complexity, as you try to use mutexes in a fine-grained way (a very bad idea).  And as mentioned, there are never deadlocks or data-races.  All cores on all CPUs stay busy, always, until the program goes idle or exits.  The Pony group is also working on extending the model to the network level, so that all machine nodes in the network stay busy.  In the round, as a start, think of Pony as Erlang/OTP, but much faster, with no legacy bugs, and provably no-deadlocking on compile.
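For readers who have not seen the pattern, here is a toy Python sketch of actors that change state only through asynchronous messages. It is an analogy, not Pony: it runs on one thread with a hypothetical round-robin `Scheduler`, and it offers none of Pony's compile-time guarantees or multicore scheduling. The `Actor`, `Counter`, and `Logger` classes are invented for illustration.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class Scheduler:
    """Runs actors round-robin until every mailbox is empty."""

    def __init__(self) -> None:
        self.actors: List["Actor"] = []

    def register(self, actor: "Actor") -> None:
        self.actors.append(actor)

    def run(self) -> None:
        progressed = True
        while progressed:
            progressed = False
            for actor in self.actors:
                if actor.run_one():
                    progressed = True


class Actor:
    """Owns its state; the outside world can only enqueue messages."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.mailbox: Deque[Tuple[str, Tuple[Any, ...]]] = deque()
        scheduler.register(self)

    def send(self, behavior: str, *args: Any) -> None:
        # Asynchronous: enqueue and return immediately, never wait.
        self.mailbox.append((behavior, args))

    def run_one(self) -> bool:
        # Process at most one message; each behavior runs to completion.
        if not self.mailbox:
            return False
        behavior, args = self.mailbox.popleft()
        getattr(self, behavior)(*args)
        return True


class Logger(Actor):
    def __init__(self, scheduler: Scheduler) -> None:
        super().__init__(scheduler)
        self.seen: List[int] = []

    def report(self, value: int) -> None:
        self.seen.append(value)


class Counter(Actor):
    def __init__(self, scheduler: Scheduler) -> None:
        super().__init__(scheduler)
        self.total = 0

    def add(self, n: int, reply_to: Actor) -> None:
        self.total += n
        reply_to.send("report", self.total)  # reply is a message, not a return


if __name__ == "__main__":
    sched = Scheduler()
    log = Logger(sched)
    counter = Counter(sched)
    counter.send("add", 2, log)
    counter.send("add", 3, log)
    sched.run()
    print(counter.total, log.seen)
```

Since no behavior ever waits on another actor, there is nothing to block on; results travel back as further messages, which is why the sender needs its own state machine to know what a reply means when it arrives.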

The asynchronous actor model is the programming pattern that Kay had in mind when he said “object-oriented.”  It’s the one I want to implement in Pharo.  The green threads are light, but don’t efficiently use the cores, and a net of VMs with their respective images still communicate too slowly.

If your time permits, please study Pony for a bit, before rejecting the idea as too big a change in direction or too complicated.  Using Pony looks like the ideal VM simplification strategy, if our aim is efficient use of networks of machines, each with at least one CPU (often more), each, in turn, with many cores (whose numbers are still increasing).  This pattern in hardware probably won’t be changing much, now that speeds are topping out.  Winning the performance game is therefore about efficiently using many cores at once, without burdening the programmer.  I don’t see a better way to do this now than with Pony.

Thoughts and suggestions are welcome.

Shaping

> -----Original Message-----
> From: Pharo-users [mailto:pharo-users-bounces at lists.pharo.org] On Behalf Of N. Bouraqadi
> Sent: Tuesday, 28 January, 2020 12:18
> To: Any question about pharo is welcome <pharo-users at lists.pharo.org>
> Subject: [Pharo-users] Latest PharoJS Success Story
>
> The latest PharoJS-powered smartphone app is now live.
> Development has been made using Pharo.
> Then, javascript code is generated using PharoJS.
> Last, the app is built to target both iOS and Android thanks to Apache Cordova.
>
> Learn more and Download at
> https://nootrix.com/projects/brain-treats-app/
>
> Noury
