Multy-core CPUs

Tue Oct 23 20:52:55 UTC 2007

On 10/23/07, Peter William Lount <peter at smalltalk.org> wrote:
>
>  The principle is that anytime you have more than one thread or process
> working on the same memory space, or object space, you WILL have concurrency
> issues (unless your code is just running very simple concurrency). The point
> is that in order to implement your
> utopia-vision-of-simple-problem-free-concurrency
> (utopia-concurrencia for lack of a better name) in Smalltalk you MUST
> isolate the objects to ONLY ONE thread of possible alteration of their state
> otherwise you end up with the possibility of many classes of concurrency
> problems.

Yes, this is mostly true.  The insight with Erlang is that they don't
actually have to be in a different memory space, it just has to be
impossible at the language level for one process to get a reference to
an object of another process *and modify it*.

>Shared memory problems exist even within one protected memory
> space and not just between them. To isolate the objects involved in a
> process you can have a separate object space which contains the objects that
> will be operated on. This is the Erlang way, isn't it?

Kind of.  The Erlang approach works so well for them because variables
can't be changed.  Once you create a variable it is frozen in that
form.  Other process *can* look at it because no change can happen
from either thread.

Obviously more care will have to be taken in Smalltalk as the objects
can always be changed.

> The thing about
> Erlang, unless I'm mistaken (and if I am mistaken I'd expect to be
> corrected), is that the objects in a process are only visible to that
> process until the results are returned. The objects that pass in and out of
> an Erlang process are only primitive data types and not complex objects.

The last sentence is incorrect.  The message can be any complexity,
including sending functions, file handles, whatever.

> However for Smalltalk you'd need to pass in complex object graphs of
> arbitrary size and connectedness to be general purpose. This then results in
> a version problem.

Only if what you pass can be modified by either side.

>  For example, <snip>
> Now
> you've got a problem that the magical erlang message passing won't solve.

Problem: your example is using shared data and updating of variables.
In the message passing paradigm *there is no shared data*. Period.
None.  In Erlang specifically there isn't updating of variables even
within a process.  So this would be done in Erlang something like
this:

some_process(DataStructure) ->
  break_up_structure(DataStructure, 10000),
  get_new_structure({}, 10000).                           % return
result of get_new_structure

break_up_structure(_, 0) -> done;
% base case, no processes left
break_up_structure(DataStructure, Processes) ->              % otherwise
  RestOfDataStructure = split_and_send(DataStructure),     % cut off a
piece and send
  break_up_structure(RestOfDataStructure, Processes - 1). % tail call
with new values

get_new_structure(DataStructure, 0) -> DataStructure; % base case,
return what we built
get_new_structure(DataStructure, Processes) ->
  Data = receive,
          %psuedo code for brevity
  NewDataStructure = add_data_to_structure(Data, DataStructure),
  get_new_structure(NewDataStructure, Processes - 1).

The fact that variables are immutable is dealt with in the normal
functional programming way of using tail recursion and passing any
variables that need "updating" as arguments.

In case the above code isn't clear: The process breaks up the parts of
the data structure and farms them out to the different processes, then
waits for responses and incrementally assembles them into the new data
structure.

Now, the issue here is obviously:  This only makes sense when the
processing of the data that was carved out is more expensive then the
carving out and reattaching.  If the structure is very large that may
well not be the case.

In that case I'm not sure how I would handle it, but I look at it like
any other performance issue:  I would try algorithm changes before I
looked at going to a lower level.

>  Now someone mentioned Software Transactional Memory (STM) so briefly that
> it would be easy to miss. Is that your solution?

No, if someone else wants to look at this it's ok.  I'm a bit
concerned about the book keeping.

> If so you still have other
> concurrency issues, object versioning issues, plus more to deal with. No
> solution is a panacea for all problems unless you are an advocate of silver
> bullet solutions.

There is no such thing, but just as a generational garbage collector
is "good enough" in all but the most special cases, I believe message
passing will be "good enough" as well.

>  The problem of editing a large graph of objects with many parallel threads
> is the generalized case of a nasty and complex set of concurrency and
> transactional issues. There are many ways to solve this. If you reply to
> this example I would hope that you do so fully explaining how you'd handle
> the concurrency and - importantly - the object consistency issues.

Transactional and concurrency issues arise because you are sharing
something.  If you give one entity alone access to that something and
all access must go through him these issues go away.  They are traded
for new issues, but issues that are much easier to reason about.

>  Yes, I understand that early tests indicate that Erlang can handle
> approximately 100,000 or so processes at a time without hickups while Java
> can handle about 8,000 or so before blowing up.

No where near 8,000.  At least not on any box I've ever seen (or do
you have a reference?).  The problem is Java's just too fat, on a
32-bit operating system you run out of memory well before 8k processes
or threads.

> I don't know what the
> various Smalltalks can handle, but I doubt it's as high as Erlang and is
> more likely less than even Java - just a guess though. Maybe someone has
> worked it out.

Actually Smalltalk is not so far from Erlang right now (theoretically.
 The question mark is the scheduling).  Erlang is optimized for this
so the size of each process might be half the size of a Smalltalk one
(but I'm not sure of even this), but it's *certainly* much higher then
any native process or thread solution can hope to achieve.

>  That's only because the current crop of operating systems were designed and
> envisioned when a few hundred processes and threads was considered a lot.
> Also because native operating system processes take a lot of resources.

It's because of the resources and how the OS deals with them.  Keep in
mind that a thread can call "detach" and become a running process, so
some care has to be taken that space will be available.  Of course
linux deals with this by not having real threads at all, just
processes that have the same memory map as other processes.

>  Yes, and how would the no sharing be implemented in Smalltalk?

This is what my investigations will reveal.  As I alluded to in a
previous mail, any immutable data is not a concurrency issue.  It
doesn't matter who can see it so long as it can't be updated.  Mutable
data (e.g. objects) are also no issue provided you can guarantee no
process can get access to it besides the process that created it.

So that leaves globals, especially classes.  Until I get into this I'm
now 100% how I'll deal with it, but I can't image that it's not
solvable.

>  How would you solve the concurrency one million node editing problem above
> without locking in your utopian threading implementation?

As described above.

>  What would you do to Smalltalk to make it do this. So far you and the
> others have been very short on specifics and have just argued that something
> magical can be done to make concurrency happen without locks.

With current hardware/OSes, there will be locks, but in the VM where
they belong.  The only structure in Erlang that must be atomic is the
message "mailbox", it's the only place that should can be accessed at
the same time by multiple processes.

> A few papers
> and web sites have been linked to but no one has written down what they are
> proposing or what they mean past it can be done.

Well, I'm a Smalltalker.  I form a vague idea and then go try to do
it.  I'll let you know what the specification is when I've implemented
it. :)  But I have researched into this as far as what exists today
and I haven't seen anything I feel is a show stopper, nor anything
that will require a change in Smalltalk semantics.  It's very possible
(even likely) that there's something I've overlooked, but I'll need to
get into it to find that out.

>  I'll grant you that you can see that it can be done. Please illuminate what
> it is that you see can be done in detail and how you might do it. Thanks.

Is it clearer now?  I feel that I have detailed it out twice now (the
relevant details anyway).

>  However, you'll still end up with concurrency control issues and you've
> got an object version explosion problem occurring as well. How will you
> control concurrency problems with your simplified system? Is there a
> succinct description of the way that Erlang does it? Would that apply to
> Smalltalk?

Can you give an example of one of these issues, so I can explain how I
would deal with it?  Please note, there is *no data sharing, period*
in this paradigm.  At least at the language level.

>  Ok, so there would be 10,000 separate process-object-spaces with the one
> million nodes being edited and new nodes being created in each of these
> 10,000 separate spaces. How do you expect to "merge" the results and solve
> the edits that will inevitably cause "logical data inconsistency"
> collisions?

By having just one process that owns the data (or lots of processes
that own their own piece of it) that all processes must talk to if
they wish to make changes.

>  You simplified concurrency system also dramatically alters the Smalltalk
> paradigm.
>
>  The current paradigm is fine-grained locked/shared state.
>
>  So?

So obviously this part of the current paradigm will be altered, and I
say it needs to be.  Even if we find that certain parallel tasks need
the old shared state method, this shouldn't be provide anywhere most
people will find it.  The problem is that most people who know how to
do concurrency code only know this shared state model, so if you
present multiple options they will all use this, the familiar.

>  Why? Please provide more than anticidal or belief driven comments for this
> point of view. What are the reasons? What is it that you'd be moving
> towards?

Because of the reasons I've laid out several times in this thread:  1)
it does not scale, 2) it can not be composed, 3) it's incredibly
difficult to reason about, 4) it's a low level detail, 5) it ensures
encapsulation violation and on and on.

There are plenty of papers out there on this subject, if you are
looking for me to go through them all and condense it for you in a
summary more then I've already done then I'm afraid that's not going
to happen.  It's a pretty well known fact that shared-state fine
grained locking *can not scale*.

>  It's a huge mistake on their part in my humble view.
>
>  While it may be easy from the point of view of adapting their image it's a
> huge mistake. I've had many people comment that that's one of the reasons
> that Java is better than Smalltalk

If someone thinks that mess that is Java is better then Smalltalk, I
already question what useful information they can bring to the table.
Java has *some things* better then Smalltalk sure, but such a
statement is an "information smell" or a "taste smell" to say the
least.

> - it already works with multiple cpu
> cores. Yes they have to solve the concurrency problems, but those are NO
> WORSE than the concurrency problems that already exist within Smalltalk when
> running with a single native process and multiple (green threads aka)
> Smalltalk Processes. No different. Do you actually get that?

For someone who is so violently against personally attacks, you sure
hang over the fence, eh?  Just because you're not understanding where
I'm coming from doesn't mean these concepts are just beyond me and
only you get it.

> If you don't
> then you fail to appreciate that the approach that Cincom is taking isn't
> going to solve the concurrency problems since - unless they correct me on
> this - it seems that their direction is to simply have N-instances of their
> image (in the same memory space or in separate operating system processes)
> where N would frequently be the same as the number of cores on the computer
> (or server) in question (although the instances could be more or less as
> needed).

Which *does* solve it!  And conveniently walks right past all the
terrible issues that shared-state concurrency programming has.  Once
again, while Java people are trying to debug issues the Smalltalk guys
will already be adding features to the next release.

> Each individual image would still have the problems of
> multi-threading within it IF AND ONLY IF there are multiple threads forked.

Right, so don't do that. :)

>  This is of course a far cry from the radical concurrency system that is
> being proposed by the erlangization concurrency proponents.

Actually not so much.  Erlang spanned actual CPU's by running more
images, just like Smalltalk.  So only the processes inside the image
are different, but even this can be done today with discipline.  I
would like to remove this need for discipline by making it
*impossible* to affect other processes, but so long as you make sure
you don't update anything that other processes can see you could do
this kind of message passing today.

>  There isn't any need for new syntax with the "!" character. Now sure you're
> using it with a binary message selector "!" but why obfuscate it. I'd
> recommend using a keyword selector for better clarity. Thanks.

This is just what Erlang uses.  I want inter-process sends to stand out clearly.

>  Not so. You'd have to transmit - in my example above - one million objects
> to the various images and have them compute and return their resutls which
> would then have to be combined in a manner that leaves the graph of objects
> in a consistent state with one and a half million objects and 70% more
> interconnections between them. It is this parallel updating of many parts of
> the same data graph that will require the concurrency controls.

No.  You are describing shared data which doesn't exist.  No shared
data = no locking needed.

>  Nothing but you've got to address the concurrency problem that I've
> mentioned above.

It wasn't a problem with message passing style, but for shared-state
concurrency programming.

>  Are you talking about forking a new operating system process with a copy of
> the image?

I'm talking about:   [ "some code" ] fork

>  These are object database problems and attempting to split the processing
> into multiple threads to avoid the "locking" issues does not solve the
> problem. It just pushes it further away. While it might work for some
> applications like telephone switching systems it can't generalize to ALL
> types of problems which could benefit from concurrency solutions.

No it can't, and I don't believe I ever said it did.  But garbage
collection can't either and we do fine with that as our only option in
Squeak.  If we need more we step outside the normal bounds, as it
should be.

>  All Object Databases have a couple of rooted objects. Maybe many more than
> a couple.

Object databases are a whole other can of worms.  I don't know how I
would deal with it, but I would start by looking at what Mnesia
(basically an object db for Erlang) does.

>  Yes, a variant of the Software Transactional Memory. However, you still
> have the problems mentioned above.

No, Software transactional memory means we update several variables
inside an "automic" block and if the system notices something changed
while these changes were being made it rolls the block back to what it
was before.

I'm talking about a VM optimization to deal with metaclasses.

>  Having two spaces, old and new space, won't solve the problems mentioned
> above when you have N processes (threads) running on M-objects in parallel
> and need to combine the results of the parallel computations.

Old space and new space is purely for dealing with live code updates.
Nothing more.  I'm not trying to solve any object versioning issues,
because I haven't seen any real evidence they will exist.

>  Many problems have this "split processes off with their chunk of data" and
> "recombine" the results. Many of these problems are simplified - if possible
> - so that the results can't collide with the issues presented above.
> However, we are not talking about those special cases - such as parallel ray
> tracing algorithms. We are talking about the completely generic cases that
> occur in general purpose and every day use of code in Smalltalk applications

The only things I can think of that wouldn't work in this model is
problems where splitting up and rebuilding a dataset is more expensive
then the actual processing.  But I think can usually be solved by
design changes.

> - such as the massive Smalltalk business database front end applications
> which are typical at many corporations today and which utilize many threads
> to accomplish their parallel tasks in order to speed up the user experience.
> A real world consequence of this is increased productivity of thousands of
> users day in and day out at these corporations.

I'm not sure what you're saying here.  Apparently these Smalltalk
applications aren't doing real multithreading now right (since it's
only an option on a few ST implementations)?  So how is offering a
simple way to achieve concurrency going to make this worse?

>  Maybe your applications aren't a complex as these but I don't see the
> benefits of an Erlang ONLY approach. I do see the benefit of STM and Erlang
> approaches in some cases but why intentionally limit the tool box to just a
> few cases? It makes no sense to ignore the harsh reality of concurrency
> issues by picking a limited set of solutions.

For the reasons mentioned above.  Choice isn't the holy grail you seem
to think it is.  If it was we would all be on the C++ list talking.
Funny that we ended up in a language that 1) doesn't allow you to
allocate your own memory, 2) forces you to use single inheritance, 3)
forces you to use an image instead of files, etc.

I'm comfortable with the simplest thing being what works 90-99% the
time and having to work much harder if I need something more.