<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Jason Johnson wrote:

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">On 10/21/07, Peter William Lount <a

 class="moz-txt-link-rfc2396E" href="mailto:peter@smalltalk.org">&lt;peter@smalltalk.org&gt;</a> wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">tim Rowledge wrote:

Ok, so if you really are talking about a "strict" Erlang style model

with ONE Smalltalk process per "image" space (whether or not they are in

one protected memory space or many protected memory spaces) where

objects are not shared with any other threads except by copying them

over the "serialization wire" or by "reference" then I get what you are

talking about.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

That is a strange way of putting it.  </pre>

</blockquote>

<br>

Why? That is what Erlang achieves via its total encapsulation of state

that is only transferred by message passing to and back from a process.

To achieve the same thing in Smalltalk you'd need to isolate the

component objects running in an "image" object space with the process

otherwise you'd be breaking the encapsulation that provides the

protection against a large number of classes of concurrency problems. <br>

<br>

The principle is that anytime you have more than one thread or process

working on the same memory space, or object space, you WILL have

concurrency issues (unless your code is just running very simple

concurrency). The point is that in order to implement your

utopia-vision-of-simple-problem-free-concurrency (utopia-concurrencia

for lack of a better name) in Smalltalk you MUST isolate the objects to

ONLY ONE thread of possible alteration of their state otherwise you end

up with the possibility of many classes of concurrency problems. Shared

memory problems exist even within one protected memory space and not

just between them. To isolate the objects involved in a process you can

have a separate object space which contains the objects that will be

operated on. This is the Erlang way, isn't it? The thing about Erlang,

unless I'm mistaken (and if I am mistaken I'd expect to be corrected),

is that the objects in a process are only visible to that process until

the results are returned. The objects that pass in and out of an Erlang

process are only primitive data types and not complex objects. However

for Smalltalk you'd need to pass in complex object graphs of arbitrary

size and connectedness to be general purpose. This then results in a

version problem. <br>

<br>

For example, let's say that you have a graph of one million objects that

is highly connected and you want to perform not just a simple read

operation on it but a massive number of edits which would result in the

graph growing by 50% and the number of connections growing by 70%. For

speed you decide to implement the algorithms so that they can run in

parallel upon this moderately large graph of objects. Let's say that you

have enough compute and memory resources to split this into 10,000

processes. Now you have the problem of sharing the one million objects

with the 10,000 processes. That's a lot of data to move around just to

get things started assuming that you packaged up the whole mess into a

serial blob and spit it at the various processes. A lot of redundant

data. Ok, maybe it's better to do this in small chunks, after all

incrementalism is a powerful technique. For this approach you send each

of the 10,000 processes a starting node plus a "search pattern" and the

type of edits it will perform upon the graph along with the actual

edits as they flow in from another source. So now you have 10,000

processes each vying to traverse the one million node graph scanning

for patterns and applying edits as they find what they are looking for.

Some of these processes will then update the "shared graph". Oh. What

happens when two processes both update the same node in this graph but

in different ways? Let's say one edit in one process adds a connection

while the other edit in the other process modifies an instance variable on

that node? Let's say that these two edits occur at the same time and

are mutually exclusive - that is both edits would break the object's

own internal consistency rules. So now you have two edits that either

must both fail, or one must succeed while the other fails or the other

must succeed - both can't succeed. Now you've got a problem that the

magical Erlang message passing won't solve. <br>

<br>

If it does, what is the Erlang solution to this million-node parallel

editing problem? <br>

<br>
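To make the collision concrete, here is a minimal sketch in Python (not Smalltalk or Erlang) of the two edits described above. The Node class, its max_links rule and the commit function are hypothetical names invented for illustration, and the version check is just one simple form of concurrency control, not a claim about how any particular system does it. <br>
<pre wrap="">
```python
import copy

class Node:
    """A graph node with a simple consistency rule (hypothetical)."""
    def __init__(self, name, max_links):
        self.name = name
        self.max_links = max_links   # rule: len(links) must not exceed this
        self.links = []
        self.version = 0             # bumped on every committed edit

def commit(shared, edited, base_version):
    """Accept an edit only if nobody else committed since it was started."""
    if shared.version != base_version:
        return False                 # stale edit: the caller must retry or give up
    shared.max_links = edited.max_links
    shared.links = list(edited.links)
    shared.version += 1
    return True

shared = Node("n1", max_links=2)
shared.links.append("n2")

# Two processes each take a private copy of the same version of the node.
copy_a = copy.deepcopy(shared)
copy_b = copy.deepcopy(shared)

# Process A adds a connection: two links with a limit of two is consistent.
copy_a.links.append("n3")

# Process B changes an instance variable: one link with a limit of one is
# consistent too, but combined with A's edit the rule would be broken.
copy_b.max_links = 1

result_a = commit(shared, copy_a, base_version=0)
result_b = commit(shared, copy_b, base_version=0)
print(result_a, result_b)
```
</pre>
Each edit is valid against its own private copy, yet only the first commit is accepted. The second process has to notice the rejection and retry or give up, and that is a form of concurrency control that message passing alone does not provide. <br>
<br>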

Now someone mentioned Software Transactional Memory (STM) so briefly

that it would be easy to miss. Is that your solution? If so you still

have other concurrency issues, object versioning issues, plus more to

deal with. No solution is a panacea for all problems unless you are an

advocate of silver bullet solutions.<br>

<br>

The problem of editing a large graph of objects with many parallel

threads is the generalized case of a nasty and complex set of

concurrency and transactional issues. There are many ways to solve

this. If you reply to this example I would hope that you do so fully

explaining how you'd handle the concurrency and - importantly - the

object consistency issues. <br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">The fact is, Erlang has many processes per image.  </pre>

</blockquote>

<br>

Yes, I understand that early tests indicate that Erlang can handle

approximately 100,000 or so processes at a time without hiccups while

Java can handle about 8,000 or so before blowing up. I don't know what

the various Smalltalks can handle, but I doubt it's as high as Erlang

and is more likely less than even Java - just a guess though. Maybe

someone has worked it out. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Many more then you could ever get as real

processes or native threads (as a test I made a little program that

spawned 64 *thousand* threads and passed messages between them on my

laptop).

  </pre>

</blockquote>

<br>

That's only because the current crop of operating systems were designed

and envisioned when a few hundred processes and threads were considered

a lot. Also because native operating system processes take a lot of

resources.<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">But with their model, process creation is extremely cheap.  And since

there is no sharing as far as the language is concerned, there is no

need for locking to slow everything down.

  </pre>

</blockquote>

<br>

Yes, and how would the no sharing be implemented in Smalltalk? <br>

<br>

How would you solve the one-million-node concurrent editing problem

above without locking in your utopian threading implementation?<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Smalltalk can do this too. I think it needs a little work still, but

I'm optimistic about what can be done here.

  </pre>

</blockquote>

<br>

What would you do to Smalltalk to make it do this? So far you and the

others have been very short on specifics and have just argued that

something magical can be done to make concurrency happen without locks.

A few papers and web sites have been linked to but no one has written

down what they are proposing or what they mean beyond "it can be done". <br>

<br>

I'll grant you that you can see that it can be done. Please illuminate

what it is that you see can be done in detail and how you might do it.

Thanks.<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">However, you'll still end up with concurrency control issues and you've

got an object version explosion problem occurring as well. How will you

control concurrency problems with your simplified system? Is there a

succinct description of the way that Erlang does it? Would that apply to

Smalltalk?

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Much like how Smalltalk does it, as it turns out.  That is, you don't

have a version problem so much as you have "old" and "new".  So when

ready you send the "upgrade" message to  the system and all new calls

to the main functions of a process will be the new version.  All

currently running code will access the old code until it's completion,

and all new code runs in the new space.

  </pre>

</blockquote>

<br>

Ok, so there would be 10,000 separate process-object-spaces with the

one million nodes being edited and new nodes being created in each of

these 10,000 separate spaces. How do you expect to "merge" the results

and solve the edits that will inevitably cause "logical data

inconsistency" collisions?<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <blockquote type="cite">

    <pre wrap="">You simplified concurrency system also dramatically alters the Smalltalk

paradigm.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

The current paradigm is fine-grained locked/shared state. </pre>

</blockquote>

<br>

So?<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap=""> In my opinion and the opinion of many (probably most in fact, outside of the

Java community) people who are more expert is this area then you or I,

we *have* to move away from this paradigm.

  </pre>

</blockquote>

<br>

Why? Please provide more than anecdotal or belief-driven comments for

this point of view. What are the reasons? What is it that you'd be

moving towards?<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">  </pre>

  <blockquote type="cite">

    <pre wrap="">Is this the approach that Cincom is using in their Visual Works system?

They seem to not be embracing the notion of native threads.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Thank God. :)

  </pre>

</blockquote>

<br>

It's a huge mistake on their part in my humble view. <br>

<br>

While it may be easy from the point of view of adapting their image

it's a huge mistake. I've had many people comment that that's one of

the reasons that Java is better than Smalltalk - it already works with

multiple CPU cores. Yes they have to solve the concurrency problems,

but those are NO WORSE than the concurrency problems that already exist

within Smalltalk when running with a single native process and multiple

(green threads aka) Smalltalk Processes. No different. Do you actually

get that? If you don't then you fail to appreciate that the approach

that Cincom is taking isn't going to solve the concurrency problems

since - unless they correct me on this - it seems that their direction

is to simply have N-instances of their image (in the same memory space

or in separate operating system processes) where N would frequently be

the same as the number of cores on the computer (or server) in question

(although the instances could be more or less as needed). Each

individual image would still have the problems of multi-threading

within it IF AND ONLY IF there are multiple threads forked. Then you

have all the same concurrency problems that happen with multiple

threads on objects in one memory space. Sure this is a simpler approach

for them as they don't have to completely toss their current virtual

machine design - they can hack it by simply using one image space per

native processor or per native operating system process. Then all they

need is a cheap and dirty distributed object transport system to move

object graphs (complete or partial) around between the various images.

This will work for them and ALL Smalltalk systems including Squeak. In

fact this can work now essentially with unmodified Smalltalk systems -

all that's really needed is the distributed objects framework and

there are a few of those kicking around. <br>

<br>

This is of course a far cry from the radical concurrency system that is

being proposed by the Erlangization concurrency proponents.<br>

<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">  </pre>

  <blockquote type="cite">

    <pre wrap="">However it's

also unlikely that they are embracing the notion of only ONE Smalltalk

process per image either.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

If I understand you correctly, then I would suggest not to use the

word "image" as this is confusing.  Another way to put it would be

"each process has it's own view of the world".  And honestly, what is

the problem you see with this?

  </pre>

</blockquote>

<br>

Ok. How will you implement that?<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Right now, if you run two separate images with only one thread or

process, then you have two processes that each have their own set of

objects in their own space interacting with each other.

  </pre>

</blockquote>

<br>

Yes, exactly. This is the illusion that Erlang provides. This can also

be achieved now with ANY Smalltalk version just by starting multiple

images - one for each core if you want to map them that way as may be

"natural" to want to do.<br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Now we add a way for one image to send a message *between* images.

  </pre>

</blockquote>

<br>

Yes. That can be done now. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Perhaps the VM can detect when we are trying to do this, but instead

of complicating the default Smalltalk message sending subsystem, lets

make it explicit with some special binary message:

Processes at: 'value computer' ! computeValue.

  </pre>

</blockquote>

<br>

There isn't any need for new syntax with the "!" character. Now, sure,

you're using it as a binary message selector "!", but why obfuscate

it? I'd recommend using a keyword selector for better clarity. Thanks. <br>

<br>
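As an illustration of what an explicit, keyword-style send could look like, here is a small Python sketch. Mailboxes, register and send_to are made-up names, threads stand in for separate images, and a deep copy stands in for the serialization wire; none of this is an existing Smalltalk or Erlang API. <br>
<pre wrap="">
```python
import copy
import queue
import threading

class Mailboxes:
    """Named mailboxes; messages are copied so no object is shared."""
    def __init__(self):
        self._boxes = {}

    def register(self, name):
        self._boxes[name] = queue.Queue()
        return self._boxes[name]

    def send_to(self, name, message):
        # The deep copy stands in for the "serialization wire".
        self._boxes[name].put(copy.deepcopy(message))

def value_computer(inbox, replies):
    """Receive one list of numbers and reply with their sum."""
    numbers = inbox.get()
    replies.put(sum(numbers))

mailboxes = Mailboxes()
inbox = mailboxes.register("value computer")
replies = queue.Queue()
worker = threading.Thread(target=value_computer, args=(inbox, replies))
worker.start()

data = [1, 2, 3]
mailboxes.send_to("value computer", data)
data.append(100)          # a later local change is not seen by the receiver
worker.join()
answer = replies.get()
print(answer)
```
</pre>
A named operation such as send_to reads plainly at the call site, which is the clarity argument for a keyword selector over "!". <br>
<br>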

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Now we have the ability to send messages locally within a process, and

a way of freely sending between processes.  No locking and the

problems associated with locking.

  </pre>

</blockquote>

<br>

Not so. You'd have to transmit - in my example above - one million

objects to the various images and have them compute and return their

results which would then have to be combined in a manner that leaves

the graph of objects in a consistent state with one and a half million

objects and 70% more interconnections between them. It is this parallel

updating of many parts of the same data graph that will require the

concurrency controls.<br>

<br>
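The recombination step can be sketched as follows, again in Python and under loudly stated assumptions: each image returns a log of (node id, description) pairs, and find_collisions is a hypothetical helper that only reports which nodes were touched by more than one worker. Deciding which edit wins is the hard part and is left open here. <br>
<pre wrap="">
```python
from collections import defaultdict

def find_collisions(edit_logs):
    """Return the node ids that were edited by more than one worker.

    edit_logs maps a worker id to a list of (node_id, description) pairs.
    """
    editors = defaultdict(set)
    for worker, edits in edit_logs.items():
        for node_id, _description in edits:
            editors[node_id].add(worker)
    # Every recorded node has at least one editor, so "not exactly one"
    # means two or more workers touched it.
    return sorted(n for n, workers in editors.items() if len(workers) != 1)

logs = {
    "worker-1": [(17, "add link to 42"), (99, "rename")],
    "worker-2": [(17, "set flag"), (5, "add link to 6")],
}
collisions = find_collisions(logs)
print(collisions)
```
</pre>
Node 17 was edited in two separate spaces, so some rule outside the message passing itself still has to decide how those two edits combine. <br>
<br>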

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">So, now what is stopping us from moving this separate process *inside

the same image*?  </pre>

</blockquote>

<br>

Nothing, but you've got to address the concurrency problem that I've

mentioned above. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">If you fork a process and he starts making objects,

no other processes have references to those objects.  No shared state

issue there.  This part could work right now today with no changes to

the VM.

  </pre>

</blockquote>

<br>

Are you talking about forking a new operating system process with a

copy of the image? The "copied" objects or the objects that were in the

"image" to begin with are "duplicates" (or N-plicates really) which is

a real headache if they get modified in multiple images and need to be

"recombined" into one real persistent state. <br>

<br>

These are object database problems and attempting to split the

processing into multiple threads to avoid the "locking" issues does not

solve the problem. It just pushes it further away. While it might work

for some applications like telephone switching systems it can't

generalize to ALL types of problems which could benefit from

concurrency solutions. That's wishful thinking and a pipe dream

otherwise known as a silver bullet. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">The only issue I can think of are globals, </pre>

</blockquote>

<br>

All Object Databases have a couple of rooted objects. Maybe many more

than a couple. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">the most obvious being

class side variables.  Note that even classes themselves are not an

issue because without class side variables, they are effect free

(well, obviously basicNew would have to be looked at).

  </pre>

</blockquote>

<br>

I'm not sure what you mean. <br>

<br>

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">But I think this issue is solvable.  The VM could take a "copy on

write" approach on classes/globals.  That is, a class should be side

effect free (to itself, i.e. it's the same after every call), so let

all processes share the memory space where meta-class objects live.

But as soon as any process tries to modify the class in some way

(literally, it would be the class modifying itself), he gets his own

copy.  Processes must not see changes made by other processes, so a

modification to a global class is a "local only" change.

  </pre>

</blockquote>

<br>

Yes, a variant of the Software Transactional Memory. However, you still

have the problems mentioned above. <br>

<br>
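The copy-on-write idea can be illustrated with Python's standard collections.ChainMap, where reads fall through to a shared mapping and writes land in a private layer. The global names below are hypothetical, and this only models the visibility rule, not a VM implementation. <br>
<pre wrap="">
```python
from collections import ChainMap

# Stand-in for the shared class and global state.
shared_globals = {"DefaultColor": "blue", "Timeout": 30}

# Each process gets an empty private layer on top of the shared mapping.
view_a = ChainMap({}, shared_globals)
view_b = ChainMap({}, shared_globals)

view_a["Timeout"] = 5        # the write lands in A's private layer only

print(view_a["Timeout"], view_b["Timeout"], shared_globals["Timeout"])
print(view_a["DefaultColor"])   # unmodified names still read from the shared map
```
</pre>
Each view keeps its own change, so the question of how to recombine those private changes remains. <br>
<br>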

<br>

<blockquote

 cite="mid:aa22f0200710210307r89cdefem4ae355462df02369@mail.gmail.com"

 type="cite">

  <pre wrap="">Of course the only big thing left would be; what happens when we add a

new class.  But Erlang has had success with the old/new space

approach, and what Smalltalk has now is very similar.

  </pre>

</blockquote>

<br>

Having two spaces, old and new space, won't solve the problems

mentioned above when you have N processes (threads) running on

M-objects in parallel and need to combine the results of the parallel

computations. <br>

<br>

Many problems follow this pattern: "split processes off with their chunk of data"

and then "recombine" the results. Many of these problems are simplified - if

possible - so that the results can't collide with the issues presented

above. However, we are not talking about those special cases - such as

parallel ray tracing algorithms. We are talking about the completely

generic cases that occur in general purpose and every day use of code

in Smalltalk applications - such as the massive Smalltalk business

database front end applications which are typical at many corporations

today and which utilize many threads to accomplish their parallel tasks

in order to speed up the user experience. A real world consequence of

this is increased productivity of thousands of users day in and day out

at these corporations.<br>

<br>

Maybe your applications aren't as complex as these, but I don't see the

benefits of an Erlang ONLY approach. I do see the benefit of STM and

Erlang approaches in some cases but why intentionally limit the tool

box to just a few cases? It makes no sense to ignore the harsh reality

of concurrency issues by picking a limited set of solutions.<br>

<br>

All the best,<br>

<br>

Peter William Lount<br>

<a class="moz-txt-link-abbreviated" href="mailto:Peter@smalltalk.org">Peter@smalltalk.org</a><br>

<br>

</body>

</html>