Multy-core CPUs

Thu Oct 25 21:15:03 UTC 2007

From: squeak-dev-bounces at lists.squeakfoundation.org
[mailto:squeak-dev-bounces at lists.squeakfoundation.org] On behalf of Peter
William Lount
Sent: Thursday, October 25, 2007 16:29
To: The general-purpose Squeak developers list
Subject: Re: Multy-core CPUs

Hi,

Sebastian Sastre wrote: 

hi,

What? That just won't work. Think of the memory overhead. 

I don't give credit to unfounded apriorisms. I think it deserves to be
proved that it does not work. Anyway, let's just assume that it may be too
much for state-of-the-art hardware in common computers in the year 2007.
What about in 2009? What about in 2012? On the first day of 2012, remember
the attitude you had when saying this now.

It's not an unfounded apriorism as you put it. 

Current hardware and technology expected in the next ten years isn't
optimized for N hundred thousand or N million threads of execution. Maybe in
the future that will be the case. 

The Tile-64 processor is expected to grow to about 4096 processors by
pushing the limits of technology beyond what they are today. To reach the
levels you are talking about for a current Smalltalk image with millions of
objects each having their own thread (or process) isn't going to happen
anytime soon. 

I work with real hardware.

I am open and willing to be pleasantly surprised however. 

Peter.. Peter.. you have to fight that demon a little harder. Look, I
asked you to read my previous post with subject "One Process Per Instance",
where I took the time (so money) to explain, as didactically as I can, how
*your* million-object example could be managed in a system like the one I'm
speculating about. So please, please Peter, I ask you not to make me repeat
myself: go read it and make your objections there if you find any. As I
already said, I think the experiences you are sharing on this matter are
precious, so the discussion just gets richer.

Tying an object instance to a particular process makes no sense. If you did
that you'd likely end up with just as many deadlocks and other concurrency
problems, since you'd now have message sends to the object being queued up on
the process's input queue. Since processes could only process one message at
a time, deadlocks can occur - plus all kinds of nasty problems resulting from
the order of messages in the queue. (There is a similar nasty problem with
the GUI event processing in VisualAge Smalltalk that leads to very difficult
to diagnose and comprehend concurrency problems). It's a rat's maze that's
best to avoid.

Besides, in some cases an object with multiple threads could respond to many
messages - literally - at the same time given multiple cores. Why slow down
the system by putting all the messages into a single queue when you don't
have to!?

You didn't understand the model I'm talking about. 

That is likely the case.  

So I kindly ask you to read my previous emails, where I have taken the
trouble to express my exploratory thoughts up to the point of reaching this
model, and my speculation about the consequences of such a model existing.

There isn't such a thing as an object with multiple threads. That does not
exist in this model.

Ok. I got that. 

There exists one process per instance, no more, no less.

I did get that. Even if you only do that logically you've got serious
problems. 

If you have read where I talk about how to manage N million objects with
limited hardware under this model and you still find problems, please be my
guest and tell me here, because I want to know as soon as possible.

I think you're thinking about processes and threads the same way you know
them today. 

I can easily see such a scenario working and also breaking all over the
place. 

Why? 

Let's see if this helps you get the idea. Disambiguation: for this model
I'm talking about a process not as an OS process but as a VM lightweight
process, which we also usually call a thread.

Ok.

So I'm saying that in this model you have only one process per instance but
that process is not a process that can have threads belonging to it. 

ok.

That generates a hell of complexity. 

You lost me there. What complexity?

It does not matter; that is another model, not the one I'm speculating about
(probably one you had imagined before the 1:1 object-process thing was
clarified).

The process that I'm saying is tied to an instance is closer to the word
"process" as you know it from the dictionary, plus what you know an instance
to be, with the process implemented by a VM that can balance it across cores.

I didn't understand. Please restate.  

I restated that N times in my previous emails, which were already too long.
To give you a clue, it's about the double nature I'm saying the object has:
an amalgam of object and process. The two are conceptually indissociable.
More in those previous emails.

I'm not falling into the pitfall of trying to parallelize code
automagically. This is far from it. In fact I think this is better than that
illusion. Every message is guaranteed by the VM to reach its destination in
guaranteed order. Otherwise there would be chaos. And we want an ordered
chaos like the one we have now in a Squeak reified image.

Yes, Squeak is ordered chaos. ;--).

With that clarified, I ask: why do you think there could be deadlocks? And
what other kinds of concurrency problems do you think this model will suffer
from?

If a number of messages are waiting in the input queue of a process that can
only process one message at a time since it's not multi-threaded then those
messages are BLOCKED while in the thread. Now imagine another process with
messages in its queue that are also BLOCKED since they are waiting in the
queue and only one message can be processed at a time. Now imagine that
process A and process B each have messages that the other needs before it
can proceed but those messages are BLOCKED waiting for processing in the
queues. 
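
The scenario above can be reproduced in a few lines. This is only a sketch
under stated assumptions, not anyone's proposed VM design: the names Actor,
ask, start, relay, value and make_pair are all hypothetical, Python threads
stand in for VM-level processes, and every send is assumed to be synchronous
(the sender blocks until it gets an answer). A timeout is used so that the
demonstration terminates instead of hanging.

```python
import queue
import threading

TIMED_OUT = "timed out"


class Actor:
    """One object tied to one single-threaded process with a FIFO input queue."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Handle exactly one message at a time, in arrival order.
        while True:
            selector, timeout, reply = self.inbox.get()
            reply.put(getattr(self, selector)(timeout))

    def ask(self, selector, timeout):
        """Synchronous send: enqueue the message, then block for the answer."""
        reply = queue.Queue()
        self.inbox.put((selector, timeout, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            return TIMED_OUT

    def value(self, _timeout):
        return self.name

    def relay(self, timeout):
        # While handling this message, block on the peer. The inner wait gets
        # half the time so that it gives up before the outer one does.
        return self.peer.ask("value", timeout / 2)

    def start(self, timeout):
        return self.peer.ask("relay", timeout / 2)


def make_pair():
    a, b = Actor("A"), Actor("B")
    a.peer, b.peer = b, a
    return a, b


if __name__ == "__main__":
    a, b = make_pair()
    print(a.ask("value", 0.8))  # answered directly by A
    print(a.ask("start", 0.8))  # A waits on B while B waits on A
```

In the second call A is busy handling "start" and waiting on B, while B's
handler is waiting for A to answer "value", which sits unprocessed in A's
queue. Without the timeouts neither would ever proceed. Whether the model
under discussion would use blocking sends like this is exactly the open
question; with asynchronous sends this particular cycle would not block.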

This is a real example of what can happen with message queues. The order
isn't guaranteed. Simple concurrency solutions often have deadlock
scenarios. This can occur when objects must synchronize events or
information. As soon as you have multiple threads of execution you've got
problems that need solving regardless of the concurrency model in place. 

But that can happen right now if you make bad use of processes in a current
Smalltalk. I don't want to solve deadlocks for anybody using parallelism
badly. I just want a Smalltalk that works like today's but balances CPU load
across cores and scales to an arbitrary number of them. This whole thread is
about that.

Tying an object's lifetime to the lifetime of a process doesn't make sense
since there could be references to the object all over the place. If the
process quits the object should still be alive IF there are still references
to it.

You'd need to pass around more than references to processes. For if a
process has more than one object you'd not get the resolution you'd need.
No, passing object references around is way better.

Yes, of course there will be. In this system a process termination is one of
two things: A) that instance is being reclaimed in a garbage collection or
B) that instance has been written to disk in a kind of hibernation that can
be reified again on demand.  Please refer to my previous post with subject
"One Process Per Instance.." where I talk more about exactly this.

If all there is is a one object per process and one process per object - a 1
to 1 mapping then yes gc would work that way but the 1 to 1 mapping isn't
likely to ever happen given current and future hardware prospects. 

But Peter, don't give up on that so easily! We know techniques for
administering resources, like navigating 10 thousand instances at a time
through a 10-gigabyte image of 10 million objects! Don't shoot hope before it
is born! I give some of the details I've imagined about this in my "One
Process Per Instance" post.
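
As a rough illustration of that kind of resource administration, here is a
minimal sketch that keeps only a bounded set of instances live and hibernates
the rest, reifying them again on demand. Everything in it is an assumption
made for illustration: the Swarm class and its methods are hypothetical, a
least-recently-used policy is just one possible choice, and a dictionary of
pickled bytes stands in for the disk. It is not taken from the "One Process
Per Instance" post.

```python
import pickle
from collections import OrderedDict


class Swarm:
    """Keep at most `capacity` instances live; hibernate the others."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = OrderedDict()  # oid -> object, least recently used first
        self.store = {}            # oid -> pickled bytes (stand-in for disk)

    def add(self, oid, obj):
        self.live[oid] = obj
        self.live.move_to_end(oid)
        self._hibernate_excess()

    def get(self, oid):
        if oid not in self.live:
            # Reify a hibernated instance on demand.
            self.live[oid] = pickle.loads(self.store.pop(oid))
        self.live.move_to_end(oid)
        self._hibernate_excess()
        return self.live[oid]

    def _hibernate_excess(self):
        # Write the least recently used instances out until we fit again.
        while len(self.live) > self.capacity:
            old_oid, old_obj = self.live.popitem(last=False)
            self.store[old_oid] = pickle.dumps(old_obj)


if __name__ == "__main__":
    swarm = Swarm(capacity=2)
    for n in range(1, 6):
        swarm.add(n, {"n": n})
    print(sorted(swarm.live), sorted(swarm.store))
    print(swarm.get(1))
```

This only bounds how many instances are resident at once. It says nothing
about the cost of switching between their processes, which is the other half
of the objection.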

Even if you considered an object as having its own "logical" process you'd
get into the queuing problems hinted at above.

Which I don't see, and I ask for your help in understanding them, if you
still find them after the clarifications made about the model.

See the example above. 

Besides, objects in Smalltalk are really fine grained. The notion that each
object would have its own thread would require so much thread switching
that no current processor could handle that. It would also be a huge waste
of resources.

And what do you think came out of the mouths of the critics of initiatives
like the one the Xerox PARC team had in the 1970s, making a Smalltalk at the
price of CPUs and RAM at that time? That VMs are a smart, efficient use of
resources?

That's not really relevant. If you want to build that please go ahead -
please don't let me stop you, that's the last thing I'd want. I wish you
luck. I get to play with current hardware and hardware that's coming down
the pipe such as the Tile-64 or the newest GPUs when they are available to
the wider market.

We all have to use cheap hardware. Please (re)think about what I said about
administering hardware resources over this model. 

So I copy and paste myself: "I don't give credit to unfounded apriorisms. It
deserves to be proven that it does not work. Anyway, let's just assume that
it may be too much for state-of-the-art hardware in common computers in the
year 2007. What about in 2009? What about in 2012?"

Well, just get out your calculator. There is an overhead to a thread or
process in bytes. Say 512 bytes per thread plus its stack. There is the
number of objects. Say 1 million for a medium to small image. Now multiply
those and you get 1/2 gigabyte. Oh, we forgot the stack space and the memory
for the objects themselves. Add a multiplier for that, say 8, and you get 4
gigabytes. Oh, wait, we forgot that the stack is kind of weird, since each
message send that isn't to self must be an interprocess or interthread
message send, so you've got some weirdness going on, let alone all the thread
context switching for each message send that occurs. Then you've got to add
more for who knows what... the list could go on for quite a bit. It's just
mind boggling.
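
The calculator exercise, written out. The three inputs are the rough guesses
given above (512 bytes per thread, 1 million objects, a multiplier of 8), not
measurements, and the constant names are made up for this sketch. Gigabytes
are taken as 10^9 bytes.

```python
BYTES_PER_THREAD = 512           # guessed per-thread overhead, excluding its stack
OBJECT_COUNT = 1_000_000         # "medium to small image"
STACK_AND_OBJECT_MULTIPLIER = 8  # guessed factor for stacks plus object memory


def gigabytes(n_bytes):
    """Convert bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


thread_overhead = BYTES_PER_THREAD * OBJECT_COUNT
total_estimate = thread_overhead * STACK_AND_OBJECT_MULTIPLIER

print(f"thread overhead: {gigabytes(thread_overhead):.3f} GB")  # 0.512 GB
print(f"with multiplier: {gigabytes(total_estimate):.3f} GB")   # 4.096 GB
```

The estimate is linear in every input, so halving the per-thread overhead or
keeping only a tenth of the objects resident scales the total by the same
factor.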

I just can't believe we really can't find clever ways of administering
resources to the point at which this becomes acceptable.

Simply put current cpu architectures are simply not designed for that
approach. Heck they are even highly incompatible with dynamic message
passing since they favor static code in terms of optimizations. 

Yes, that happens with machines based on mathematical models like the boolean
model. It introduces an impedance mismatch between the conceptual modeling
and the virtual modeling.

Again, one solution does not fit all problems - if it did programming would
be easier.

But programming ought to be easier.

Yes, I concur, whenever it's possible to do so. But it also shouldn't ignore
the hard problems either. 

Smalltalk made it easier in a lot of aspects. 

Sure, I concur. That's why I am working here in this group, spending time
(which is money) on these emails.

Listen.. I'm not a naive silver-bullet buyer nor a person of faith. I'm a
critical Smalltalker who thinks he gets the point of OOP and is trying to
find solutions to overcome the multicore crisis by arriving at an empowered
system, not by settling for a weaker one.

I do get that about you. 

Peter, please try to forget about how systems are made and think about how
you want to make them.

I do think about how I want to make them. However to make them I have no
choice but to consider how to actually build them using existing
technologies and the coming future technologies. 

Currently we have 2-core and 4-core processors as the mainstream with 3-core
and 8-core coming to a computer store near you. We have the current crop of
GPUs from NVidia that have 128 processing units that can be programmed in a
variant of C for some general purpose program tasks using a SIMD (single
instruction multiple data) format - very useful for those number crunching
applications like graphics, cryptology and numeric analysis to name just a
few. We also have the general purpose networked Tile-64 coming - lots of
general purpose compute power with an equal amount of scalable networked IO
power - very impressive. Intel even has a prototype with 80-cores that is
similar. Intel also has its awesomely impressive Itanium processor with
instruction level parallelism as well as multiple cores - just wait till
that's a 128 core beastie. Plus there is hardware that we likely don't
know about or that hasn't been invented yet. Please bring it on!!!

The bigger problem is that in order to build real systems I need to think
about how they are constructed. 

So yes, I want easy parallel computing but it's just a harsh reality that
concurrency, synchronization, distributed processing, and other advanced
topics are not always easy or possible to simplify as much as we want them
to be. That is the nature of computers.

Sorry for being a visionary-realist. Sorry if I've sounded like the critic.
I don't mean to be the critic that kills your dreams - if I've done that I
apologize. I've simply meant to be the realist who informs the visionary
that certain adjustments are needed.

All the best,

Peter

Please take your time to think about what I've stated about administering
resources: that it is possible to manage a load of millions of instances
with a swarm of a few at a time. And don't be sorry about anything. I love
criticism. Our culture needs tons of criticism to get stronger. It's the only
way we can uninstall deprecated or obsolete ideas. You are helping here. If
I'm really dreaming and this doesn't work, I want that dream to be killed now
so I can spend my time on something better. That helps.

So far this model is just getting stronger. Please try to bring it down!!!
:)))

    cheers,

Sebastian 
