[squeak-dev] Prepare for Thousands of Cores --- oh my Chip - it's full of cores!

Igor Stasenko siguctua at gmail.com
Sun Jul 6 17:09:21 UTC 2008


2008/7/6 Joshua Gargus <schwa at fastmail.us>:
>
> On Jul 5, 2008, at 6:40 PM, Peter William Lount wrote:
>
>> Todd Blanchard wrote:
>>>
>>> This was pretty much the messages from Apple at WWDC recently as well.
>>> Their next os version has several technologies based around this idea.
>>> The shift is upon us.
>>>
>>
>> Yeah, Apple is talking about two different approaches - program
>> parallelism with multi-cores and data parallelism with GPGPUs from the likes
>> of NVidia and AMD-ATI or possibly P.A.Semi (just a wild guess on P.A.Semi as
>> their chips could be made with many many cores soon).
>>
>> And NO, Smalltalk hasn't caught up yet. Just half a year ago in this very
>> forum thread people were arguing against fully generic multi-threading of
>> Smalltalk virtual machines. Cincom is against it. Instantiations has been
>> quiet and likely won't do much.
>
> And in my opinion, the people who were arguing against it won the argument.
>  Concerns were raised about the cache-thrashing that could result, and
> relevant empirical research was linked to that seemed to validate these
> concerns.
>
>> Only a few brave, intrepid explorers get it, and now we have experiments
>> like HydraVM for Croquet/Squeak.
>
> Perhaps I misunderstood what you meant in the previous part of the
> paragraph.  Hydra is explicitly one-thread-per-image for 1) simplicity of
> implementation, 2) simplicity of use and 3) because many-threads-per-image
> hasn't been shown to be even theoretically desirable.
>
>> Most Smalltalks and Smalltalkers are deeply stuck in the past of one
>> native thread. Most, in fact, are not good at multi-threading with
>> Smalltalk's non-native threads!!! It's difficult to learn and get right,
>> which is one motivator behind those wanting to take the easy road - one
>> native thread per image,
>
> Right, *one* motivator.
>
>> but that's the wrong route (in my view and obviously in others' view as
>> well) because it isn't general purpose enough. It involves hard work. No way
>> around it.
>
> If you want to open up this discussion again, please bring some new facts.
>  Why would cache-thrashing not be an issue when running 64 cores on a single
> image?  I'm willing to be convinced, but I haven't seen even a sketch of a
> design that would avoid this.
>
>>
>>
>> Igor, how will we gain access to writing for chips like NVidia when they
>> keep it all secret?
>
> Keep what secret?  Both AMD and NVIDIA have exposed low-level instruction
> sets for their processors.  AMD's is called CTM, and I can't remember the
> name of NVIDIA's.  These instruction sets are at approximately the level of
> x86 assembly (i.e. low-level, but still portable across different GPU
> models).
>

From:
http://en.wikipedia.org/wiki/CUDA
----
Threads must run in groups of at least 32 threads that execute
identical instructions simultaneously. Branches in the program code do
not impact performance significantly, provided that each of 32 threads
takes the same execution path; the SIMD execution model becomes a
significant limitation for any inherently divergent task (e.g.,
traversing a ray tracing acceleration data structure).
----
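
(To make the quoted limitation concrete, here is a minimal CUDA sketch - the
kernel name and the choice of math functions are made up for illustration.
When the data-dependent condition below differs between threads of the same
32-thread warp, the hardware executes both branches one after the other,
masking off the inactive threads:)

----
#include <cuda_runtime.h>

// Each thread handles one element; the branch condition depends on that
// element, so threads within a warp may disagree and the warp serializes.
__global__ void divergentKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;              // tail-of-array guard

    if (in[i] > 0.0f)                // data-dependent branch
        out[i] = sqrtf(in[i]);       // path A
    else
        out[i] = expf(in[i]);        // path B
}
----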

So even though we can program the GPU, we can't make different threads run
different code :(
Also, there is something utterly wrong with this statement:
since it would be a waste to run 32 threads on the same input data, the inputs
are obviously different. But if the input data differs, how can all 32 threads
be expected to take the same path at every branch?
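
(For element-wise kernels the usual answer is that the data differs while the
control flow does not: the only branch depends on a value that is the same for
every thread in the warp. A minimal sketch, again with made-up names:)

----
#include <cuda_runtime.h>

// Every thread reads different elements, but the single branch depends on
// the kernel parameter `useScale`, which is identical for all threads, so
// the whole warp takes the same path and nothing is serialized.
__global__ void uniformKernel(const float *a, const float *b, float *out,
                              float scale, int useScale, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = a[i] + b[i];           // different data per thread
    if (useScale)                    // same outcome for every thread
        v *= scale;
    out[i] = v;
}
----

Divergence only costs when the decision itself varies per element within a
warp, as in the ray-tracing traversal example quoted above.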

-- 
Best regards,
Igor Stasenko AKA sig.


