Hi!
I would like to discuss an implementation of thread safe collections (and streams, and others).
I can think on three possible approaches, the first one is to convert every class to be thread safe, the other is to implement new thread safe classes. I think that the first approach is interesting because the user wouldn't have to care about any thing to be thread safe. We would just use the class that is guaranteed to be thread safe by the system. But there would be an overhead imposed by the synchronization mechanism needed by a thread safe implementation. The second approach would let you choose between fast thread unsafe collections, and a slower thread safe collections. But you would have to choose it yourself. It would not be automatic. This could be done by subclassing every collection class like this:
Object +Collection +ThreadSafeCollection +Bag |+ThreadSafeBag +Set +ThreadSafeSet
Or doing an entirely new hierarchy like this (with Traits to implement similar behavior):
Object +Collection |+Bag |+Set +ThreadSafeCollection +ThreadSafeBag +ThreadSafeSet
What do you think?
Cheers Guille
Interesting idea.
I tend to think that more tansparent is better, but, if the overhead weren't a problem or finding a pattern to apply (Could be a decorator?).
About your options seems that the second may be more clean, as you comment with Traits.
Well, are only fast thoughts....
2008/4/24 Guillermo Adrián Molina guille@losmolina.com.ar:
Hi!
I would like to discuss an implementation of thread safe collections (and streams, and others).
I can think on three possible approaches, the first one is to convert every class to be thread safe, the other is to implement new thread safe classes. I think that the first approach is interesting because the user wouldn't have to care about any thing to be thread safe. We would just use the class that is guaranteed to be thread safe by the system. But there would be an overhead imposed by the synchronization mechanism needed by a thread safe implementation. The second approach would let you choose between fast thread unsafe collections, and a slower thread safe collections. But you would have to choose it yourself. It would not be automatic. This could be done by subclassing every collection class like this:
Object +Collection +ThreadSafeCollection +Bag |+ThreadSafeBag +Set +ThreadSafeSet
Or doing an entirely new hierarchy like this (with Traits to implement similar behavior):
Object +Collection |+Bag |+Set +ThreadSafeCollection +ThreadSafeBag +ThreadSafeSet
What do you think?
Cheers Guille
What do you mean by thread safe collections? Should only one thread at a time be able to perform #do: on a collection? Obviously we want to make sure that only one #add: or #remove: can be done at a time, but you should probably prevent #add: and #remove: when some other thread is running #do:. So, this will be the readers/writers problem. You should make a ReaderWriterLock class with methods #read: aBlock and #write: aBlock that are similar to Semaphore's #critical: method. Then you can make a subclass of Set called ThreadSafeSet with an instance variable rwLock that is a ReaderWriterLock and methods like
add: anElement wrLock write: [^super add: anElement]
do: aBlock wrLock read: [^super do: aBlock]
It hardly seems worthwhile using traits in a case like this. All the code will be very simple. The hard part will be deciding what the granularity of locks should be, and how inheritance interacts with it. Traits will not make this easier. When it comes to concurrency, inheritance is the problem, not the solution.
To answer your original question, it is a very bad idea to make every collection thread safe. Making them thread safe will make them slow and complicated. In general, multithreading makes things mroe complicated, and if you aren't careful it can make things slower, too.
Keeps threads as isolated from each other as possible. Have them interact though a narrow interface. It is fine to custom-design these classes, or else use a few classes that are designed for communication between threads, such as SharedQueue or Promise. But don't try to make most classes threadsafe, since that contradicts the prime imperative of keeping most classes simple.
-Ralph Johnson
2008/4/24 Ralph Johnson johnson@cs.uiuc.edu:
What do you mean by thread safe collections? Should only one thread at a time be able to perform #do: on a collection? Obviously we want to make sure that only one #add: or #remove: can be done at a time, but you should probably prevent #add: and #remove: when some other thread is running #do:. So, this will be the readers/writers problem. You should make a ReaderWriterLock class with methods #read: aBlock and #write: aBlock that are similar to Semaphore's #critical: method. Then you can make a subclass of Set called ThreadSafeSet with an instance variable rwLock that is a ReaderWriterLock and methods like
add: anElement wrLock write: [^super add: anElement]
do: aBlock wrLock read: [^super do: aBlock]
It hardly seems worthwhile using traits in a case like this. All the code will be very simple. The hard part will be deciding what the granularity of locks should be, and how inheritance interacts with it. Traits will not make this easier. When it comes to concurrency, inheritance is the problem, not the solution.
To answer your original question, it is a very bad idea to make every collection thread safe. Making them thread safe will make them slow and complicated. In general, multithreading makes things mroe complicated, and if you aren't careful it can make things slower, too.
Keeps threads as isolated from each other as possible. Have them interact though a narrow interface. It is fine to custom-design these classes, or else use a few classes that are designed for communication between threads, such as SharedQueue or Promise. But don't try to make most classes threadsafe, since that contradicts the prime imperative of keeping most classes simple.
+1 here. If you need thread-safe collections it shows that you trying to solve problem in a wrong way. You can protect shared resources by locks without exposing a collection interface. You should look at operations with shared resources as operations which lying on higher abstraction level, while direct operations with collections is atomic. Compare the number of operations which you have to wrap with locks in higher level API with number of operations which you will require to wrap for collections.
Whenever you need to deal with shared resource, place a lock on it, but do it outside the collection class:
addItem: item self critical: [ items add: item ]
removeItem: item self critical: [ items remove: item ]
findItem: aString ^ self critical: [ items detect: [ item foo = aString] ifNone: ['NONE']].
itemsDo: aBlock self critical: [ items do: aBlock ]
items ^ self shouldNotImplement "do not expose shared resource"
-Ralph Johnson
squeak-dev@lists.squeakfoundation.org