[squeak-dev] SharedQueue does scale (was: Re: [Pharo-project] SharedQueue doesn't scale)

Mon Oct 18 04:00:33 UTC 2010

On Sun, 17 Oct 2010, Levente Uzonyi wrote:

snip

> SharedQueue's code for "growing" (#makeRoomAtEnd) is crap IMHO. Because of 
> that it takes O(n) time to add or remove and element to or from the queue.
> SharedQueue2 is a lot better approach, because it doesn't try to
> reimplement a dynamic array, but uses OrderedCollection instead.

I uploaded a new version of that method to the Trunk. I don't think 
it's really useful, because I'm pretty sure we will get rid of both 
SharedQueue and SharedQueue2 in the near future.
Anyway I did some benchmarks using Cog, and SharedQueue became 
surprisingly good, though it's still not even close to Igor's new 
FIFOQueue or AtomicSharedQueue.

Benchmark #1: Make a large queue

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
 	| queue |
 	queue := queueClass new.
 	queueClass -> (
 		(1 to: 5) collect: [ :run |
 			Smalltalk garbageCollect.
 			{
 		               [ 1 to: 1000000 do: [ :each | queue nextPut: each ] ] timeToRun.
 		               [ 1 to: 1000000 do: [ :each | queue next ] ] timeToRun } ]) ].

SharedQueue->#(#(1028 1615) #(945 1557) #(976 2322) #(492 459) #(489 476)).
SharedQueue2->#(#(1976 2284) #(1318 8298) #(982 692) #(1005 675) #(1002 665)).
FIFOQueue->#(#(180 67) #(184 67) #(182 69) #(181 66) #(177 67)).
AtomicSharedQueue->#(#(208 66) #(207 67) #(209 66) #(213 68) #(209 65)).

This benchmark is similar to what Igor used, but it doesn't create a 
new queue between runs. It simply adds 1,000,000 elements then removes 
them which is a pretty unrealistic scenario for a shared queue. The effect 
of GC is pretty high on this benchmark, probably that's responsible for 
the high spikes.

Benchmark #2: Sequential throughput test

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
 	| queue |
 	queue := queueClass new.
 	queueClass -> (
 		(1 to: 5) collect: [ :run |
 			| results |
 			Smalltalk garbageCollect.
 			results := #(0 0).
 			1 to: 1000 do: [ :round |
 				results := results + {
 					[ 1 to: 1000 do: [ :each | queue nextPut: each ] ] timeToRun.
 					[ 1 to: 1000 do: [ :each | queue next ] ] timeToRun } ].
 			results ]) ].

SharedQueue->#(#(464 452) #(472 436) #(466 437) #(463 449) #(462 452)).
SharedQueue2->#(#(949 692) #(980 663) #(984 670) #(992 670) #(958 677)).
FIFOQueue->#(#(125 67) #(263 62) #(250 76) #(262 63) #(247 81)).
AtomicSharedQueue->#(#(154 70) #(264 77) #(273 62) #(275 63) #(265 71)).

This is similar to benchmark #1, but instead of adding and removing 
1,000,000 at once it's chunked up to 1,000 equal parts. It's more 
realistic than benchmark #1. It's interesting that both FIFOQueue and 
AtomicSharedQueue performed better in the previous benchmark, unlike the 
other two queues, which are better here.

Benchmark #3: Concurrent throughput test

{ SharedQueue. SharedQueue2. FIFOQueue. AtomicSharedQueue } collect: [ :queueClass |
 	| queue semaphore |
 	queue := queueClass new.
 	semaphore := Semaphore new.
 	queueClass -> (
 		(1 to: 5) collect: [ :run |
 			| producers consumers |
 			Smalltalk garbageCollect.
 			producers := (1 to: 100) collect: [ :each |
 				[ 1 to: 10000 do: [ :index | queue nextPut: each ] ] newProcess ].
 			consumers := (1 to: 100) collect: [ :each |
 				[
 					1 to: 10000 do: [ :index | queue next ].
 					semaphore signal ] newProcess ].
 			[
 				consumers do: [ :each | each priority: 39; resume ].
 				producers do: [ :each | each priority: 39; resume ].
 				100 timesRepeat: [ semaphore wait ] ] timeToRun ]) ].

SharedQueue->#(3143 2977 3034 2949 3021).
SharedQueue2->#(4280 4384 4179 4160 4181).
FIFOQueue->#(245 311 252 254 255).
AtomicSharedQueue->#(277 274 277 280 274)

This benchmark is the real concurrent stress test. 100 processes are 
adding 10,000 elements to the queue while another 100 are reading from it. 
It clearly shows that Igor's queues are an order of magnitude faster. 
Also 200 concurrent processes cause much less slowdown compared to the 
sequential tests for them.

So, even though SharedQueue is now faster than SharedQueue2, both will 
have to go IMHO. :)

Levente

snip