[Pkg] The Trunk: Kernel-nice.1224.mcz

commits at source.squeak.org commits at source.squeak.org
Fri May 3 20:43:43 UTC 2019


Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1224.mcz

==================== Summary ====================

Name: Kernel-nice.1224
Author: nice
Time: 3 May 2019, 10:43:02.707788 pm
UUID: cf1a5ae4-ac18-47aa-880c-db91874842cb
Ancestors: Kernel-fn.1223

Refactor the fast LargeInteger operations

Rename:
- fastMultiply: -> multiplyByInteger:
Rationale: it's not necessarily fast but it's specifically for the case when we know that argument is an Integer.

Ensure symetry between * and /
- digitDiv:neg: -> divideByInteger:
- primDigitDiv:neg -> digitDiv:neg: (revert previous rename)

Rename
- karatsubaTimes: tooms3Times: -> digitMul22: digitMul33:
Rationale: it ensures symetry with digitDiv21: and digitDiv32:

Rename similarly
- squaredKaratsuba squaredToom3 squaredToom4 -> squaredByHalf squaredByThird squaredByFourth

Introduce digitMul23: for marginal improvment in case of partially well balanced

Remove the send of Inteval>>digitShiftSum:
Replace with #digitMulSplit: which is an optimized version in O(N) rather than O(N*log2 N)- see below

Rename
- recursiveDigitDiv: -> digitDivSplit: since it plays somehow a similar role than that of digitMulSplit: (decompose in digitDiv21: operations)
Use a O(N) recomposition of quotient

Reduce the cost of divide and conquer reconstruction phase by using (with great care) inplaceAddNonOverlapping:digitShiftBy: when possible.

Rationale: the cost is non completely dominated by non linear O(N^x), partly due to repeated #+ and #digitShift: which allocate, copy, allocate, copy, allocate, copy like Shlemiel.

We now have this sequence:

1) * / call primitive of SmallInteger or (optional) 64 bits LargeInteger 
2) if fail, super * will handle coercion or call multiplyByInteger: divideByInteger:
3) multiplyByInteger: and divideByInteger: will dispatch to appropriate method based on heuristics on operands length
3.a) schoolbook digitMultiply:neg: digitDiv:neg: which call (optional) LargeInteger primitives or revert to slow 1-byte-limb fallback code
3.b) digitMultiplySplit: if not well balanced operands / recursiveDigitDiv: for about the same purpose
3.c) digitMul22: digitMul23: digitMul33: digitDiv21: digitDiv32:

Accelerate a bit the nthRoot: by providing a better (divide and conquer) initial guess a bit like sqrtRem algorithm.
While at it, use raisedToInteger: instead of raisedTo:
Since this small refactoring makes two tests fail (bad tests IMO), provide an assertion of pre-condition (the argument is a strictly positive integer)

=============== Diff against Kernel-fn.1223 ===============

Item was changed:
  ----- Method: Integer>>* (in category 'arithmetic') -----
  * aNumber
  	"Refer to the comment in Number * " 
  	aNumber isInteger ifTrue:
+ 		[^ self multiplyByInteger: aNumber].
- 		[^ self fastMultiply: aNumber].
  	^ aNumber adaptToInteger: self andSend: #*!

Item was changed:
  ----- Method: Integer>>/ (in category 'arithmetic') -----
  / aNumber
  	"Refer to the comment in Number / "
  	| quoRem |
  	aNumber isInteger ifTrue:
+ 		[quoRem := self divideByInteger: aNumber.
- 		[quoRem := self digitDiv: aNumber neg: self negative ~~ aNumber negative.
  		(quoRem at: 2) = 0
  			ifTrue: [^ (quoRem at: 1) normalize]
  			ifFalse: [^ (Fraction numerator: self denominator: aNumber) reduced]].
  	^ aNumber adaptToInteger: self andSend: #/!

Item was changed:
  ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: arg neg: ng 
+ 	"Answer with an array of (quotient, remainder)."
+ 	| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+ 	<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+ 	arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+ 	"TFEI added this line"
+ 	l := self digitLength - arg digitLength + 1.
+ 	l <= 0 ifTrue: [^ Array with: 0 with: self].
+ 	"shortcut against #highBit"
+ 	d := 8 - arg lastDigit highBitOfByte.
+ 	div := arg digitLshift: d.
+ 	divDigitLength := div digitLength + 1.
+ 	div := div growto: divDigitLength.
+ 	"shifts so high order word is >=128"
+ 	rem := self digitLshift: d.
+ 	rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+ 	remDigitLength := rem digitLength.
+ 	"makes a copy and shifts"
+ 	quo := Integer new: l neg: ng.
+ 	dl := divDigitLength - 1.
+ 	"Last actual byte of data"
+ 	ql := l.
+ 	dh := div digitAt: dl.
+ 	dnh := dl = 1
+ 				ifTrue: [0]
+ 				ifFalse: [div digitAt: dl - 1].
+ 	1 to: ql do: 
+ 		[:k | 
+ 		"maintain quo*arg+rem=self"
+ 		"Estimate rem/div by dividing the leading to bytes of rem by dh."
+ 		"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+ 		j := remDigitLength + 1 - k.
+ 		"r1 := rem digitAt: j."
+ 		(rem digitAt: j)
+ 			= dh
+ 			ifTrue: [qhi := qlo := 15
+ 				"i.e. q=255"]
+ 			ifFalse: 
+ 				["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
+ 				Note that r1,r2 are bytes, not nibbles.  
+ 				Be careful not to generate intermediate results exceeding 13  
+ 				bits."
+ 				"r2 := (rem digitAt: j - 1)."
+ 				t := ((rem digitAt: j)
+ 							bitShift: 4)
+ 							+ ((rem digitAt: j - 1)
+ 									bitShift: -4).
+ 				qhi := t // dh.
+ 				t := (t \\ dh bitShift: 4)
+ 							+ ((rem digitAt: j - 1)
+ 									bitAnd: 15).
+ 				qlo := t // dh.
+ 				t := t \\ dh.
+ 				"Next compute (hi,lo) := q*dnh"
+ 				hi := qhi * dnh.
+ 				lo := qlo * dnh + ((hi bitAnd: 15)
+ 								bitShift: 4).
+ 				hi := (hi bitShift: -4)
+ 							+ (lo bitShift: -8).
+ 				lo := lo bitAnd: 255.
+ 				"Correct overestimate of q.  
+ 				Max of 2 iterations through loop -- see Knuth vol. 2"
+ 				r3 := j < 3
+ 							ifTrue: [0]
+ 							ifFalse: [rem digitAt: j - 2].
+ 				[(t < hi
+ 					or: [t = hi and: [r3 < lo]])
+ 					and: 
+ 						["i.e. (t,r3) < (hi,lo)"
+ 						qlo := qlo - 1.
+ 						lo := lo - dnh.
+ 						lo < 0
+ 							ifTrue: 
+ 								[hi := hi - 1.
+ 								lo := lo + 256].
+ 						hi >= dh]]
+ 					whileTrue: [hi := hi - dh].
+ 				qlo < 0
+ 					ifTrue: 
+ 						[qhi := qhi - 1.
+ 						qlo := qlo + 16]].
+ 		"Subtract q*div from rem"
+ 		l := j - dl.
+ 		a := 0.
+ 		1 to: divDigitLength do: 
+ 			[:i | 
+ 			hi := (div digitAt: i)
+ 						* qhi.
+ 			lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+ 							bitShift: 4) - ((div digitAt: i)
+ 							* qlo).
+ 			rem digitAt: l put: lo - (lo // 256 * 256).
+ 			"sign-tolerant form of (lo bitAnd: 255)"
+ 			a := lo // 256 - (hi bitShift: -4).
+ 			l := l + 1].
+ 		a < 0
+ 			ifTrue: 
+ 				["Add div back into rem, decrease q by 1"
+ 				qlo := qlo - 1.
+ 				l := j - dl.
+ 				a := 0.
+ 				1 to: divDigitLength do: 
+ 					[:i | 
+ 					a := (a bitShift: -8)
+ 								+ (rem digitAt: l) + (div digitAt: i).
+ 					rem digitAt: l put: (a bitAnd: 255).
+ 					l := l + 1]].
+ 		quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ 				+ qlo].
+ 	rem := rem
+ 				digitRshift: d
+ 				bytes: 0
+ 				lookfirst: dl.
+ 	^ Array with: quo with: rem!
- digitDiv: anInteger neg: aBoolean 
- 	^self primDigitDiv: anInteger neg: aBoolean!

Item was added:
+ ----- Method: Integer>>divideByInteger: (in category 'private') -----
+ divideByInteger: anInteger
+ 	"Answer an Array with quotient and remainder of division by anInteger.
+ 	the quotient is always truncated toward zero, and remainder has same sign as receiver.
+ 	Beware, the result might be un-normailzed (LargeInteger with leading zero digits)"
+ 	^self digitDiv: anInteger 
+ 			neg: self negative ~~ anInteger negative!

Item was removed:
- ----- Method: Integer>>fastMultiply: (in category 'private') -----
- fastMultiply: anInteger
- 	^self digitMultiply: anInteger 
- 			neg: self negative ~~ anInteger negative!

Item was added:
+ ----- Method: Integer>>initialGuessForNthRoot: (in category 'private') -----
+ initialGuessForNthRoot: aPositiveInteger
+ 	"Use a simplistic scheme based solely on most significant bit"
+ 	^1 bitShift: self highBitOfMagnitude + aPositiveInteger - 1 // aPositiveInteger!

Item was removed:
- ----- Method: Integer>>karatsubaTimes: (in category 'private') -----
- karatsubaTimes: anInteger
- 	"Eventually use Karatsuba algorithm to perform the multiplication.
- 	This is efficient only for large integers (see subclass).
- 	By default, use regular multiplication."
- 	
- 	^ self digitMultiply: anInteger 
- 			neg: self negative ~~ anInteger negative!

Item was added:
+ ----- Method: Integer>>multiplyByInteger: (in category 'private') -----
+ multiplyByInteger: anInteger
+ 	^self digitMultiply: anInteger 
+ 			neg: self negative ~~ anInteger negative!

Item was changed:
  ----- Method: Integer>>nthRootTruncated: (in category 'mathematical functions') -----
+ nthRootTruncated: n
- nthRootTruncated: aPositiveInteger
  	"Answer the integer part of the nth root of the receiver."
  	| guess guessToTheNthMinusOne nextGuess |
  	self = 0 ifTrue: [^0].
+ 	(n isInteger and: [n > 0]) ifFalse: [ ArithmeticError signal: 'nthRootTruncated: operand must be a strictly positive Integer' ].
  	self negative
  		ifTrue:
+ 			[n even ifTrue: [ ArithmeticError signal: 'Negative numbers don''t have even roots.' ].
+ 			^(self negated nthRootTruncated: n) negated].
+ 	guess := self initialGuessForNthRoot: n.
- 			[aPositiveInteger even ifTrue: [ ArithmeticError signal: 'Negative numbers don''t have even roots.' ].
- 			^(self negated nthRootTruncated: aPositiveInteger) negated].
- 	guess := 1 bitShift: self highBitOfMagnitude + aPositiveInteger - 1 // aPositiveInteger.
  	[
+ 		guessToTheNthMinusOne := guess raisedToInteger: n - 1.
+ 		nextGuess := (n - 1 * guess * guessToTheNthMinusOne + self) // (guessToTheNthMinusOne * n).
- 		guessToTheNthMinusOne := guess raisedTo: aPositiveInteger - 1.
- 		nextGuess := (aPositiveInteger - 1 * guess * guessToTheNthMinusOne + self) // (guessToTheNthMinusOne * aPositiveInteger).
  		nextGuess >= guess ] whileFalse:
  			[ guess := nextGuess ].
+ 	( guess raisedToInteger: n) > self  ifTrue:
- 	( guess raisedTo: aPositiveInteger) > self  ifTrue:
  			[ guess := guess - 1 ].
  	^guess!

Item was removed:
- ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
- primDigitDiv: arg neg: ng 
- 	"Answer with an array of (quotient, remainder)."
- 	| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
- 	<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
- 	arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
- 	"TFEI added this line"
- 	l := self digitLength - arg digitLength + 1.
- 	l <= 0 ifTrue: [^ Array with: 0 with: self].
- 	"shortcut against #highBit"
- 	d := 8 - arg lastDigit highBitOfByte.
- 	div := arg digitLshift: d.
- 	divDigitLength := div digitLength + 1.
- 	div := div growto: divDigitLength.
- 	"shifts so high order word is >=128"
- 	rem := self digitLshift: d.
- 	rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
- 	remDigitLength := rem digitLength.
- 	"makes a copy and shifts"
- 	quo := Integer new: l neg: ng.
- 	dl := divDigitLength - 1.
- 	"Last actual byte of data"
- 	ql := l.
- 	dh := div digitAt: dl.
- 	dnh := dl = 1
- 				ifTrue: [0]
- 				ifFalse: [div digitAt: dl - 1].
- 	1 to: ql do: 
- 		[:k | 
- 		"maintain quo*arg+rem=self"
- 		"Estimate rem/div by dividing the leading to bytes of rem by dh."
- 		"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
- 		j := remDigitLength + 1 - k.
- 		"r1 := rem digitAt: j."
- 		(rem digitAt: j)
- 			= dh
- 			ifTrue: [qhi := qlo := 15
- 				"i.e. q=255"]
- 			ifFalse: 
- 				["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
- 				Note that r1,r2 are bytes, not nibbles.  
- 				Be careful not to generate intermediate results exceeding 13  
- 				bits."
- 				"r2 := (rem digitAt: j - 1)."
- 				t := ((rem digitAt: j)
- 							bitShift: 4)
- 							+ ((rem digitAt: j - 1)
- 									bitShift: -4).
- 				qhi := t // dh.
- 				t := (t \\ dh bitShift: 4)
- 							+ ((rem digitAt: j - 1)
- 									bitAnd: 15).
- 				qlo := t // dh.
- 				t := t \\ dh.
- 				"Next compute (hi,lo) := q*dnh"
- 				hi := qhi * dnh.
- 				lo := qlo * dnh + ((hi bitAnd: 15)
- 								bitShift: 4).
- 				hi := (hi bitShift: -4)
- 							+ (lo bitShift: -8).
- 				lo := lo bitAnd: 255.
- 				"Correct overestimate of q.  
- 				Max of 2 iterations through loop -- see Knuth vol. 2"
- 				r3 := j < 3
- 							ifTrue: [0]
- 							ifFalse: [rem digitAt: j - 2].
- 				[(t < hi
- 					or: [t = hi and: [r3 < lo]])
- 					and: 
- 						["i.e. (t,r3) < (hi,lo)"
- 						qlo := qlo - 1.
- 						lo := lo - dnh.
- 						lo < 0
- 							ifTrue: 
- 								[hi := hi - 1.
- 								lo := lo + 256].
- 						hi >= dh]]
- 					whileTrue: [hi := hi - dh].
- 				qlo < 0
- 					ifTrue: 
- 						[qhi := qhi - 1.
- 						qlo := qlo + 16]].
- 		"Subtract q*div from rem"
- 		l := j - dl.
- 		a := 0.
- 		1 to: divDigitLength do: 
- 			[:i | 
- 			hi := (div digitAt: i)
- 						* qhi.
- 			lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
- 							bitShift: 4) - ((div digitAt: i)
- 							* qlo).
- 			rem digitAt: l put: lo - (lo // 256 * 256).
- 			"sign-tolerant form of (lo bitAnd: 255)"
- 			a := lo // 256 - (hi bitShift: -4).
- 			l := l + 1].
- 		a < 0
- 			ifTrue: 
- 				["Add div back into rem, decrease q by 1"
- 				qlo := qlo - 1.
- 				l := j - dl.
- 				a := 0.
- 				1 to: divDigitLength do: 
- 					[:i | 
- 					a := (a bitShift: -8)
- 								+ (rem digitAt: l) + (div digitAt: i).
- 					rem digitAt: l put: (a bitAnd: 255).
- 					l := l + 1]].
- 		quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
- 				+ qlo].
- 	rem := rem
- 				digitRshift: d
- 				bytes: 0
- 				lookfirst: dl.
- 	^ Array with: quo with: rem!

Item was changed:
  ----- Method: Integer>>quo: (in category 'arithmetic') -----
  quo: aNumber 
  	"Refer to the comment in Number quo: "
+ 	| quo |
- 	| ng quo |
  	aNumber isInteger ifTrue: 
+ 		[quo := (self divideByInteger: aNumber) at: 1.
- 		[ng := self negative == aNumber negative == false.
- 		quo := (self digitDiv: aNumber neg: ng) at: 1.
  		^ quo normalize].
  	^ aNumber adaptToInteger: self andSend: #quo:!

Item was added:
+ ----- Method: Integer>>raisedToInteger: (in category 'mathematical functions') -----
+ raisedToInteger: n
+ 	(n < 3 or: [self highBitOfMagnitude * n < 2048]) ifTrue: [^super raisedToInteger: n].
+ 	^n ternaryBinaryExponentationOf: self!

Item was changed:
+ ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'private') -----
- ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'mathematical functions') -----
  sqrtFloorNewtonRaphson
  	"Return the integer part of the square root of self.
  	Use a Newton-Raphson iteration since it converges quadratically
  	(twice more bits of precision at each iteration)"
  
  	| guess delta |
  	guess := 1 bitShift: self highBit + 1 // 2.
  	[
  		delta := guess squared - self // (guess bitShift: 1).
  		delta = 0 ] whileFalse: [
  			guess := guess - delta ].
  	^guess - 1!

Item was removed:
- ----- Method: Integer>>toom3Times: (in category 'private') -----
- toom3Times: anInteger
- 	^ self digitMultiply: anInteger 
- 			neg: self negative ~~ anInteger negative!

Item was changed:
  ----- Method: LargePositiveInteger>>\\ (in category 'arithmetic') -----
  \\ aNumber 
  	"Primitive. Take the receiver modulo the argument. The result is the
  	 remainder rounded towards negative infinity, of the receiver divided
  	 by the argument. Fail if the argument is 0. Fail if either the argument
  	 or the result is not a SmallInteger or a LargePositiveInteger in the 64 bit range.
  	 Optional. See Object documentation whatIsAPrimitive."
  
  	<primitive: 31>
  	aNumber isInteger
  		ifTrue:
+ 			[| qr q r |
+ 			qr := self divideByInteger: aNumber.
- 			[| neg qr q r |
- 			neg := self negative == aNumber negative == false.
- 			qr := self digitDiv: aNumber neg: neg.
  			q := qr first normalize.
  			r := qr last normalize.
  			^(q negative
  				ifTrue: [r isZero not]
+ 				ifFalse: [q isZero and: [self negative ~~ aNumber negative]])
- 				ifFalse: [q isZero and: [neg]])
  					ifTrue: [r + aNumber]
  					ifFalse: [r]].
  	^super \\ aNumber
  	!

Item was changed:
  ----- Method: LargePositiveInteger>>butLowestNDigits: (in category 'private') -----
+ butLowestNDigits: n
- butLowestNDigits: N
  	"make a new integer removing N least significant digits of self."
  	
+ 	^self bitShiftMagnitude: -8 * n!
- 	^self bitShiftMagnitude: -8 * N!

Item was changed:
  ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
  digitDiv21: anInteger
  	"This is part of the recursive division algorithm from Burnikel - Ziegler
  	Divide a two limbs receiver by 1 limb dividend
  	Each limb is decomposed in two halves of p bytes (8*p bits)
  	so as to continue the recursion"
  	
  	| p qr1 qr2 |
+ 	p := (anInteger digitLength + 1 bitShift: -1) bitClear: 2r11.
+ 	p <= 256 ifTrue: [^(self digitDiv: anInteger neg: false) collect: #normalize].
- 	p := anInteger digitLength + 1 bitShift: -1.
- 	p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
  	qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
  	qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
  	qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
  	^qr2!

Item was changed:
  ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
  digitDiv32: anInteger
  	"This is part of the recursive division algorithm from Burnikel - Ziegler
  	Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
  	Each limb is made of p bytes (8*p bits).
  	This step transforms the division problem into multiplication
+ 	It must use a fast multiplyByInteger: to be worth the overhead costs."
- 	It must use the fastMultiply: to be worth the overhead costs."
  	
  	| a2 b1 d p q qr r |
+ 	p :=(anInteger digitLength + 1 bitShift: -1) bitClear: 2r11.
- 	p := anInteger digitLength + 1 bitShift: -1.
  	(a2 := self butLowestNDigits: 2*p) 
  		< (b1 := anInteger butLowestNDigits: p)
  		ifTrue:
  			[qr := (self butLowestNDigits: p) digitDiv21: b1.
  			q := qr first.
  			r := qr last]
  		ifFalse:
  			[q := (1 bitShift: 8*p) - 1.
  			r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
  	d := q * (anInteger lowestNDigits: p).
  	r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
  	[r < 0]
  		whileTrue:
  			[q := q - 1.
  			r := r + anInteger].
  	^Array with: q with: r
  	!

Item was removed:
- ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
- digitDiv: anInteger neg: aBoolean 
- 	"If length is worth, engage a recursive divide and conquer strategy"
- 	| qr |
- 	(anInteger digitLength <= 256
- 			or: [self digitLength <= anInteger digitLength])
- 		ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
- 	qr := self abs recursiveDigitDiv: anInteger abs.
- 	self negative ifTrue: [qr at: 2 put: qr second negated].
- 	aBoolean ifTrue: [qr at: 1 put: qr first negated].
- 	^qr!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDivSplit: (in category 'private') -----
+ digitDivSplit: anInteger
+ 	"This is the recursive division algorithm from Burnikel - Ziegler
+ 	See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+ 	Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+ 	https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+ 	
+ 	| s m t a b z qr qDigits q i high low |
+ 	"round digits up to next power of 2"
+ 	s := anInteger digitLength.
+ 	m := 1 bitShift: (s - 1) highBit.
+ 	"shift so that leading bit of leading byte be 1, and digitLength power of two"
+ 	s := m * 8 - anInteger highBit.
+ 	a := self bitShift: s.
+ 	b := anInteger bitShift: s.
+ 	
+ 	"Decompose a into t limbs - each limb have m bytes
+ 	choose t such that leading bit of leading limb of a be 0"
+ 	t := (a highBit + 1 / (m * 8)) ceiling.
+ 	z := a butLowestNDigits: t - 2 * m.
+ 	i := t - 2.
+ 	qDigits := Array new: t - 1.
+ 	"and do a division of two limb by 1 limb b for each pair of limb of a"
+ 	[qr := z digitDiv21: b.
+ 	qDigits at: t - 1 - i put: qr first.
+ 	(i := i - 1) >= 0] whileTrue:
+ 		[z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+ 	"recompose the quotient from digits"
+ 	t < 3
+ 		ifTrue:
+ 			[q := 0.
+ 			qDigits do: [:qi | q := (q bitShift: 8 * m) + qi]]
+ 		ifFalse:
+ 			["Use a O(n) reconstruction - note the interlace to avoid overlapping digits"
+ 			high  := LargePositiveInteger new: t - 2 * m + qDigits first digitLength.
+ 			low := LargePositiveInteger new: t - 2 * m.
+ 			1 to: t - 1 by: 2 do: [:k | high inplaceAddNonOverlapping: (qDigits at: k) digitShiftBy: t - 1 - k * m].
+ 			2 to: t - 1 by: 2 do: [:k | low inplaceAddNonOverlapping: (qDigits at: k) digitShiftBy: t - 1 - k * m].
+ 			q := high + low].
+ 	^Array with: q with: (qr last bitShift: s negated)!

Item was added:
+ ----- Method: LargePositiveInteger>>digitMul22: (in category 'private') -----
+ digitMul22: anInteger
+ 	"Multiply after decomposing each operand in two parts, using Karatsuba algorithm.
+ 	Karatsuba perform only 3 multiplications, leading to a cost O(n^3 log2)
+ 	asymptotically better than super O(n^2) for large number of digits n.
+ 	See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
+ 	
+ 	| half xLow xHigh yLow yHigh low mid high |
+ 	"Divide each integer in two halves"
+ 	half := (anInteger digitLength + 1 bitShift: -1)  bitClear: 2r11.
+ 	xLow := self lowestNDigits: half.
+ 	xHigh := self butLowestNDigits: half.
+ 	yLow := anInteger lowestNDigits: half.
+ 	yHigh := anInteger butLowestNDigits: half.
+ 	
+ 	"Karatsuba trick: perform with 3 multiplications instead of 4"
+ 	low := xLow multiplyByInteger: yLow.
+ 	high := xHigh multiplyByInteger: yHigh.
+ 	mid := high + low + (xHigh - xLow multiplyByInteger: yLow - yHigh).
+ 	
+ 	"Sum the parts of decomposition"
+ 	^(high isZero
+ 		ifTrue: [low]
+ 		ifFalse: [(high bitShiftMagnitude: 16*half)
+ 			inplaceAddNonOverlapping: low digitShiftBy: 0])
+ 		+ (mid bitShiftMagnitude: 8*half)!

Item was added:
+ ----- Method: LargePositiveInteger>>digitMul23: (in category 'private') -----
+ digitMul23: anInteger
+ 	"Multiply after decomposing the receiver in 2 parts, and multiplicand in 3 parts.
+ 	Only invoke when anInteger digitLength between: 3/2 and 5/2 self digitLength.
+ 	This is a variant of Toom-Cook algorithm (see digitMul33:)"
+    
+ 	| half x1 x0 y2 y1 y0 y20 z3 z2 z1 z0 |
+ 	"divide self in 2 and operand in 3 parts"
+ 	half := ( self digitLength + 1 bitShift: -1) bitClear: 2r11.
+ 	x1 := self butLowestNDigits: half.
+ 	x0 := self lowestNDigits: half.
+ 	y2 := anInteger butLowestNDigits: half * 2.
+ 	y1 := anInteger copyDigitsFrom: half + 1 to: half * 2.
+ 	y0 := anInteger lowestNDigits: half.
+    
+ 	"Toom trick: 4 multiplications instead of 6"
+ 	y20 := y2 + y0.
+ 	z3 := x1 multiplyByInteger: y2.
+ 	z2 := x0 - x1 multiplyByInteger: y20 - y1.
+ 	z1 := x0 + x1 multiplyByInteger: y20 + y1.
+ 	z0 := x0 multiplyByInteger: y0.
+    
+ 	"Sum the parts of decomposition"
+ 	^z0 + ((z1 - z2 bitShift: -1) - z3 bitShift: 8*half)
+ 		+ (((z1 + z2 bitShift: -1) - z0) + (z3 bitShift: 8*half) bitShift: 16 * half)!

Item was added:
+ ----- Method: LargePositiveInteger>>digitMul33: (in category 'private') -----
+ digitMul33: anInteger
+ 	"Multiply after decomposing each operand in 3 parts, using a Toom-Cooke algorithm.
+ 	Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
+ 	See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
+ 	Use a Bodrato-Zanoni variant for the choice of interpolation points and matrix inversion
+ 	See What about Toom-Cook matrices optimality? - Marco Bodrato, Alberto Zanoni - Oct. 2006
+ 	http://www.bodrato.it/papers/WhatAboutToomCookMatricesOptimality.pdf"
+ 	
+ 	| third x2 x1 x0 y2 y1 y0 y20 z4 z3 z2 z1 z0 x20 |
+ 	"divide both operands in 3 parts"
+ 	third := anInteger digitLength + 2 // 3 bitClear: 2r11.
+ 	x2 := self butLowestNDigits: third * 2.
+ 	x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ 	x0 := self lowestNDigits: third.
+ 	y2 := anInteger butLowestNDigits: third * 2.
+ 	y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
+ 	y0 := anInteger lowestNDigits: third.
+ 	
+ 	"Toom-3 trick: 5 multiplications instead of 9"
+ 	z0 := x0 multiplyByInteger: y0.
+ 	z4 := x2 multiplyByInteger: y2.
+ 	x20 := x2 + x0.
+ 	y20 := y2 + y0.
+ 	z1 := x20 + x1 multiplyByInteger: y20 + y1.
+ 	x20 := x20 - x1.
+ 	y20 := y20 - y1.
+ 	z2 := x20 multiplyByInteger: y20.
+ 	z3 := (x20 + x2 bitShift: 1) - x0 multiplyByInteger: (y20 + y2 bitShift: 1) - y0.
+ 	
+ 	"Sum the parts of decomposition"
+ 	z3 := z3 - z1 quo: 3.
+ 	z1 := z1 - z2 bitShift: -1.
+ 	z2 := z2 - z0.
+ 	
+ 	z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ 	z2 := z2 + z1 - z4.
+ 	z1 := z1 - z3.
+ 	^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)!

Item was added:
+ ----- Method: LargePositiveInteger>>digitMulSplit: (in category 'private') -----
+ digitMulSplit: anInteger
+ 	"multiply digits when self and anInteger have not well balanced digitlength.
+ 	in this case, it is better to split the largest (anInteger) in several parts and recompose"
+ 
+ 	| xLen yLen split q r high mid low sizes |
+ 	yLen := anInteger digitLength.
+ 	xLen := self digitLength.
+ 	split := (xLen * 3 + 2 bitShift: -1) bitClear: 2r11.
+ 	
+ 	"Arrange to sum non overlapping parts"
+ 	q := yLen // split.
+ 	q < 3 ifTrue: [^(0 to: yLen - 1 by: split) detectSum: [:yShift | (self multiplyByInteger: (anInteger copyDigitsFrom: yShift + 1 to: yShift + split)) bitShift: 8 * yShift]].
+ 	r := yLen \\ split.
+ 	"allocate enough bytes, but not too much, in order to minimise normalize cost;
+ 	we could allocate xLen + yLen for each one as well"
+ 	sizes := {q-1*split. q*split. q*split+r}.
+ 	low  := Integer new: (sizes atWrap: 0 - (q\\3)) + xLen neg: self negative ~~ anInteger negative.
+ 	mid := Integer new:  (sizes atWrap: 1 - (q\\3)) + xLen neg: self negative ~~ anInteger negative.
+ 	high := Integer new: (sizes atWrap: 2 - (q\\3)) + xLen neg: self negative ~~ anInteger negative.
+ 	0 to: yLen - 1 by: 3 * split do: [:yShift |
+ 		low
+ 			inplaceAddNonOverlapping: (self multiplyByInteger: (anInteger copyDigitsFrom: yShift + 1 to: yShift + split))
+ 			digitShiftBy: yShift].
+ 	split to: yLen - 1 by: 3 * split do: [:yShift |
+ 		mid
+ 			inplaceAddNonOverlapping: (self multiplyByInteger: (anInteger copyDigitsFrom: yShift + 1 to: yShift + split))
+ 			digitShiftBy: yShift].
+ 	split * 2 to: yLen - 1 by: 3 * split do: [:yShift |
+ 		high
+ 			inplaceAddNonOverlapping: (self multiplyByInteger: (anInteger copyDigitsFrom: yShift + 1 to: yShift + split))
+ 			digitShiftBy: yShift].
+ 	^high normalize + mid normalize + low normalize!

Item was added:
+ ----- Method: LargePositiveInteger>>divideByInteger: (in category 'private') -----
+ divideByInteger: anInteger
+ 	"If length is worth, engage a recursive divide and conquer strategy,
+ 	more efficient than super"
+ 	| qr |
+ 	(anInteger digitLength <= 256
+ 			or: [self digitLength <= anInteger digitLength])
+ 		ifTrue: [^ self digitDiv: anInteger neg: self negative ~~ anInteger negative].
+ 	qr := self abs digitDivSplit: anInteger abs.
+ 	self negative ifTrue: [qr at: 2 put: qr second negated].
+ 	self negative = anInteger negative ifFalse: [qr at: 1 put: qr first negated].
+ 	^qr!

Item was removed:
- ----- Method: LargePositiveInteger>>fastMultiply: (in category 'private') -----
- fastMultiply: anInteger
- 	"Return the result of multiplying the receiver by the Integer argument.
- 	This method dispatch to the fastest algorithm based on operands length."
- 	
- 	| length |
- 	"Revert to schoolbook multiplication if short"
- 	(length := self digitLength) < 600
- 		ifTrue: [^ self digitMultiply: anInteger 
- 					neg: self negative ~~ anInteger negative].
- 	
- 	"Arrange to have the receiver be the shorter and retry"
- 	length > anInteger digitLength
- 		ifTrue: [^ anInteger fastMultiply: self ].
- 
- 	"Choose the fastest algorithm based on another heuristic"
- 	length < 6000 ifTrue: [^self karatsubaTimes: anInteger].
- 	^self toom3Times: anInteger!

Item was added:
+ ----- Method: LargePositiveInteger>>initialGuessForNthRoot: (in category 'private') -----
+ initialGuessForNthRoot: n
+ 	"Try a divide and conquer approach for the initial guess of nth-root of receiver"
+ 	
+ 	| p high mid q s sn1 |
+ 	"Split self in high*b^n+mid*b+low"
+ 	p := ((self highBit bitShift: -1) // n) bitShift: -3.
+ 	p >= 4 ifFalse: [^super initialGuessForNthRoot: n].
+ 	high := self butLowestNDigits: p * n.
+ 	mid := self copyDigitsFrom: p + 1 to: p * n.
+ 	
+ 	s := high nthRootTruncated: n.
+ 	sn1 := s raisedToInteger: n - 1.
+ 	q := (sn1 * s bitShift: n - 1 * 8 * p) + mid quo: (sn1 * n bitShift: n - 2 * 8 * p).
+ 	^(s bitShift: 8 * p) + q
+ 	!

Item was added:
+ ----- Method: LargePositiveInteger>>inplaceAddNonOverlapping:digitShiftBy: (in category 'private') -----
+ inplaceAddNonOverlapping: anInteger digitShiftBy: p
+ 	"Beware: this will destructively replace some digits with that of anInteger.
+ 	- if self and anInteger are of same sign
+ 		(self negative = anInteger negative).
+ 	- if self and anInteger have no overlapping digits
+ 		(self anyBitOfMagnitudeFrom: 8 * p + 1 to: anInteger digitLength + p * 8) not.
+ 	- if anInteger does not overflow self digitLength
+ 		anInteger digitLength + p <= self digitLength.
+ 	then, and only then, this method can be used instead of more costly
+ 		self + (anInteger bitShift: 8 * p).
+ 	This operation is useful in divide and conquer reconstruction phase"
+ 	
+ 	self replaceFrom: p + 1 to: p + anInteger digitLength with: anInteger startingAt: 1!

Item was removed:
- ----- Method: LargePositiveInteger>>karatsubaTimes: (in category 'private') -----
- karatsubaTimes: anInteger
- 	"Eventually use Karatsuba algorithm to perform the multiplication
- 	The Karatsuba algorithm divide number in two smaller parts.
- 	Then it uses tricks to perform only 3 multiplications.
- 	See https://en.wikipedia.org/wiki/Karatsuba_algorithm"
- 	
- 	| half xHigh xLow yHigh yLow low high mid xLen yLen |
- 	"Revert to schoolbook multiplication if short"
- 	(xLen := self digitLength) < 600
- 		ifTrue: [^ self digitMultiply: anInteger 
- 					neg: self negative ~~ anInteger negative].
- 	
- 	"Arrange to have the receiver be the shorter and retry"
- 	xLen > (yLen := anInteger digitLength)
- 		ifTrue: [^ anInteger karatsubaTimes: self ].
-    
- 	"Seek for integers of about the same length, else split the long one"
- 	yLen >= (7 * xLen bitShift: -2)
- 		ifTrue:
- 			[^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
- 				self karatsubaTimes:
- 					(anInteger copyDigitsFrom: yShift + 1 to: yShift + 1 + xLen)]].
- 	
- 	"At this point, lengths are similar, divide each integer in two halves"
- 	half := (yLen + 1 bitShift: -1)  bitClear: 2r11.
- 	xLow := self lowestNDigits: half.
- 	xHigh := self butLowestNDigits: half.
- 	yLow := anInteger lowestNDigits: half.
- 	yHigh := anInteger butLowestNDigits: half.
- 	
- 	"Karatsuba trick: perform with 3 multiplications instead of 4"
- 	low := xLow karatsubaTimes: yLow.
- 	high := xHigh karatsubaTimes: yHigh.
- 	mid := high + low + (xHigh - xLow karatsubaTimes: yLow - yHigh).
- 	
- 	"Sum the parts of decomposition"
- 	^low + (mid bitShiftMagnitude: 8*half) + (high bitShiftMagnitude: 16*half)
- 	
- "
- | a b |
- a := 650 factorial-1.
- b := 700 factorial-1.
- {a digitLength. b digitLength}.
- self assert: (a karatsubaTimes: b) - (a * b) = 0.
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a karatsubaTimes: b]] timeToRun] value /
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a * b]] timeToRun] value asFloat
- "!

Item was changed:
  ----- Method: LargePositiveInteger>>lowestNDigits: (in category 'private') -----
+ lowestNDigits: n
+ 	"Make a new integer keeping only n least significant digits of self"
- lowestNDigits: N
- 	"Make a new integer keeping only N least significant digits of self"
  	
  	| low |
+ 	n >= self digitLength ifTrue: [^self].
+ 	low := self class new: n.
+ 	low replaceFrom: 1 to: n with: self startingAt: 1.
- 	N >= self digitLength ifTrue: [^self].
- 	low := self class new: N.
- 	low replaceFrom: 1 to: N with: self startingAt: 1.
  	^low normalize!

Item was added:
+ ----- Method: LargePositiveInteger>>multiplyByInteger: (in category 'private') -----
+ multiplyByInteger: anInteger
+ 	"Return the result of multiplying the receiver by the Integer argument.
+ 	This method dispatch to the fastest algorithm based on operands length."
+ 	
+ 	| xLen yLen |
+ 	"Revert to schoolbook multiplication if short"
+ 	(xLen := self digitLength) < 600
+ 		ifTrue: [^ self digitMultiply: anInteger 
+ 					neg: self negative ~~ anInteger negative].
+ 	
+ 	"Arrange to have the receiver be the shorter and retry"
+ 	xLen > (yLen := anInteger digitLength)
+ 		ifTrue: [^ anInteger multiplyByInteger: self ].
+ 
+ 	"Seek for integers of about the same length, else split the long one"
+ 	yLen > (2 * xLen) ifTrue: [^self digitMulSplit: anInteger].
+ 	
+ 	"Choose the fastest divide and conquer algorithm based on another heuristic"
+ 	xLen >= 2000 ifTrue: [^self digitMul33: anInteger].
+ 	yLen * 2 >= (3 * xLen) ifTrue: [^self digitMul23: anInteger].
+ 	^self digitMul22: anInteger!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
  printOn: aStream base: b
  	"Append a representation of this number in base b on aStream.
  	In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
  	
  	| halfDigits halfPower head tail nDigitsUnderestimate qr |
  	"Don't engage any arithmetic if not normalized"
  	(self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].
  	
  	nDigitsUnderestimate := b = 10
  		ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
  		ifFalse: [self highBit quo: b highBit].
  		
  	"splitting digits with a whole power of two is more efficient"
  	halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
  	
  	halfDigits <= 1
  		ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
  			^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
  	
  	"Separate in two halves, head and tail"
  	halfPower := b raisedToInteger: halfDigits.
+ 	qr := self divideByInteger: halfPower.
- 	qr := self digitDiv: halfPower neg: self negative.
  	head := qr first normalize.
  	tail := qr last normalize.
  	
  	"print head"
  	head printOn: aStream base: b.
  	
  	"print tail without the overhead to count the digits"
  	tail printOn: aStream base: b nDigits: halfDigits!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
  printOn: aStream base: b nDigits: n
  	"Append a representation of this number in base b on aStream using n digits.
  	In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
  	Should be invoked with: 0 <= self < (b raisedToInteger: n)"
  	
  	| halfPower half head tail qr |
  	n <= 1 ifTrue: [
  		n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].
  		
  		"Note: this is to stop an infinite loop if one ever attempts to print with a huge base
  		This can happen because choice was to not hardcode any limit for base b
  		We let Character>>#digitValue: fail"
  		^aStream nextPut: (Character digitValue: self) ].
  	halfPower := n bitShift: -1.
  	half := b raisedToInteger: halfPower.
+ 	qr := self divideByInteger: half.
- 	qr := self digitDiv: half neg: self negative.
  	head := qr first normalize.
  	tail := qr last normalize.
  	head printOn: aStream base: b nDigits: n - halfPower.
  	tail printOn: aStream base: b nDigits: halfPower!

Item was removed:
- ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
- recursiveDigitDiv: anInteger
- 	"This is the recursive division algorithm from Burnikel - Ziegler
- 	See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
- 	Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
- 	https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
- 	
- 	| s m t a b z qr q i |
- 	"round digits up to next power of 2"
- 	s := anInteger digitLength.
- 	m := 1 bitShift: (s - 1) highBit.
- 	"shift so that leading bit of leading byte be 1, and digitLength power of two"
- 	s := m * 8 - anInteger highBit.
- 	a := self bitShift: s.
- 	b := anInteger bitShift: s.
- 	
- 	"Decompose a into t limbs - each limb have m bytes
- 	choose t such that leading bit of leading limb of a be 0"
- 	t := (a highBit + 1 / (m * 8)) ceiling.
- 	z := a butLowestNDigits: t - 2 * m.
- 	i := t - 2.
- 	q := 0.
- 	"and do a division of two limb by 1 limb b for each pair of limb of a"
- 	[qr := z digitDiv21: b.
- 	q := (q bitShift: 8*m) + qr first.	"Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
- 	(i := i - 1) >= 0] whileTrue:
- 		[z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
- 	^Array with: q with: (qr last bitShift: s negated)!

Item was changed:
  ----- Method: LargePositiveInteger>>rem: (in category 'arithmetic') -----
  rem: aNumber 
  	"Remainder defined in terms of quo:. See super rem:. Fail if
  	 the argument is 0. Fail if either the argument or the result is not a
  	 SmallInteger or a LargePositiveInteger in the 64 bit range.
  	 Optional. See Object documentation whatIsAPrimitive."
  
  	<primitive: 20>
  	 aNumber isInteger
  		ifTrue:
+ 			[| rem |
+ 			rem := (self divideByInteger: aNumber) at: 2.
- 			[| ng rem |
- 			ng := self negative == aNumber negative == false.
- 			rem := (self digitDiv: aNumber neg: ng) at: 2.
  			^ rem normalize].
  	^super rem: aNumber!

Item was changed:
  ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
  sqrtRem
  	"Like super, but use a divide and conquer method to perform this operation.
  	See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
  	https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
  	
  	| n qr q s r sr a3a2 a1 a0 |
  	"Split self in 4 digits a3,a2,a1,a0 in base b,
  	such that most significant digit a3 >= b/4
  	It is not a problem to have a3 >= b,
  	so we can round b down to a whole number of bytes n"
  	n := self highBit bitShift: -5. "bitShift: -2 divide in 4 parts, bitShift: -3 round down in bytes"
  	n >= 16 ifFalse: [^super sqrtRem].
  	a3a2 := self butLowestNDigits: n * 2.
  	a1 := self copyDigitsFrom: n + 1 to: n * 2.
  	a0 := self lowestNDigits: n.
  	
  	sr := a3a2 sqrtRem.
+ 	qr := (sr last bitShift: 8 * n) + a1 divideByInteger: (sr first bitShift: 1).
- 	qr := (sr last bitShift: 8 * n) + a1 digitDiv: (sr first bitShift: 1) neg: false.
  	q := qr first normalize.
  	s := (sr first bitShift: 8 * n) + q.
  	r := (qr last normalize bitShift: 8 * n) + a0 - q squared.
  	r negative
  		ifTrue:
  			[r := (s bitShift: 1) + r - 1.
  			s := s - 1].
  	sr at: 1 put: s; at: 2 put: r.
  	^sr
  	!

Item was changed:
  ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
  squared
  	"Eventually use a divide and conquer algorithm to perform the multiplication"
  
  	(self digitLength >= 400) ifFalse: [^self * self].
+ 	(self digitLength >= 800) ifFalse: [^self squaredByHalf].
+ 	^self squaredByFourth!
- 	(self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
- 	^self squaredToom4!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredByFourth (in category 'private') -----
+ squaredByFourth
+ 	"Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+ 	See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+ 	https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+ 	
+ 	| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+ 	"divide in 4 parts"
+ 	p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+ 	a3 := self butLowestNDigits: p * 3.
+ 	a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+ 	a1 := self copyDigitsFrom: p + 1 to: p * 2.
+ 	a0 := self lowestNDigits: p.
+ 	
+ 	"Toom-4 trick: 7 multiplications instead of 16"
+ 	a02 := a0 - a2.
+ 	a13 := a1 - a3.
+ 	s0 := a0 squared.
+ 	s1 := (a0 * a1) bitShift: 1.
+ 	s2 := (a02 + a13) * (a02 - a13).
+ 	s3 := ((a0 + a1) + (a2 + a3)) squared.
+ 	s4 := (a02 * a13) bitShift: 1.
+ 	s5 := (a3 * a2) bitShift: 1.
+ 	s6 := a3 squared.
+ 	
+ 	"Interpolation"
+ 	t2 := s1 + s5.
+ 	t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+ 	s3 := t2 - s4.
+ 	s4 := t3 - s0.
+ 	s2 := t3 - s2 - s6.
+ 	
+ 	"Sum the parts of decomposition"
+ 	^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+ 	+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+ 	
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredByHalf (in category 'private') -----
+ squaredByHalf
+ 	"Use a divide and conquer algorithm to perform the multiplication.
+ 	Split in two parts like Karatsuba, but economize 2 additions by using asymetrical product."
+ 	
+ 	| half xHigh xLow low high mid |
+ 	
+ 	"Divide digits in two halves"
+ 	half := self digitLength + 1 // 2 bitClear: 2r11.
+ 	xLow := self lowestNDigits: half.
+ 	xHigh := self butLowestNDigits: half.
+ 	
+ 	"eventually use karatsuba"
+ 	low := xLow squared.
+ 	high := xHigh squared.
+ 	mid := xLow multiplyByInteger: xHigh.
+ 	
+ 	"Sum the parts of decomposition"
+ 	^(high bitShiftMagnitude: 16*half)
+ 		inplaceAddNonOverlapping: low digitShiftBy: 0;
+ 		+ (mid bitShiftMagnitude: 8*half+1)
+ 	
+ "
+ | a |
+ a := 440 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredKaratsuba = 0.
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a squaredKaratsuba]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [2000 timesRepeat: [a * a]] timeToRun] value asFloat
+ "!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredByThird (in category 'private') -----
+ squaredByThird
+ 	"Use a 3-way Toom-Cook divide and conquer algorithm to perform the multiplication"
+ 	
+ 	| third x0 x1 x2 x20 z0 z1 z2 z3 z4 |
+ 	"divide in 3 parts"
+ 	third := self digitLength + 2 // 3 bitClear: 2r11.
+ 	x2 := self butLowestNDigits: third * 2.
+ 	x1 := self copyDigitsFrom: third + 1 to: third * 2.
+ 	x0 := self lowestNDigits: third.
+ 	
+ 	"Toom-3 trick: 5 multiplications instead of 9"
+ 	z0 := x0 squared.
+ 	z4 := x2 squared.
+ 	x20 := x2 + x0.
+ 	z1 := (x20 + x1) squared.
+ 	x20 := x20 - x1.
+ 	z2 := x20 squared.
+ 	z3 := ((x20 + x2 bitShift: 1) - x0) squared.
+ 	
+ 	"Sum the parts of decomposition"
+ 	z3 := z3 - z1 quo: 3.
+ 	z1 := z1 - z2 bitShift: -1.
+ 	z2 := z2 - z0.
+ 	
+ 	z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
+ 	z2 := z2 + z1 - z4.
+ 	z1 := z1 - z3.
+ 	^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
+ 	
+ "
+ | a |
+ a := 1400 factorial-1.
+ a digitLength.
+ self assert: a * a - a squaredToom3 = 0.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom3]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!

Item was removed:
- ----- Method: LargePositiveInteger>>squaredKaratsuba (in category 'mathematical functions') -----
- squaredKaratsuba
- 	"Use a divide and conquer algorithm to perform the multiplication.
- 	Split in two parts like Karatsuba, but economize 2 additions by using asymetrical product."
- 	
- 	| half xHigh xLow low high mid |
- 	
- 	"Divide digits in two halves"
- 	half := self digitLength + 1 // 2 bitClear: 2r11.
- 	xLow := self lowestNDigits: half.
- 	xHigh := self butLowestNDigits: half.
- 	
- 	"eventually use karatsuba"
- 	low := xLow squared.
- 	high := xHigh squared.
- 	mid := xLow karatsubaTimes: xHigh.
- 	
- 	"Sum the parts of decomposition"
- 	^low + (mid bitShift: 8*half+1) + (high bitShift: 16*half)
- 	
- "
- | a |
- a := 440 factorial-1.
- a digitLength.
- self assert: a * a - a squaredKaratsuba = 0.
- [Smalltalk garbageCollect.
- [2000 timesRepeat: [a squaredKaratsuba]] timeToRun] value /
- [Smalltalk garbageCollect.
- [2000 timesRepeat: [a * a]] timeToRun] value asFloat
- "!

Item was removed:
- ----- Method: LargePositiveInteger>>squaredToom3 (in category 'mathematical functions') -----
- squaredToom3
- 	"Use a 3-way Toom-Cook divide and conquer algorithm to perform the multiplication"
- 	
- 	| third x0 x1 x2 x20 z0 z1 z2 z3 z4 |
- 	"divide in 3 parts"
- 	third := self digitLength + 2 // 3 bitClear: 2r11.
- 	x2 := self butLowestNDigits: third * 2.
- 	x1 := self copyDigitsFrom: third + 1 to: third * 2.
- 	x0 := self lowestNDigits: third.
- 	
- 	"Toom-3 trick: 5 multiplications instead of 9"
- 	z0 := x0 squared.
- 	z4 := x2 squared.
- 	x20 := x2 + x0.
- 	z1 := (x20 + x1) squared.
- 	x20 := x20 - x1.
- 	z2 := x20 squared.
- 	z3 := ((x20 + x2 bitShift: 1) - x0) squared.
- 	
- 	"Sum the parts of decomposition"
- 	z3 := z3 - z1 quo: 3.
- 	z1 := z1 - z2 bitShift: -1.
- 	z2 := z2 - z0.
- 	
- 	z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
- 	z2 := z2 + z1 - z4.
- 	z1 := z1 - z3.
- 	^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
- 	
- "
- | a |
- a := 1400 factorial-1.
- a digitLength.
- self assert: a * a - a squaredToom3 = 0.
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredToom3]] timeToRun] value /
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
- "!

Item was removed:
- ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
- squaredToom4
- 	"Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
- 	See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
- 	https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
- 	
- 	| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
- 	"divide in 4 parts"
- 	p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
- 	a3 := self butLowestNDigits: p * 3.
- 	a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
- 	a1 := self copyDigitsFrom: p + 1 to: p * 2.
- 	a0 := self lowestNDigits: p.
- 	
- 	"Toom-4 trick: 7 multiplications instead of 16"
- 	a02 := a0 - a2.
- 	a13 := a1 - a3.
- 	s0 := a0 squared.
- 	s1 := (a0 * a1) bitShift: 1.
- 	s2 := (a02 + a13) * (a02 - a13).
- 	s3 := ((a0 + a1) + (a2 + a3)) squared.
- 	s4 := (a02 * a13) bitShift: 1.
- 	s5 := (a3 * a2) bitShift: 1.
- 	s6 := a3 squared.
- 	
- 	"Interpolation"
- 	t2 := s1 + s5.
- 	t3 := (s2 + s3 + s4 bitShift: -1) - t2.
- 	s3 := t2 - s4.
- 	s4 := t3 - s0.
- 	s2 := t3 - s2 - s6.
- 	
- 	"Sum the parts of decomposition"
- 	^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
- 	+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
- 	
- "
- | a |
- a := 770 factorial-1.
- a digitLength.
- [a * a - a squaredToom4 = 0] assert.
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
- "!

Item was removed:
- ----- Method: LargePositiveInteger>>toom3Times: (in category 'private') -----
- toom3Times: anInteger
- 	"Eventually use a variant of Toom-Cooke for performing multiplication.
- 	Toom-Cooke is a generalization of Karatsuba divide and conquer algorithm.
- 	It divides operands in n parts instead of 2.
- 	See https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
- 	Here we divide each operand in 3 parts, thus the name Toom-3.
- 	And use a Bodrato-Zanoni variant for the choice of interpolation points and matrix inversion
- 	See What about Toom-Cook matrices optimality? - Marco Bodrato, Alberto Zanoni - Oct. 2006
- 	http://www.bodrato.it/papers/WhatAboutToomCookMatricesOptimality.pdf"
- 	
- 	| third x2 x1 x0 y2 y1 y0 xLen yLen y20 z4 z3 z2 z1 z0 x20 |
- 	"Revert to schoolbook multiplication if short"
- 	(xLen := self digitLength) < 6000
- 		ifTrue: [^ self karatsubaTimes: anInteger ].
- 	
- 	"Arrange to have the receiver be the shorter and retry"
- 	xLen > (yLen := anInteger digitLength)
- 		ifTrue: [^ anInteger toom3Times: self ].
- 	
- 	"Seek for well balanced integers"
- 	yLen > (5 * xLen bitShift: -2)
- 		ifTrue: [^(0 to: yLen - 1 by: xLen + 1)  digitShiftSum: [:yShift |
- 				self toom3Times: (anInteger copyDigitsFrom: yShift + 1 to: yShift +1 + xLen)]].
- 	
- 	"At this point, lengths are well balanced, divide in 3 parts"
- 	third := yLen + 2 // 3 bitClear: 2r11.
- 	x2 := self butLowestNDigits: third * 2.
- 	x1 := self copyDigitsFrom: third + 1 to: third * 2.
- 	x0 := self lowestNDigits: third.
- 	y2 := anInteger butLowestNDigits: third * 2.
- 	y1 := anInteger copyDigitsFrom: third + 1 to: third * 2.
- 	y0 := anInteger lowestNDigits: third.
- 	
- 	"Toom-3 trick: 5 multiplications instead of 9"
- 	z0 := x0 toom3Times: y0.
- 	z4 := x2 toom3Times: y2.
- 	x20 := x2 + x0.
- 	y20 := y2 + y0.
- 	z1 := x20 + x1 toom3Times: y20 + y1.
- 	x20 := x20 - x1.
- 	y20 := y20 - y1.
- 	z2 := x20 toom3Times: y20.
- 	z3 := (x20 + x2 bitShift: 1) - x0 toom3Times: (y20 + y2 bitShift: 1) - y0.
- 	
- 	"Sum the parts of decomposition"
- 	z3 := z3 - z1 quo: 3.
- 	z1 := z1 - z2 bitShift: -1.
- 	z2 := z2 - z0.
- 	
- 	z3 := (z2 - z3 bitShift: -1) + (z4 bitShift: 1).
- 	z2 := z2 + z1 - z4.
- 	z1 := z1 - z3.
- 	^z0 + (z1 bitShift: 8*third) + (z2 bitShift: 16*third) + (z3 + (z4 bitShift: 8*third) bitShift: 24*third)
- 	
- "
- | a b |
- a :=5000 factorial - 1.
- b := 6000 factorial - 1.
- a digitLength min: b digitLength.
- a digitLength max: b digitLength.
- (a toom3Times: b) = (a * b) ifFalse: [self error].
- [Smalltalk garbageCollect.
- [300 timesRepeat: [a toom3Times: b]] timeToRun] value /
- [Smalltalk garbageCollect.
- [300 timesRepeat: [a karatsubaTimes: b]] timeToRun] value asFloat
- "!



More information about the Packages mailing list