[squeak-dev] The Inbox: Kernel-nice.1218.mcz

Sat Apr 27 08:41:36 UTC 2019

Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
http://source.squeak.org/inbox/Kernel-nice.1218.mcz

==================== Summary ====================

Name: Kernel-nice.1218
Author: nice
Time: 27 April 2019, 10:41:24.794539 am
UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
Ancestors: Kernel-nice.1217

Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.

This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf 

Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.

Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).

=============== Diff against Kernel-nice.1217 ===============

Item was added:
+ ----- Method: Integer>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+ 	
+ 	^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was added:
+ ----- Method: Integer>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+ 	
+ 	^(self digitDiv: anInteger neg: false) collect: #normalize!

Item was changed:
  ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean 
+ 	^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng 
- 	"Answer with an array of (quotient, remainder)."
- 	| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
- 	<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
- 	arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
- 	"TFEI added this line"
- 	l := self digitLength - arg digitLength + 1.
- 	l <= 0 ifTrue: [^ Array with: 0 with: self].
- 	"shortcut against #highBit"
- 	d := 8 - arg lastDigit highBitOfByte.
- 	div := arg digitLshift: d.
- 	divDigitLength := div digitLength + 1.
- 	div := div growto: divDigitLength.
- 	"shifts so high order word is >=128"
- 	rem := self digitLshift: d.
- 	rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
- 	remDigitLength := rem digitLength.
- 	"makes a copy and shifts"
- 	quo := Integer new: l neg: ng.
- 	dl := divDigitLength - 1.
- 	"Last actual byte of data"
- 	ql := l.
- 	dh := div digitAt: dl.
- 	dnh := dl = 1
- 				ifTrue: [0]
- 				ifFalse: [div digitAt: dl - 1].
- 	1 to: ql do: 
- 		[:k | 
- 		"maintain quo*arg+rem=self"
- 		"Estimate rem/div by dividing the leading to bytes of rem by dh."
- 		"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
- 		j := remDigitLength + 1 - k.
- 		"r1 := rem digitAt: j."
- 		(rem digitAt: j)
- 			= dh
- 			ifTrue: [qhi := qlo := 15
- 				"i.e. q=255"]
- 			ifFalse: 
- 				["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
- 				Note that r1,r2 are bytes, not nibbles.  
- 				Be careful not to generate intermediate results exceeding 13  
- 				bits."
- 				"r2 := (rem digitAt: j - 1)."
- 				t := ((rem digitAt: j)
- 							bitShift: 4)
- 							+ ((rem digitAt: j - 1)
- 									bitShift: -4).
- 				qhi := t // dh.
- 				t := (t \\ dh bitShift: 4)
- 							+ ((rem digitAt: j - 1)
- 									bitAnd: 15).
- 				qlo := t // dh.
- 				t := t \\ dh.
- 				"Next compute (hi,lo) := q*dnh"
- 				hi := qhi * dnh.
- 				lo := qlo * dnh + ((hi bitAnd: 15)
- 								bitShift: 4).
- 				hi := (hi bitShift: -4)
- 							+ (lo bitShift: -8).
- 				lo := lo bitAnd: 255.
- 				"Correct overestimate of q.  
- 				Max of 2 iterations through loop -- see Knuth vol. 2"
- 				r3 := j < 3
- 							ifTrue: [0]
- 							ifFalse: [rem digitAt: j - 2].
- 				[(t < hi
- 					or: [t = hi and: [r3 < lo]])
- 					and: 
- 						["i.e. (t,r3) < (hi,lo)"
- 						qlo := qlo - 1.
- 						lo := lo - dnh.
- 						lo < 0
- 							ifTrue: 
- 								[hi := hi - 1.
- 								lo := lo + 256].
- 						hi >= dh]]
- 					whileTrue: [hi := hi - dh].
- 				qlo < 0
- 					ifTrue: 
- 						[qhi := qhi - 1.
- 						qlo := qlo + 16]].
- 		"Subtract q*div from rem"
- 		l := j - dl.
- 		a := 0.
- 		1 to: divDigitLength do: 
- 			[:i | 
- 			hi := (div digitAt: i)
- 						* qhi.
- 			lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
- 							bitShift: 4) - ((div digitAt: i)
- 							* qlo).
- 			rem digitAt: l put: lo - (lo // 256 * 256).
- 			"sign-tolerant form of (lo bitAnd: 255)"
- 			a := lo // 256 - (hi bitShift: -4).
- 			l := l + 1].
- 		a < 0
- 			ifTrue: 
- 				["Add div back into rem, decrease q by 1"
- 				qlo := qlo - 1.
- 				l := j - dl.
- 				a := 0.
- 				1 to: divDigitLength do: 
- 					[:i | 
- 					a := (a bitShift: -8)
- 								+ (rem digitAt: l) + (div digitAt: i).
- 					rem digitAt: l put: (a bitAnd: 255).
- 					l := l + 1]].
- 		quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
- 				+ qlo].
- 	rem := rem
- 				digitRshift: d
- 				bytes: 0
- 				lookfirst: dl.
- 	^ Array with: quo with: rem!

Item was added:
+ ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
+ primDigitDiv: arg neg: ng 
+ 	"Answer with an array of (quotient, remainder)."
+ 	| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
+ 	<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
+ 	arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
+ 	"TFEI added this line"
+ 	l := self digitLength - arg digitLength + 1.
+ 	l <= 0 ifTrue: [^ Array with: 0 with: self].
+ 	"shortcut against #highBit"
+ 	d := 8 - arg lastDigit highBitOfByte.
+ 	div := arg digitLshift: d.
+ 	divDigitLength := div digitLength + 1.
+ 	div := div growto: divDigitLength.
+ 	"shifts so high order word is >=128"
+ 	rem := self digitLshift: d.
+ 	rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
+ 	remDigitLength := rem digitLength.
+ 	"makes a copy and shifts"
+ 	quo := Integer new: l neg: ng.
+ 	dl := divDigitLength - 1.
+ 	"Last actual byte of data"
+ 	ql := l.
+ 	dh := div digitAt: dl.
+ 	dnh := dl = 1
+ 				ifTrue: [0]
+ 				ifFalse: [div digitAt: dl - 1].
+ 	1 to: ql do: 
+ 		[:k | 
+ 		"maintain quo*arg+rem=self"
+ 		"Estimate rem/div by dividing the leading to bytes of rem by dh."
+ 		"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
+ 		j := remDigitLength + 1 - k.
+ 		"r1 := rem digitAt: j."
+ 		(rem digitAt: j)
+ 			= dh
+ 			ifTrue: [qhi := qlo := 15
+ 				"i.e. q=255"]
+ 			ifFalse: 
+ 				["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.  
+ 				Note that r1,r2 are bytes, not nibbles.  
+ 				Be careful not to generate intermediate results exceeding 13  
+ 				bits."
+ 				"r2 := (rem digitAt: j - 1)."
+ 				t := ((rem digitAt: j)
+ 							bitShift: 4)
+ 							+ ((rem digitAt: j - 1)
+ 									bitShift: -4).
+ 				qhi := t // dh.
+ 				t := (t \\ dh bitShift: 4)
+ 							+ ((rem digitAt: j - 1)
+ 									bitAnd: 15).
+ 				qlo := t // dh.
+ 				t := t \\ dh.
+ 				"Next compute (hi,lo) := q*dnh"
+ 				hi := qhi * dnh.
+ 				lo := qlo * dnh + ((hi bitAnd: 15)
+ 								bitShift: 4).
+ 				hi := (hi bitShift: -4)
+ 							+ (lo bitShift: -8).
+ 				lo := lo bitAnd: 255.
+ 				"Correct overestimate of q.  
+ 				Max of 2 iterations through loop -- see Knuth vol. 2"
+ 				r3 := j < 3
+ 							ifTrue: [0]
+ 							ifFalse: [rem digitAt: j - 2].
+ 				[(t < hi
+ 					or: [t = hi and: [r3 < lo]])
+ 					and: 
+ 						["i.e. (t,r3) < (hi,lo)"
+ 						qlo := qlo - 1.
+ 						lo := lo - dnh.
+ 						lo < 0
+ 							ifTrue: 
+ 								[hi := hi - 1.
+ 								lo := lo + 256].
+ 						hi >= dh]]
+ 					whileTrue: [hi := hi - dh].
+ 				qlo < 0
+ 					ifTrue: 
+ 						[qhi := qhi - 1.
+ 						qlo := qlo + 16]].
+ 		"Subtract q*div from rem"
+ 		l := j - dl.
+ 		a := 0.
+ 		1 to: divDigitLength do: 
+ 			[:i | 
+ 			hi := (div digitAt: i)
+ 						* qhi.
+ 			lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
+ 							bitShift: 4) - ((div digitAt: i)
+ 							* qlo).
+ 			rem digitAt: l put: lo - (lo // 256 * 256).
+ 			"sign-tolerant form of (lo bitAnd: 255)"
+ 			a := lo // 256 - (hi bitShift: -4).
+ 			l := l + 1].
+ 		a < 0
+ 			ifTrue: 
+ 				["Add div back into rem, decrease q by 1"
+ 				qlo := qlo - 1.
+ 				l := j - dl.
+ 				a := 0.
+ 				1 to: divDigitLength do: 
+ 					[:i | 
+ 					a := (a bitShift: -8)
+ 								+ (rem digitAt: l) + (div digitAt: i).
+ 					rem digitAt: l put: (a bitAnd: 255).
+ 					l := l + 1]].
+ 		quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ 				+ qlo].
+ 	rem := rem
+ 				digitRshift: d
+ 				bytes: 0
+ 				lookfirst: dl.
+ 	^ Array with: quo with: rem!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
+ digitDiv21: anInteger
+ 	"This is part of the recursive division algorithm from Burnikel - Ziegler
+ 	Divide a two limbs receiver by 1 limb dividend
+ 	Each limb is decomposed in two halves of p bytes (8*p bits)
+ 	so as to continue the recursion"
+ 	
+ 	| p qr1 qr2 |
+ 	p := anInteger digitLength + 1 bitShift: -1.
+ 	p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
+ 	qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
+ 	qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
+ 	qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
+ 	^qr2!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
+ digitDiv32: anInteger
+ 	"This is part of the recursive division algorithm from Burnikel - Ziegler
+ 	Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
+ 	Each limb is made of p bytes (8*p bits).
+ 	This step transforms the division problem into multiplication
+ 	It must use the fastMultiply: to be worth the overhead costs."
+ 	
+ 	| a2 b1 d p q qr r |
+ 	p := anInteger digitLength + 1 bitShift: -1.
+ 	(a2 := self butLowestNDigits: 2*p) 
+ 		< (b1 := anInteger butLowestNDigits: p)
+ 		ifTrue:
+ 			[qr := (self butLowestNDigits: p) digitDiv21: b1.
+ 			q := qr first.
+ 			r := qr last]
+ 		ifFalse:
+ 			[q := (1 bitShift: 8*p) - 1.
+ 			r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
+ 	d := q fastMultiply: (anInteger lowestNDigits: p).
+ 	r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
+ 	[r < 0]
+ 		whileTrue:
+ 			[q := q - 1.
+ 			r := r + anInteger].
+ 	^Array with: q with: r
+ 	!

Item was added:
+ ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
+ digitDiv: anInteger neg: aBoolean 
+ 	"If length is worth, engage a recursive divide and conquer strategy"
+ 	| qr |
+ 	(anInteger digitLength <= 256
+ 			or: [self digitLength <= anInteger digitLength])
+ 		ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
+ 	qr := self abs recursiveDigitDiv: anInteger abs.
+ 	^ aBoolean
+ 		ifTrue: [qr collect: #negated]
+ 		ifFalse: [qr]!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') -----
  printOn: aStream base: b
  	"Append a representation of this number in base b on aStream.
  	In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."

+ 	| halfDigits halfPower head tail nDigitsUnderestimate qr |
- 	| halfDigits halfPower head tail nDigitsUnderestimate |
  	"Don't engage any arithmetic if not normalized"
  	(self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b].

  	nDigitsUnderestimate := b = 10
  		ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"]
  		ifFalse: [self highBit quo: b highBit].

  	"splitting digits with a whole power of two is more efficient"
  	halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.

  	halfDigits <= 1
  		ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail"
  			^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].

  	"Separate in two halves, head and tail"
  	halfPower := b raisedToInteger: halfDigits.
+ 	qr := self digitDiv: halfPower neg: self negative.
+ 	head := qr first normalize.
+ 	tail := qr last normalize.
- 	head := self quo: halfPower.
- 	tail := self - (head * halfPower).

  	"print head"
  	head printOn: aStream base: b.

  	"print tail without the overhead to count the digits"
  	tail printOn: aStream base: b nDigits: halfDigits!

Item was changed:
  ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') -----
  printOn: aStream base: b nDigits: n
  	"Append a representation of this number in base b on aStream using n digits.
  	In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two
  	Should be invoked with: 0 <= self < (b raisedToInteger: n)"

+ 	| halfPower half head tail qr |
- 	| halfPower half head tail |
  	n <= 1 ifTrue: [
  		n <= 0 ifTrue: [self error: 'Number of digits n should be > 0'].

  		"Note: this is to stop an infinite loop if one ever attempts to print with a huge base
  		This can happen because choice was to not hardcode any limit for base b
  		We let Character>>#digitValue: fail"
  		^aStream nextPut: (Character digitValue: self) ].
  	halfPower := n bitShift: -1.
  	half := b raisedToInteger: halfPower.
+ 	qr := self digitDiv: half neg: self negative.
+ 	head := qr first normalize.
+ 	tail := qr last normalize.
- 	head := self quo: half.
- 	tail := self - (head * half).
  	head printOn: aStream base: b nDigits: n - halfPower.
  	tail printOn: aStream base: b nDigits: halfPower!

Item was added:
+ ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
+ recursiveDigitDiv: anInteger
+ 	"This is the recursive division algorithm from Burnikel - Ziegler
+ 	See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
+ 	Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
+ 	https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
+ 	
+ 	| s m t a b z qr q i |
+ 	"round digits up to next power of 2"
+ 	s := anInteger digitLength.
+ 	m := 1 bitShift: (s - 1) highBit.
+ 	"shift so that leading bit of leading byte be 1, and digitLength power of two"
+ 	s := m * 8 - anInteger highBit.
+ 	a := self bitShift: s.
+ 	b := anInteger bitShift: s.
+ 	
+ 	"Decompose a into t limbs - each limb have m bytes
+ 	choose t such that leading bit of leading limb of a be 0"
+ 	t := (a highBit + 1 / (m * 8)) ceiling.
+ 	z := a butLowestNDigits: t - 2 * m.
+ 	i := t - 2.
+ 	q := 0.
+ 	"and do a division of two limb by 1 limb b for each pair of limb of a"
+ 	[qr := z digitDiv21: b.
+ 	q := (q bitShift: 8*m) + qr first.	"Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
+ 	(i := i - 1) >= 0] whileTrue:
+ 		[z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
+ 	^Array with: q with: (qr last bitShift: s negated)!

Item was changed:
  ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
  sqrtRem
  	"Like super, but use a divide and conquer method to perform this operation.
  	See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
  	https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"

+ 	| n  qr q s r sr high mid low |
- 	| n  qr s r sr high mid low |
  	n := self digitLength bitShift: -2.
  	n >= 16 ifFalse: [^super sqrtRem].
  	high := self butLowestNDigits: n * 2.
  	mid := self copyDigitsFrom: n + 1 to: n * 2.
  	low := self lowestNDigits: n.

  	sr := high sqrtRem.
  	qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
+ 	q := qr first normalize.
+ 	s := (sr first bitShift: 8 * n) + q.
+ 	r := (qr last normalize bitShift: 8 * n) + low - q squared.
- 	s := (sr first bitShift: 8 * n) + qr first.
- 	r := (qr last bitShift: 8 * n) + low - qr first squared.
  	r negative
  		ifTrue:
  			[r := (s bitShift: 1) + r - 1.
  			s := s - 1].
  	sr at: 1 put: s; at: 2 put: r.
  	^sr
  	!

Item was changed:
  ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
  squared
  	"Eventually use a divide and conquer algorithm to perform the multiplication"

  	(self digitLength >= 400) ifFalse: [^self * self].
+ 	(self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
+ 	^self squaredToom4!
- 	(self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
- 	^self squaredToom3!

Item was added:
+ ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
+ squaredToom4
+ 	"Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
+ 	See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
+ 	https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
+ 	
+ 	| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
+ 	"divide in 4 parts"
+ 	p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
+ 	a3 := self butLowestNDigits: p * 3.
+ 	a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
+ 	a1 := self copyDigitsFrom: p + 1 to: p * 2.
+ 	a0 := self lowestNDigits: p.
+ 	
+ 	"Toom-4 trick: 7 multiplications instead of 16"
+ 	a02 := a0 - a2.
+ 	a13 := a1 - a3.
+ 	s0 := a0 squared.
+ 	s1 := (a0 fastMultiply: a1) bitShift: 1.
+ 	s2 := (a02 + a13) fastMultiply: (a02 - a13).
+ 	s3 := ((a0 + a1) + (a2 + a3)) squared.
+ 	s4 := (a02 fastMultiply: a13) bitShift: 1.
+ 	s5 := (a3 fastMultiply: a2) bitShift: 1.
+ 	s6 := a3 squared.
+ 	
+ 	"Interpolation"
+ 	t2 := s1 + s5.
+ 	t3 := (s2 + s3 + s4 bitShift: -1) - t2.
+ 	s3 := t2 - s4.
+ 	s4 := t3 - s0.
+ 	s2 := t3 - s2 - s6.
+ 	
+ 	"Sum the parts of decomposition"
+ 	^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+ 	+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
+ 	
+ "
+ | a |
+ a := 770 factorial-1.
+ a digitLength.
+ [a * a - a squaredToom4 = 0] assert.
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
+ [Smalltalk garbageCollect.
+ [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
+ "!