Nicolas Cellier uploaded a new version of Kernel to project The Inbox: http://source.squeak.org/inbox/Kernel-nice.1218.mcz
==================== Summary ====================
Name: Kernel-nice.1218 Author: nice Time: 27 April 2019, 10:41:24.794539 am UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe Ancestors: Kernel-nice.1217
Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.
This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-H...
Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.
Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).
=============== Diff against Kernel-nice.1217 ===============
Item was added: + ----- Method: Integer>>digitDiv21: (in category 'private') ----- + digitDiv21: anInteger + + ^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was added: + ----- Method: Integer>>digitDiv32: (in category 'private') ----- + digitDiv32: anInteger + + ^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was changed: ----- Method: Integer>>digitDiv:neg: (in category 'private') ----- + digitDiv: anInteger neg: aBoolean + ^self primDigitDiv: anInteger neg: aBoolean! - digitDiv: arg neg: ng - "Answer with an array of (quotient, remainder)." - | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength | - <primitive: 'primDigitDivNegative' module:'LargeIntegers'> - arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal]. - "TFEI added this line" - l := self digitLength - arg digitLength + 1. - l <= 0 ifTrue: [^ Array with: 0 with: self]. - "shortcut against #highBit" - d := 8 - arg lastDigit highBitOfByte. - div := arg digitLshift: d. - divDigitLength := div digitLength + 1. - div := div growto: divDigitLength. - "shifts so high order word is >=128" - rem := self digitLshift: d. - rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1]. - remDigitLength := rem digitLength. - "makes a copy and shifts" - quo := Integer new: l neg: ng. - dl := divDigitLength - 1. - "Last actual byte of data" - ql := l. - dh := div digitAt: dl. - dnh := dl = 1 - ifTrue: [0] - ifFalse: [div digitAt: dl - 1]. - 1 to: ql do: - [:k | - "maintain quo*arg+rem=self" - "Estimate rem/div by dividing the leading to bytes of rem by dh." - "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles." - j := remDigitLength + 1 - k. - "r1 := rem digitAt: j." - (rem digitAt: j) - = dh - ifTrue: [qhi := qlo := 15 - "i.e. q=255"] - ifFalse: - ["Compute q = (r1,r2)//dh, t = (r1,r2)\dh. - Note that r1,r2 are bytes, not nibbles. - Be careful not to generate intermediate results exceeding 13 - bits." - "r2 := (rem digitAt: j - 1)." - t := ((rem digitAt: j) - bitShift: 4) - + ((rem digitAt: j - 1) - bitShift: -4). - qhi := t // dh. - t := (t \ dh bitShift: 4) - + ((rem digitAt: j - 1) - bitAnd: 15). - qlo := t // dh. - t := t \ dh. - "Next compute (hi,lo) := q*dnh" - hi := qhi * dnh. - lo := qlo * dnh + ((hi bitAnd: 15) - bitShift: 4). - hi := (hi bitShift: -4) - + (lo bitShift: -8). - lo := lo bitAnd: 255. - "Correct overestimate of q. - Max of 2 iterations through loop -- see Knuth vol. 2" - r3 := j < 3 - ifTrue: [0] - ifFalse: [rem digitAt: j - 2]. - [(t < hi - or: [t = hi and: [r3 < lo]]) - and: - ["i.e. (t,r3) < (hi,lo)" - qlo := qlo - 1. - lo := lo - dnh. - lo < 0 - ifTrue: - [hi := hi - 1. - lo := lo + 256]. - hi >= dh]] - whileTrue: [hi := hi - dh]. - qlo < 0 - ifTrue: - [qhi := qhi - 1. - qlo := qlo + 16]]. - "Subtract q*div from rem" - l := j - dl. - a := 0. - 1 to: divDigitLength do: - [:i | - hi := (div digitAt: i) - * qhi. - lo := a + (rem digitAt: l) - ((hi bitAnd: 15) - bitShift: 4) - ((div digitAt: i) - * qlo). - rem digitAt: l put: lo - (lo // 256 * 256). - "sign-tolerant form of (lo bitAnd: 255)" - a := lo // 256 - (hi bitShift: -4). - l := l + 1]. - a < 0 - ifTrue: - ["Add div back into rem, decrease q by 1" - qlo := qlo - 1. - l := j - dl. - a := 0. - 1 to: divDigitLength do: - [:i | - a := (a bitShift: -8) - + (rem digitAt: l) + (div digitAt: i). - rem digitAt: l put: (a bitAnd: 255). - l := l + 1]]. - quo digitAt: ql + 1 - k put: (qhi bitShift: 4) - + qlo]. - rem := rem - digitRshift: d - bytes: 0 - lookfirst: dl. - ^ Array with: quo with: rem!
Item was added: + ----- Method: Integer>>primDigitDiv:neg: (in category 'private') ----- + primDigitDiv: arg neg: ng + "Answer with an array of (quotient, remainder)." + | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength | + <primitive: 'primDigitDivNegative' module:'LargeIntegers'> + arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal]. + "TFEI added this line" + l := self digitLength - arg digitLength + 1. + l <= 0 ifTrue: [^ Array with: 0 with: self]. + "shortcut against #highBit" + d := 8 - arg lastDigit highBitOfByte. + div := arg digitLshift: d. + divDigitLength := div digitLength + 1. + div := div growto: divDigitLength. + "shifts so high order word is >=128" + rem := self digitLshift: d. + rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1]. + remDigitLength := rem digitLength. + "makes a copy and shifts" + quo := Integer new: l neg: ng. + dl := divDigitLength - 1. + "Last actual byte of data" + ql := l. + dh := div digitAt: dl. + dnh := dl = 1 + ifTrue: [0] + ifFalse: [div digitAt: dl - 1]. + 1 to: ql do: + [:k | + "maintain quo*arg+rem=self" + "Estimate rem/div by dividing the leading to bytes of rem by dh." + "The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles." + j := remDigitLength + 1 - k. + "r1 := rem digitAt: j." + (rem digitAt: j) + = dh + ifTrue: [qhi := qlo := 15 + "i.e. q=255"] + ifFalse: + ["Compute q = (r1,r2)//dh, t = (r1,r2)\dh. + Note that r1,r2 are bytes, not nibbles. + Be careful not to generate intermediate results exceeding 13 + bits." + "r2 := (rem digitAt: j - 1)." + t := ((rem digitAt: j) + bitShift: 4) + + ((rem digitAt: j - 1) + bitShift: -4). + qhi := t // dh. + t := (t \ dh bitShift: 4) + + ((rem digitAt: j - 1) + bitAnd: 15). + qlo := t // dh. + t := t \ dh. + "Next compute (hi,lo) := q*dnh" + hi := qhi * dnh. + lo := qlo * dnh + ((hi bitAnd: 15) + bitShift: 4). + hi := (hi bitShift: -4) + + (lo bitShift: -8). + lo := lo bitAnd: 255. + "Correct overestimate of q. + Max of 2 iterations through loop -- see Knuth vol. 2" + r3 := j < 3 + ifTrue: [0] + ifFalse: [rem digitAt: j - 2]. + [(t < hi + or: [t = hi and: [r3 < lo]]) + and: + ["i.e. (t,r3) < (hi,lo)" + qlo := qlo - 1. + lo := lo - dnh. + lo < 0 + ifTrue: + [hi := hi - 1. + lo := lo + 256]. + hi >= dh]] + whileTrue: [hi := hi - dh]. + qlo < 0 + ifTrue: + [qhi := qhi - 1. + qlo := qlo + 16]]. + "Subtract q*div from rem" + l := j - dl. + a := 0. + 1 to: divDigitLength do: + [:i | + hi := (div digitAt: i) + * qhi. + lo := a + (rem digitAt: l) - ((hi bitAnd: 15) + bitShift: 4) - ((div digitAt: i) + * qlo). + rem digitAt: l put: lo - (lo // 256 * 256). + "sign-tolerant form of (lo bitAnd: 255)" + a := lo // 256 - (hi bitShift: -4). + l := l + 1]. + a < 0 + ifTrue: + ["Add div back into rem, decrease q by 1" + qlo := qlo - 1. + l := j - dl. + a := 0. + 1 to: divDigitLength do: + [:i | + a := (a bitShift: -8) + + (rem digitAt: l) + (div digitAt: i). + rem digitAt: l put: (a bitAnd: 255). + l := l + 1]]. + quo digitAt: ql + 1 - k put: (qhi bitShift: 4) + + qlo]. + rem := rem + digitRshift: d + bytes: 0 + lookfirst: dl. + ^ Array with: quo with: rem!
Item was added: + ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') ----- + digitDiv21: anInteger + "This is part of the recursive division algorithm from Burnikel - Ziegler + Divide a two limbs receiver by 1 limb dividend + Each limb is decomposed in two halves of p bytes (8*p bits) + so as to continue the recursion" + + | p qr1 qr2 | + p := anInteger digitLength + 1 bitShift: -1. + p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize]. + qr1 := (self butLowestNDigits: p) digitDiv32: anInteger. + qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger. + qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p). + ^qr2!
Item was added: + ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') ----- + digitDiv32: anInteger + "This is part of the recursive division algorithm from Burnikel - Ziegler + Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0). + Each limb is made of p bytes (8*p bits). + This step transforms the division problem into multiplication + It must use the fastMultiply: to be worth the overhead costs." + + | a2 b1 d p q qr r | + p := anInteger digitLength + 1 bitShift: -1. + (a2 := self butLowestNDigits: 2*p) + < (b1 := anInteger butLowestNDigits: p) + ifTrue: + [qr := (self butLowestNDigits: p) digitDiv21: b1. + q := qr first. + r := qr last] + ifFalse: + [q := (1 bitShift: 8*p) - 1. + r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1]. + d := q fastMultiply: (anInteger lowestNDigits: p). + r := (self lowestNDigits: p) + (r bitShift: 8*p) - d. + [r < 0] + whileTrue: + [q := q - 1. + r := r + anInteger]. + ^Array with: q with: r + !
Item was added: + ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') ----- + digitDiv: anInteger neg: aBoolean + "If length is worth, engage a recursive divide and conquer strategy" + | qr | + (anInteger digitLength <= 256 + or: [self digitLength <= anInteger digitLength]) + ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean]. + qr := self abs recursiveDigitDiv: anInteger abs. + ^ aBoolean + ifTrue: [qr collect: #negated] + ifFalse: [qr]!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') ----- printOn: aStream base: b "Append a representation of this number in base b on aStream. In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits." + | halfDigits halfPower head tail nDigitsUnderestimate qr | - | halfDigits halfPower head tail nDigitsUnderestimate | "Don't engage any arithmetic if not normalized" (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b]. nDigitsUnderestimate := b = 10 ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"] ifFalse: [self highBit quo: b highBit]. "splitting digits with a whole power of two is more efficient" halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2. halfDigits <= 1 ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail" ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)]. "Separate in two halves, head and tail" halfPower := b raisedToInteger: halfDigits. + qr := self digitDiv: halfPower neg: self negative. + head := qr first normalize. + tail := qr last normalize. - head := self quo: halfPower. - tail := self - (head * halfPower). "print head" head printOn: aStream base: b. "print tail without the overhead to count the digits" tail printOn: aStream base: b nDigits: halfDigits!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') ----- printOn: aStream base: b nDigits: n "Append a representation of this number in base b on aStream using n digits. In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two Should be invoked with: 0 <= self < (b raisedToInteger: n)" + | halfPower half head tail qr | - | halfPower half head tail | n <= 1 ifTrue: [ n <= 0 ifTrue: [self error: 'Number of digits n should be > 0']. "Note: this is to stop an infinite loop if one ever attempts to print with a huge base This can happen because choice was to not hardcode any limit for base b We let Character>>#digitValue: fail" ^aStream nextPut: (Character digitValue: self) ]. halfPower := n bitShift: -1. half := b raisedToInteger: halfPower. + qr := self digitDiv: half neg: self negative. + head := qr first normalize. + tail := qr last normalize. - head := self quo: half. - tail := self - (head * half). head printOn: aStream base: b nDigits: n - halfPower. tail printOn: aStream base: b nDigits: halfPower!
Item was added: + ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') ----- + recursiveDigitDiv: anInteger + "This is the recursive division algorithm from Burnikel - Ziegler + See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler + Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998 + https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content" + + | s m t a b z qr q i | + "round digits up to next power of 2" + s := anInteger digitLength. + m := 1 bitShift: (s - 1) highBit. + "shift so that leading bit of leading byte be 1, and digitLength power of two" + s := m * 8 - anInteger highBit. + a := self bitShift: s. + b := anInteger bitShift: s. + + "Decompose a into t limbs - each limb have m bytes + choose t such that leading bit of leading limb of a be 0" + t := (a highBit + 1 / (m * 8)) ceiling. + z := a butLowestNDigits: t - 2 * m. + i := t - 2. + q := 0. + "and do a division of two limb by 1 limb b for each pair of limb of a" + [qr := z digitDiv21: b. + q := (q bitShift: 8*m) + qr first. "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))" + (i := i - 1) >= 0] whileTrue: + [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)]. + ^Array with: q with: (qr last bitShift: s negated)!
Item was changed: ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') ----- sqrtRem "Like super, but use a divide and conquer method to perform this operation. See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854> https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf" + | n qr q s r sr high mid low | - | n qr s r sr high mid low | n := self digitLength bitShift: -2. n >= 16 ifFalse: [^super sqrtRem]. high := self butLowestNDigits: n * 2. mid := self copyDigitsFrom: n + 1 to: n * 2. low := self lowestNDigits: n. sr := high sqrtRem. qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false. + q := qr first normalize. + s := (sr first bitShift: 8 * n) + q. + r := (qr last normalize bitShift: 8 * n) + low - q squared. - s := (sr first bitShift: 8 * n) + qr first. - r := (qr last bitShift: 8 * n) + low - qr first squared. r negative ifTrue: [r := (s bitShift: 1) + r - 1. s := s - 1]. sr at: 1 put: s; at: 2 put: r. ^sr !
Item was changed: ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') ----- squared "Eventually use a divide and conquer algorithm to perform the multiplication"
(self digitLength >= 400) ifFalse: [^self * self]. + (self digitLength >= 800) ifFalse: [^self squaredKaratsuba]. + ^self squaredToom4! - (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba]. - ^self squaredToom3!
Item was added: + ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') ----- + squaredToom4 + "Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication. + See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan + https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf" + + | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 | + "divide in 4 parts" + p := (self digitLength + 3 bitShift: -2) bitClear: 2r11. + a3 := self butLowestNDigits: p * 3. + a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3. + a1 := self copyDigitsFrom: p + 1 to: p * 2. + a0 := self lowestNDigits: p. + + "Toom-4 trick: 7 multiplications instead of 16" + a02 := a0 - a2. + a13 := a1 - a3. + s0 := a0 squared. + s1 := (a0 fastMultiply: a1) bitShift: 1. + s2 := (a02 + a13) fastMultiply: (a02 - a13). + s3 := ((a0 + a1) + (a2 + a3)) squared. + s4 := (a02 fastMultiply: a13) bitShift: 1. + s5 := (a3 fastMultiply: a2) bitShift: 1. + s6 := a3 squared. + + "Interpolation" + t2 := s1 + s5. + t3 := (s2 + s3 + s4 bitShift: -1) - t2. + s3 := t2 - s4. + s4 := t3 - s0. + s2 := t3 - s2 - s6. + + "Sum the parts of decomposition" + ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p) + +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p) + + " + | a | + a := 770 factorial-1. + a digitLength. + [a * a - a squaredToom4 = 0] assert. + [Smalltalk garbageCollect. + [1000 timesRepeat: [a squaredToom4]] timeToRun] value / + [Smalltalk garbageCollect. + [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat + "!
Err, I messed up with the quo/rem signs... Tests pass, but there are not enough tests!
Le sam. 27 avr. 2019 à 10:41, commits@source.squeak.org a écrit :
Nicolas Cellier uploaded a new version of Kernel to project The Inbox: http://source.squeak.org/inbox/Kernel-nice.1218.mcz
==================== Summary ====================
Name: Kernel-nice.1218 Author: nice Time: 27 April 2019, 10:41:24.794539 am UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe Ancestors: Kernel-nice.1217
Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.
This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-H...
Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.
Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).
=============== Diff against Kernel-nice.1217 ===============
Item was added:
- ----- Method: Integer>>digitDiv21: (in category 'private') -----
- digitDiv21: anInteger
^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was added:
- ----- Method: Integer>>digitDiv32: (in category 'private') -----
- digitDiv32: anInteger
^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was changed: ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
- digitDiv: anInteger neg: aBoolean
^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
"Answer with an array of (quotient, remainder)."
| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t
divDigitLength remDigitLength |
<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
"TFEI added this line"
l := self digitLength - arg digitLength + 1.
l <= 0 ifTrue: [^ Array with: 0 with: self].
"shortcut against #highBit"
d := 8 - arg lastDigit highBitOfByte.
div := arg digitLshift: d.
divDigitLength := div digitLength + 1.
div := div growto: divDigitLength.
"shifts so high order word is >=128"
rem := self digitLshift: d.
rem digitLength = self digitLength ifTrue: [rem := rem growto:
self digitLength + 1].
remDigitLength := rem digitLength.
"makes a copy and shifts"
quo := Integer new: l neg: ng.
dl := divDigitLength - 1.
"Last actual byte of data"
ql := l.
dh := div digitAt: dl.
dnh := dl = 1
ifTrue: [0]
ifFalse: [div digitAt: dl - 1].
1 to: ql do:
[:k |
"maintain quo*arg+rem=self"
"Estimate rem/div by dividing the leading to bytes of rem
by dh."
"The estimate is q = qhi*16+qlo, where qhi and qlo are
nibbles."
j := remDigitLength + 1 - k.
"r1 := rem digitAt: j."
(rem digitAt: j)
= dh
ifTrue: [qhi := qlo := 15
"i.e. q=255"]
ifFalse:
["Compute q = (r1,r2)//dh, t =
(r1,r2)\dh.
Note that r1,r2 are bytes, not nibbles.
Be careful not to generate intermediate
results exceeding 13
bits."
"r2 := (rem digitAt: j - 1)."
t := ((rem digitAt: j)
bitShift: 4)
+ ((rem digitAt: j
bitShift: -4).
qhi := t // dh.
t := (t \\ dh bitShift: 4)
+ ((rem digitAt: j
bitAnd: 15).
qlo := t // dh.
t := t \\ dh.
"Next compute (hi,lo) := q*dnh"
hi := qhi * dnh.
lo := qlo * dnh + ((hi bitAnd: 15)
bitShift:
4).
hi := (hi bitShift: -4)
+ (lo bitShift:
-8).
lo := lo bitAnd: 255.
"Correct overestimate of q.
Max of 2 iterations through loop -- see
Knuth vol. 2"
r3 := j < 3
ifTrue: [0]
ifFalse: [rem
digitAt: j - 2].
[(t < hi
or: [t = hi and: [r3 < lo]])
and:
["i.e. (t,r3) < (hi,lo)"
qlo := qlo - 1.
lo := lo - dnh.
lo < 0
ifTrue:
[hi := hi
lo := lo +
256].
hi >= dh]]
whileTrue: [hi := hi - dh].
qlo < 0
ifTrue:
[qhi := qhi - 1.
qlo := qlo + 16]].
"Subtract q*div from rem"
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
hi := (div digitAt: i)
* qhi.
lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
bitShift: 4) -
((div digitAt: i)
* qlo).
rem digitAt: l put: lo - (lo // 256 * 256).
"sign-tolerant form of (lo bitAnd: 255)"
a := lo // 256 - (hi bitShift: -4).
l := l + 1].
a < 0
ifTrue:
["Add div back into rem, decrease q by 1"
qlo := qlo - 1.
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
a := (a bitShift: -8)
+ (rem
digitAt: l) + (div digitAt: i).
rem digitAt: l put: (a bitAnd:
255).
l := l + 1]].
quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ qlo].
rem := rem
digitRshift: d
bytes: 0
lookfirst: dl.
^ Array with: quo with: rem!
Item was added:
- ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
- primDigitDiv: arg neg: ng
"Answer with an array of (quotient, remainder)."
| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t
divDigitLength remDigitLength |
<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
"TFEI added this line"
l := self digitLength - arg digitLength + 1.
l <= 0 ifTrue: [^ Array with: 0 with: self].
"shortcut against #highBit"
d := 8 - arg lastDigit highBitOfByte.
div := arg digitLshift: d.
divDigitLength := div digitLength + 1.
div := div growto: divDigitLength.
"shifts so high order word is >=128"
rem := self digitLshift: d.
rem digitLength = self digitLength ifTrue: [rem := rem growto:
self digitLength + 1].
remDigitLength := rem digitLength.
"makes a copy and shifts"
quo := Integer new: l neg: ng.
dl := divDigitLength - 1.
"Last actual byte of data"
ql := l.
dh := div digitAt: dl.
dnh := dl = 1
ifTrue: [0]
ifFalse: [div digitAt: dl - 1].
1 to: ql do:
[:k |
"maintain quo*arg+rem=self"
"Estimate rem/div by dividing the leading to bytes of rem
by dh."
"The estimate is q = qhi*16+qlo, where qhi and qlo are
nibbles."
j := remDigitLength + 1 - k.
"r1 := rem digitAt: j."
(rem digitAt: j)
= dh
ifTrue: [qhi := qlo := 15
"i.e. q=255"]
ifFalse:
["Compute q = (r1,r2)//dh, t =
(r1,r2)\dh.
Note that r1,r2 are bytes, not nibbles.
Be careful not to generate intermediate
results exceeding 13
bits."
"r2 := (rem digitAt: j - 1)."
t := ((rem digitAt: j)
bitShift: 4)
+ ((rem digitAt: j
bitShift: -4).
qhi := t // dh.
t := (t \\ dh bitShift: 4)
+ ((rem digitAt: j
bitAnd: 15).
qlo := t // dh.
t := t \\ dh.
"Next compute (hi,lo) := q*dnh"
hi := qhi * dnh.
lo := qlo * dnh + ((hi bitAnd: 15)
bitShift:
4).
hi := (hi bitShift: -4)
+ (lo bitShift:
-8).
lo := lo bitAnd: 255.
"Correct overestimate of q.
Max of 2 iterations through loop -- see
Knuth vol. 2"
r3 := j < 3
ifTrue: [0]
ifFalse: [rem
digitAt: j - 2].
[(t < hi
or: [t = hi and: [r3 < lo]])
and:
["i.e. (t,r3) < (hi,lo)"
qlo := qlo - 1.
lo := lo - dnh.
lo < 0
ifTrue:
[hi := hi
lo := lo +
256].
hi >= dh]]
whileTrue: [hi := hi - dh].
qlo < 0
ifTrue:
[qhi := qhi - 1.
qlo := qlo + 16]].
"Subtract q*div from rem"
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
hi := (div digitAt: i)
* qhi.
lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
bitShift: 4) -
((div digitAt: i)
* qlo).
rem digitAt: l put: lo - (lo // 256 * 256).
"sign-tolerant form of (lo bitAnd: 255)"
a := lo // 256 - (hi bitShift: -4).
l := l + 1].
a < 0
ifTrue:
["Add div back into rem, decrease q by 1"
qlo := qlo - 1.
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
a := (a bitShift: -8)
+ (rem
digitAt: l) + (div digitAt: i).
rem digitAt: l put: (a bitAnd:
255).
l := l + 1]].
quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ qlo].
rem := rem
digitRshift: d
bytes: 0
lookfirst: dl.
^ Array with: quo with: rem!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private')
- digitDiv21: anInteger
"This is part of the recursive division algorithm from Burnikel -
Ziegler
Divide a two limbs receiver by 1 limb dividend
Each limb is decomposed in two halves of p bytes (8*p bits)
so as to continue the recursion"
| p qr1 qr2 |
p := anInteger digitLength + 1 bitShift: -1.
p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false)
collect: #normalize].
qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p)
digitDiv32: anInteger.
qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
^qr2!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private')
- digitDiv32: anInteger
"This is part of the recursive division algorithm from Burnikel -
Ziegler
Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
Each limb is made of p bytes (8*p bits).
This step transforms the division problem into multiplication
It must use the fastMultiply: to be worth the overhead costs."
| a2 b1 d p q qr r |
p := anInteger digitLength + 1 bitShift: -1.
(a2 := self butLowestNDigits: 2*p)
< (b1 := anInteger butLowestNDigits: p)
ifTrue:
[qr := (self butLowestNDigits: p) digitDiv21: b1.
q := qr first.
r := qr last]
ifFalse:
[q := (1 bitShift: 8*p) - 1.
r := (self butLowestNDigits: p) - (b1 bitShift:
8*p) + b1].
d := q fastMultiply: (anInteger lowestNDigits: p).
r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
[r < 0]
whileTrue:
[q := q - 1.
r := r + anInteger].
^Array with: q with: r
!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv:neg: (in category
'private') -----
- digitDiv: anInteger neg: aBoolean
"If length is worth, engage a recursive divide and conquer
strategy"
| qr |
(anInteger digitLength <= 256
or: [self digitLength <= anInteger digitLength])
ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
qr := self abs recursiveDigitDiv: anInteger abs.
^ aBoolean
ifTrue: [qr collect: #negated]
ifFalse: [qr]!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') ----- printOn: aStream base: b "Append a representation of this number in base b on aStream. In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
| halfDigits halfPower head tail nDigitsUnderestimate qr |
| halfDigits halfPower head tail nDigitsUnderestimate | "Don't engage any arithmetic if not normalized" (self digitLength = 0 or: [(self digitAt: self digitLength) = 0])
ifTrue: [^self normalize printOn: aStream base: b].
nDigitsUnderestimate := b = 10 ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is
because (2 log)/(10 log)*4096 is slightly greater than 1233"] ifFalse: [self highBit quo: b highBit].
"splitting digits with a whole power of two is more efficient" halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2. halfDigits <= 1 ifTrue: ["Hmmm, this could happen only in case of a huge
base b... Let lower level fail" ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)].
"Separate in two halves, head and tail" halfPower := b raisedToInteger: halfDigits.
qr := self digitDiv: halfPower neg: self negative.
head := qr first normalize.
tail := qr last normalize.
head := self quo: halfPower.
tail := self - (head * halfPower). "print head" head printOn: aStream base: b. "print tail without the overhead to count the digits" tail printOn: aStream base: b nDigits: halfDigits!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') ----- printOn: aStream base: b nDigits: n "Append a representation of this number in base b on aStream using n digits. In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two Should be invoked with: 0 <= self < (b raisedToInteger: n)"
| halfPower half head tail qr |
| halfPower half head tail | n <= 1 ifTrue: [ n <= 0 ifTrue: [self error: 'Number of digits n should be
0'].
"Note: this is to stop an infinite loop if one ever
attempts to print with a huge base This can happen because choice was to not hardcode any limit for base b We let Character>>#digitValue: fail" ^aStream nextPut: (Character digitValue: self) ]. halfPower := n bitShift: -1. half := b raisedToInteger: halfPower.
qr := self digitDiv: half neg: self negative.
head := qr first normalize.
tail := qr last normalize.
head := self quo: half.
tail := self - (head * half). head printOn: aStream base: b nDigits: n - halfPower. tail printOn: aStream base: b nDigits: halfPower!
Item was added:
- ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category
'private') -----
- recursiveDigitDiv: anInteger
"This is the recursive division algorithm from Burnikel - Ziegler
See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content "
| s m t a b z qr q i |
"round digits up to next power of 2"
s := anInteger digitLength.
m := 1 bitShift: (s - 1) highBit.
"shift so that leading bit of leading byte be 1, and digitLength
power of two"
s := m * 8 - anInteger highBit.
a := self bitShift: s.
b := anInteger bitShift: s.
"Decompose a into t limbs - each limb have m bytes
choose t such that leading bit of leading limb of a be 0"
t := (a highBit + 1 / (m * 8)) ceiling.
z := a butLowestNDigits: t - 2 * m.
i := t - 2.
q := 0.
"and do a division of two limb by 1 limb b for each pair of limb
of a"
[qr := z digitDiv21: b.
q := (q bitShift: 8*m) + qr first. "Note: this naive
recomposition of q cost O(t^2) - it is possible in O(t log(t))"
(i := i - 1) >= 0] whileTrue:
[z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m +
1 to: i + 1 * m)].
^Array with: q with: (qr last bitShift: s negated)!
Item was changed: ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') ----- sqrtRem "Like super, but use a divide and conquer method to perform this operation. See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854> https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
| n qr q s r sr high mid low |
| n qr s r sr high mid low | n := self digitLength bitShift: -2. n >= 16 ifFalse: [^super sqrtRem]. high := self butLowestNDigits: n * 2. mid := self copyDigitsFrom: n + 1 to: n * 2. low := self lowestNDigits: n. sr := high sqrtRem. qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first
bitShift: 1) neg: false.
q := qr first normalize.
s := (sr first bitShift: 8 * n) + q.
r := (qr last normalize bitShift: 8 * n) + low - q squared.
s := (sr first bitShift: 8 * n) + qr first.
r := (qr last bitShift: 8 * n) + low - qr first squared. r negative ifTrue: [r := (s bitShift: 1) + r - 1. s := s - 1]. sr at: 1 put: s; at: 2 put: r. ^sr !
Item was changed: ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') ----- squared "Eventually use a divide and conquer algorithm to perform the multiplication"
(self digitLength >= 400) ifFalse: [^self * self].
(self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
^self squaredToom4!
(self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
^self squaredToom3!
Item was added:
- ----- Method: LargePositiveInteger>>squaredToom4 (in category
'mathematical functions') -----
- squaredToom4
"Use a 4-way Toom-Cook divide and conquer algorithm to perform the
multiplication.
See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
"divide in 4 parts"
p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
a3 := self butLowestNDigits: p * 3.
a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
a1 := self copyDigitsFrom: p + 1 to: p * 2.
a0 := self lowestNDigits: p.
"Toom-4 trick: 7 multiplications instead of 16"
a02 := a0 - a2.
a13 := a1 - a3.
s0 := a0 squared.
s1 := (a0 fastMultiply: a1) bitShift: 1.
s2 := (a02 + a13) fastMultiply: (a02 - a13).
s3 := ((a0 + a1) + (a2 + a3)) squared.
s4 := (a02 fastMultiply: a13) bitShift: 1.
s5 := (a3 fastMultiply: a2) bitShift: 1.
s6 := a3 squared.
"Interpolation"
t2 := s1 + s5.
t3 := (s2 + s3 + s4 bitShift: -1) - t2.
s3 := t2 - s4.
s4 := t3 - s0.
s2 := t3 - s2 - s6.
"Sum the parts of decomposition"
^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
- "
- | a |
- a := 770 factorial-1.
- a digitLength.
- [a * a - a squaredToom4 = 0] assert.
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
- "!
Cool, it should be interesting to see if this will speed up MaHashIndex test suite.
On Sat, Apr 27, 2019 at 5:05 AM Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
Err, I messed up with the quo/rem signs... Tests pass, but there are not enough tests!
Le sam. 27 avr. 2019 à 10:41, commits@source.squeak.org a écrit :
Nicolas Cellier uploaded a new version of Kernel to project The Inbox: http://source.squeak.org/inbox/Kernel-nice.1218.mcz
==================== Summary ====================
Name: Kernel-nice.1218 Author: nice Time: 27 April 2019, 10:41:24.794539 am UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe Ancestors: Kernel-nice.1217
Implement the recursive fast division of Burnikel-Ziegler for large integers and connect it to digitDiv:neg: when operands are large enough.
This is not the fastest known division which is a composition of Barrett and Newton-Raphson inversion - but is easy to implement and should have similar performances for at least a few thousand bytes long integers - see for example http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-H...
Use digitDiv:neg: in large integer printString so as to obtain the quotient (head) and remainder (tail) in a single operation. Together with divide and conquer division, this results in a factor of about 3x for 50000 factorial printString.
Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This over-performs the symetrical squaredToom3 even for medium size (800 bytes).
=============== Diff against Kernel-nice.1217 ===============
Item was added:
- ----- Method: Integer>>digitDiv21: (in category 'private') -----
- digitDiv21: anInteger
^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was added:
- ----- Method: Integer>>digitDiv32: (in category 'private') -----
- digitDiv32: anInteger
^(self digitDiv: anInteger neg: false) collect: #normalize!
Item was changed: ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
- digitDiv: anInteger neg: aBoolean
^self primDigitDiv: anInteger neg: aBoolean!
- digitDiv: arg neg: ng
"Answer with an array of (quotient, remainder)."
| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
"TFEI added this line"
l := self digitLength - arg digitLength + 1.
l <= 0 ifTrue: [^ Array with: 0 with: self].
"shortcut against #highBit"
d := 8 - arg lastDigit highBitOfByte.
div := arg digitLshift: d.
divDigitLength := div digitLength + 1.
div := div growto: divDigitLength.
"shifts so high order word is >=128"
rem := self digitLshift: d.
rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
remDigitLength := rem digitLength.
"makes a copy and shifts"
quo := Integer new: l neg: ng.
dl := divDigitLength - 1.
"Last actual byte of data"
ql := l.
dh := div digitAt: dl.
dnh := dl = 1
ifTrue: [0]
ifFalse: [div digitAt: dl - 1].
1 to: ql do:
[:k |
"maintain quo*arg+rem=self"
"Estimate rem/div by dividing the leading to bytes of rem by dh."
"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
j := remDigitLength + 1 - k.
"r1 := rem digitAt: j."
(rem digitAt: j)
= dh
ifTrue: [qhi := qlo := 15
"i.e. q=255"]
ifFalse:
["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
Note that r1,r2 are bytes, not nibbles.
Be careful not to generate intermediate results exceeding 13
bits."
"r2 := (rem digitAt: j - 1)."
t := ((rem digitAt: j)
bitShift: 4)
+ ((rem digitAt: j - 1)
bitShift: -4).
qhi := t // dh.
t := (t \\ dh bitShift: 4)
+ ((rem digitAt: j - 1)
bitAnd: 15).
qlo := t // dh.
t := t \\ dh.
"Next compute (hi,lo) := q*dnh"
hi := qhi * dnh.
lo := qlo * dnh + ((hi bitAnd: 15)
bitShift: 4).
hi := (hi bitShift: -4)
+ (lo bitShift: -8).
lo := lo bitAnd: 255.
"Correct overestimate of q.
Max of 2 iterations through loop -- see Knuth vol. 2"
r3 := j < 3
ifTrue: [0]
ifFalse: [rem digitAt: j - 2].
[(t < hi
or: [t = hi and: [r3 < lo]])
and:
["i.e. (t,r3) < (hi,lo)"
qlo := qlo - 1.
lo := lo - dnh.
lo < 0
ifTrue:
[hi := hi - 1.
lo := lo + 256].
hi >= dh]]
whileTrue: [hi := hi - dh].
qlo < 0
ifTrue:
[qhi := qhi - 1.
qlo := qlo + 16]].
"Subtract q*div from rem"
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
hi := (div digitAt: i)
* qhi.
lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
bitShift: 4) - ((div digitAt: i)
* qlo).
rem digitAt: l put: lo - (lo // 256 * 256).
"sign-tolerant form of (lo bitAnd: 255)"
a := lo // 256 - (hi bitShift: -4).
l := l + 1].
a < 0
ifTrue:
["Add div back into rem, decrease q by 1"
qlo := qlo - 1.
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
a := (a bitShift: -8)
+ (rem digitAt: l) + (div digitAt: i).
rem digitAt: l put: (a bitAnd: 255).
l := l + 1]].
quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ qlo].
rem := rem
digitRshift: d
bytes: 0
lookfirst: dl.
^ Array with: quo with: rem!
Item was added:
- ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
- primDigitDiv: arg neg: ng
"Answer with an array of (quotient, remainder)."
| quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t divDigitLength remDigitLength |
<primitive: 'primDigitDivNegative' module:'LargeIntegers'>
arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
"TFEI added this line"
l := self digitLength - arg digitLength + 1.
l <= 0 ifTrue: [^ Array with: 0 with: self].
"shortcut against #highBit"
d := 8 - arg lastDigit highBitOfByte.
div := arg digitLshift: d.
divDigitLength := div digitLength + 1.
div := div growto: divDigitLength.
"shifts so high order word is >=128"
rem := self digitLshift: d.
rem digitLength = self digitLength ifTrue: [rem := rem growto: self digitLength + 1].
remDigitLength := rem digitLength.
"makes a copy and shifts"
quo := Integer new: l neg: ng.
dl := divDigitLength - 1.
"Last actual byte of data"
ql := l.
dh := div digitAt: dl.
dnh := dl = 1
ifTrue: [0]
ifFalse: [div digitAt: dl - 1].
1 to: ql do:
[:k |
"maintain quo*arg+rem=self"
"Estimate rem/div by dividing the leading to bytes of rem by dh."
"The estimate is q = qhi*16+qlo, where qhi and qlo are nibbles."
j := remDigitLength + 1 - k.
"r1 := rem digitAt: j."
(rem digitAt: j)
= dh
ifTrue: [qhi := qlo := 15
"i.e. q=255"]
ifFalse:
["Compute q = (r1,r2)//dh, t = (r1,r2)\\dh.
Note that r1,r2 are bytes, not nibbles.
Be careful not to generate intermediate results exceeding 13
bits."
"r2 := (rem digitAt: j - 1)."
t := ((rem digitAt: j)
bitShift: 4)
+ ((rem digitAt: j - 1)
bitShift: -4).
qhi := t // dh.
t := (t \\ dh bitShift: 4)
+ ((rem digitAt: j - 1)
bitAnd: 15).
qlo := t // dh.
t := t \\ dh.
"Next compute (hi,lo) := q*dnh"
hi := qhi * dnh.
lo := qlo * dnh + ((hi bitAnd: 15)
bitShift: 4).
hi := (hi bitShift: -4)
+ (lo bitShift: -8).
lo := lo bitAnd: 255.
"Correct overestimate of q.
Max of 2 iterations through loop -- see Knuth vol. 2"
r3 := j < 3
ifTrue: [0]
ifFalse: [rem digitAt: j - 2].
[(t < hi
or: [t = hi and: [r3 < lo]])
and:
["i.e. (t,r3) < (hi,lo)"
qlo := qlo - 1.
lo := lo - dnh.
lo < 0
ifTrue:
[hi := hi - 1.
lo := lo + 256].
hi >= dh]]
whileTrue: [hi := hi - dh].
qlo < 0
ifTrue:
[qhi := qhi - 1.
qlo := qlo + 16]].
"Subtract q*div from rem"
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
hi := (div digitAt: i)
* qhi.
lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
bitShift: 4) - ((div digitAt: i)
* qlo).
rem digitAt: l put: lo - (lo // 256 * 256).
"sign-tolerant form of (lo bitAnd: 255)"
a := lo // 256 - (hi bitShift: -4).
l := l + 1].
a < 0
ifTrue:
["Add div back into rem, decrease q by 1"
qlo := qlo - 1.
l := j - dl.
a := 0.
1 to: divDigitLength do:
[:i |
a := (a bitShift: -8)
+ (rem digitAt: l) + (div digitAt: i).
rem digitAt: l put: (a bitAnd: 255).
l := l + 1]].
quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
+ qlo].
rem := rem
digitRshift: d
bytes: 0
lookfirst: dl.
^ Array with: quo with: rem!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
- digitDiv21: anInteger
"This is part of the recursive division algorithm from Burnikel - Ziegler
Divide a two limbs receiver by 1 limb dividend
Each limb is decomposed in two halves of p bytes (8*p bits)
so as to continue the recursion"
| p qr1 qr2 |
p := anInteger digitLength + 1 bitShift: -1.
p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false) collect: #normalize].
qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
^qr2!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private') -----
- digitDiv32: anInteger
"This is part of the recursive division algorithm from Burnikel - Ziegler
Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
Each limb is made of p bytes (8*p bits).
This step transforms the division problem into multiplication
It must use the fastMultiply: to be worth the overhead costs."
| a2 b1 d p q qr r |
p := anInteger digitLength + 1 bitShift: -1.
(a2 := self butLowestNDigits: 2*p)
< (b1 := anInteger butLowestNDigits: p)
ifTrue:
[qr := (self butLowestNDigits: p) digitDiv21: b1.
q := qr first.
r := qr last]
ifFalse:
[q := (1 bitShift: 8*p) - 1.
r := (self butLowestNDigits: p) - (b1 bitShift: 8*p) + b1].
d := q fastMultiply: (anInteger lowestNDigits: p).
r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
[r < 0]
whileTrue:
[q := q - 1.
r := r + anInteger].
^Array with: q with: r
!
Item was added:
- ----- Method: LargePositiveInteger>>digitDiv:neg: (in category 'private') -----
- digitDiv: anInteger neg: aBoolean
"If length is worth, engage a recursive divide and conquer strategy"
| qr |
(anInteger digitLength <= 256
or: [self digitLength <= anInteger digitLength])
ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
qr := self abs recursiveDigitDiv: anInteger abs.
^ aBoolean
ifTrue: [qr collect: #negated]
ifFalse: [qr]!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base: (in category 'printing') ----- printOn: aStream base: b "Append a representation of this number in base b on aStream. In order to reduce cost of LargePositiveInteger ops, split the number in approximately two equal parts in number of digits."
| halfDigits halfPower head tail nDigitsUnderestimate qr |
| halfDigits halfPower head tail nDigitsUnderestimate | "Don't engage any arithmetic if not normalized" (self digitLength = 0 or: [(self digitAt: self digitLength) = 0]) ifTrue: [^self normalize printOn: aStream base: b]. nDigitsUnderestimate := b = 10 ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is because (2 log)/(10 log)*4096 is slightly greater than 1233"] ifFalse: [self highBit quo: b highBit]. "splitting digits with a whole power of two is more efficient" halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2. halfDigits <= 1 ifTrue: ["Hmmm, this could happen only in case of a huge base b... Let lower level fail" ^self printOn: aStream base: b nDigits: (self numberOfDigitsInBase: b)]. "Separate in two halves, head and tail" halfPower := b raisedToInteger: halfDigits.
qr := self digitDiv: halfPower neg: self negative.
head := qr first normalize.
tail := qr last normalize.
head := self quo: halfPower.
tail := self - (head * halfPower). "print head" head printOn: aStream base: b. "print tail without the overhead to count the digits" tail printOn: aStream base: b nDigits: halfDigits!
Item was changed: ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category 'printing') ----- printOn: aStream base: b nDigits: n "Append a representation of this number in base b on aStream using n digits. In order to reduce cost of LargePositiveInteger ops, split the number of digts approximatily in two Should be invoked with: 0 <= self < (b raisedToInteger: n)"
| halfPower half head tail qr |
| halfPower half head tail | n <= 1 ifTrue: [ n <= 0 ifTrue: [self error: 'Number of digits n should be > 0']. "Note: this is to stop an infinite loop if one ever attempts to print with a huge base This can happen because choice was to not hardcode any limit for base b We let Character>>#digitValue: fail" ^aStream nextPut: (Character digitValue: self) ]. halfPower := n bitShift: -1. half := b raisedToInteger: halfPower.
qr := self digitDiv: half neg: self negative.
head := qr first normalize.
tail := qr last normalize.
head := self quo: half.
tail := self - (head * half). head printOn: aStream base: b nDigits: n - halfPower. tail printOn: aStream base: b nDigits: halfPower!
Item was added:
- ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category 'private') -----
- recursiveDigitDiv: anInteger
"This is the recursive division algorithm from Burnikel - Ziegler
See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content"
| s m t a b z qr q i |
"round digits up to next power of 2"
s := anInteger digitLength.
m := 1 bitShift: (s - 1) highBit.
"shift so that leading bit of leading byte be 1, and digitLength power of two"
s := m * 8 - anInteger highBit.
a := self bitShift: s.
b := anInteger bitShift: s.
"Decompose a into t limbs - each limb have m bytes
choose t such that leading bit of leading limb of a be 0"
t := (a highBit + 1 / (m * 8)) ceiling.
z := a butLowestNDigits: t - 2 * m.
i := t - 2.
q := 0.
"and do a division of two limb by 1 limb b for each pair of limb of a"
[qr := z digitDiv21: b.
q := (q bitShift: 8*m) + qr first. "Note: this naive recomposition of q cost O(t^2) - it is possible in O(t log(t))"
(i := i - 1) >= 0] whileTrue:
[z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m + 1 to: i + 1 * m)].
^Array with: q with: (qr last bitShift: s negated)!
Item was changed: ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') ----- sqrtRem "Like super, but use a divide and conquer method to perform this operation. See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854> https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
| n qr q s r sr high mid low |
| n qr s r sr high mid low | n := self digitLength bitShift: -2. n >= 16 ifFalse: [^super sqrtRem]. high := self butLowestNDigits: n * 2. mid := self copyDigitsFrom: n + 1 to: n * 2. low := self lowestNDigits: n. sr := high sqrtRem. qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first bitShift: 1) neg: false.
q := qr first normalize.
s := (sr first bitShift: 8 * n) + q.
r := (qr last normalize bitShift: 8 * n) + low - q squared.
s := (sr first bitShift: 8 * n) + qr first.
r := (qr last bitShift: 8 * n) + low - qr first squared. r negative ifTrue: [r := (s bitShift: 1) + r - 1. s := s - 1]. sr at: 1 put: s; at: 2 put: r. ^sr !
Item was changed: ----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') ----- squared "Eventually use a divide and conquer algorithm to perform the multiplication"
(self digitLength >= 400) ifFalse: [^self * self].
(self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
^self squaredToom4!
(self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
^self squaredToom3!
Item was added:
- ----- Method: LargePositiveInteger>>squaredToom4 (in category 'mathematical functions') -----
- squaredToom4
"Use a 4-way Toom-Cook divide and conquer algorithm to perform the multiplication.
See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
| p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
"divide in 4 parts"
p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
a3 := self butLowestNDigits: p * 3.
a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
a1 := self copyDigitsFrom: p + 1 to: p * 2.
a0 := self lowestNDigits: p.
"Toom-4 trick: 7 multiplications instead of 16"
a02 := a0 - a2.
a13 := a1 - a3.
s0 := a0 squared.
s1 := (a0 fastMultiply: a1) bitShift: 1.
s2 := (a02 + a13) fastMultiply: (a02 - a13).
s3 := ((a0 + a1) + (a2 + a3)) squared.
s4 := (a02 fastMultiply: a13) bitShift: 1.
s5 := (a3 fastMultiply: a2) bitShift: 1.
s6 := a3 squared.
"Interpolation"
t2 := s1 + s5.
t3 := (s2 + s3 + s4 bitShift: -1) - t2.
s3 := t2 - s4.
s4 := t3 - s0.
s2 := t3 - s2 - s6.
"Sum the parts of decomposition"
^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
+(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
- "
- | a |
- a := 770 factorial-1.
- a digitLength.
- [a * a - a squaredToom4 = 0] assert.
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
- [Smalltalk garbageCollect.
- [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
- "!
squeak-dev@lists.squeakfoundation.org