[squeak-dev] The Inbox: Kernel-nice.1218.mcz
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Sat Apr 27 10:05:19 UTC 2019
Err, I messed up with the quo/rem signs...
Tests pass, but there are not enough tests!
Le sam. 27 avr. 2019 à 10:41, <commits at source.squeak.org> a écrit :
> Nicolas Cellier uploaded a new version of Kernel to project The Inbox:
> http://source.squeak.org/inbox/Kernel-nice.1218.mcz
>
> ==================== Summary ====================
>
> Name: Kernel-nice.1218
> Author: nice
> Time: 27 April 2019, 10:41:24.794539 am
> UUID: 21f74fbe-a0cd-4b6f-86e9-be13465d57fe
> Ancestors: Kernel-nice.1217
>
> Implement the recursive fast division of Burnikel-Ziegler for large
> integers and connect it to digitDiv:neg: when operands are large enough.
>
> This is not the fastest known division which is a composition of Barrett
> and Newton-Raphson inversion - but is easy to implement and should have
> similar performances for at least a few thousand bytes long integers - see
> for example
> http://bioinfo.ict.ac.cn/~dbu/AlgorithmCourses/Lectures/Lec5-Fast-Division-Hasselstrom2003.pdf
>
> Use digitDiv:neg: in large integer printString so as to obtain the
> quotient (head) and remainder (tail) in a single operation. Together with
> divide and conquer division, this results in a factor of about 3x for 50000
> factorial printString.
>
> Implement the 4-way Toom-Cook squaring variant of Chung-Hasan. This
> over-performs the symetrical squaredToom3 even for medium size (800 bytes).
>
> =============== Diff against Kernel-nice.1217 ===============
>
> Item was added:
> + ----- Method: Integer>>digitDiv21: (in category 'private') -----
> + digitDiv21: anInteger
> +
> + ^(self digitDiv: anInteger neg: false) collect: #normalize!
>
> Item was added:
> + ----- Method: Integer>>digitDiv32: (in category 'private') -----
> + digitDiv32: anInteger
> +
> + ^(self digitDiv: anInteger neg: false) collect: #normalize!
>
> Item was changed:
> ----- Method: Integer>>digitDiv:neg: (in category 'private') -----
> + digitDiv: anInteger neg: aBoolean
> + ^self primDigitDiv: anInteger neg: aBoolean!
> - digitDiv: arg neg: ng
> - "Answer with an array of (quotient, remainder)."
> - | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t
> divDigitLength remDigitLength |
> - <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
> - arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
> - "TFEI added this line"
> - l := self digitLength - arg digitLength + 1.
> - l <= 0 ifTrue: [^ Array with: 0 with: self].
> - "shortcut against #highBit"
> - d := 8 - arg lastDigit highBitOfByte.
> - div := arg digitLshift: d.
> - divDigitLength := div digitLength + 1.
> - div := div growto: divDigitLength.
> - "shifts so high order word is >=128"
> - rem := self digitLshift: d.
> - rem digitLength = self digitLength ifTrue: [rem := rem growto:
> self digitLength + 1].
> - remDigitLength := rem digitLength.
> - "makes a copy and shifts"
> - quo := Integer new: l neg: ng.
> - dl := divDigitLength - 1.
> - "Last actual byte of data"
> - ql := l.
> - dh := div digitAt: dl.
> - dnh := dl = 1
> - ifTrue: [0]
> - ifFalse: [div digitAt: dl - 1].
> - 1 to: ql do:
> - [:k |
> - "maintain quo*arg+rem=self"
> - "Estimate rem/div by dividing the leading to bytes of rem
> by dh."
> - "The estimate is q = qhi*16+qlo, where qhi and qlo are
> nibbles."
> - j := remDigitLength + 1 - k.
> - "r1 := rem digitAt: j."
> - (rem digitAt: j)
> - = dh
> - ifTrue: [qhi := qlo := 15
> - "i.e. q=255"]
> - ifFalse:
> - ["Compute q = (r1,r2)//dh, t =
> (r1,r2)\\dh.
> - Note that r1,r2 are bytes, not nibbles.
> - Be careful not to generate intermediate
> results exceeding 13
> - bits."
> - "r2 := (rem digitAt: j - 1)."
> - t := ((rem digitAt: j)
> - bitShift: 4)
> - + ((rem digitAt: j
> - 1)
> -
> bitShift: -4).
> - qhi := t // dh.
> - t := (t \\ dh bitShift: 4)
> - + ((rem digitAt: j
> - 1)
> -
> bitAnd: 15).
> - qlo := t // dh.
> - t := t \\ dh.
> - "Next compute (hi,lo) := q*dnh"
> - hi := qhi * dnh.
> - lo := qlo * dnh + ((hi bitAnd: 15)
> - bitShift:
> 4).
> - hi := (hi bitShift: -4)
> - + (lo bitShift:
> -8).
> - lo := lo bitAnd: 255.
> - "Correct overestimate of q.
> - Max of 2 iterations through loop -- see
> Knuth vol. 2"
> - r3 := j < 3
> - ifTrue: [0]
> - ifFalse: [rem
> digitAt: j - 2].
> - [(t < hi
> - or: [t = hi and: [r3 < lo]])
> - and:
> - ["i.e. (t,r3) < (hi,lo)"
> - qlo := qlo - 1.
> - lo := lo - dnh.
> - lo < 0
> - ifTrue:
> - [hi := hi
> - 1.
> - lo := lo +
> 256].
> - hi >= dh]]
> - whileTrue: [hi := hi - dh].
> - qlo < 0
> - ifTrue:
> - [qhi := qhi - 1.
> - qlo := qlo + 16]].
> - "Subtract q*div from rem"
> - l := j - dl.
> - a := 0.
> - 1 to: divDigitLength do:
> - [:i |
> - hi := (div digitAt: i)
> - * qhi.
> - lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
> - bitShift: 4) -
> ((div digitAt: i)
> - * qlo).
> - rem digitAt: l put: lo - (lo // 256 * 256).
> - "sign-tolerant form of (lo bitAnd: 255)"
> - a := lo // 256 - (hi bitShift: -4).
> - l := l + 1].
> - a < 0
> - ifTrue:
> - ["Add div back into rem, decrease q by 1"
> - qlo := qlo - 1.
> - l := j - dl.
> - a := 0.
> - 1 to: divDigitLength do:
> - [:i |
> - a := (a bitShift: -8)
> - + (rem
> digitAt: l) + (div digitAt: i).
> - rem digitAt: l put: (a bitAnd:
> 255).
> - l := l + 1]].
> - quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
> - + qlo].
> - rem := rem
> - digitRshift: d
> - bytes: 0
> - lookfirst: dl.
> - ^ Array with: quo with: rem!
>
> Item was added:
> + ----- Method: Integer>>primDigitDiv:neg: (in category 'private') -----
> + primDigitDiv: arg neg: ng
> + "Answer with an array of (quotient, remainder)."
> + | quo rem ql d div dh dnh dl qhi qlo j l hi lo r3 a t
> divDigitLength remDigitLength |
> + <primitive: 'primDigitDivNegative' module:'LargeIntegers'>
> + arg = 0 ifTrue: [^ (ZeroDivide dividend: self) signal].
> + "TFEI added this line"
> + l := self digitLength - arg digitLength + 1.
> + l <= 0 ifTrue: [^ Array with: 0 with: self].
> + "shortcut against #highBit"
> + d := 8 - arg lastDigit highBitOfByte.
> + div := arg digitLshift: d.
> + divDigitLength := div digitLength + 1.
> + div := div growto: divDigitLength.
> + "shifts so high order word is >=128"
> + rem := self digitLshift: d.
> + rem digitLength = self digitLength ifTrue: [rem := rem growto:
> self digitLength + 1].
> + remDigitLength := rem digitLength.
> + "makes a copy and shifts"
> + quo := Integer new: l neg: ng.
> + dl := divDigitLength - 1.
> + "Last actual byte of data"
> + ql := l.
> + dh := div digitAt: dl.
> + dnh := dl = 1
> + ifTrue: [0]
> + ifFalse: [div digitAt: dl - 1].
> + 1 to: ql do:
> + [:k |
> + "maintain quo*arg+rem=self"
> + "Estimate rem/div by dividing the leading to bytes of rem
> by dh."
> + "The estimate is q = qhi*16+qlo, where qhi and qlo are
> nibbles."
> + j := remDigitLength + 1 - k.
> + "r1 := rem digitAt: j."
> + (rem digitAt: j)
> + = dh
> + ifTrue: [qhi := qlo := 15
> + "i.e. q=255"]
> + ifFalse:
> + ["Compute q = (r1,r2)//dh, t =
> (r1,r2)\\dh.
> + Note that r1,r2 are bytes, not nibbles.
> + Be careful not to generate intermediate
> results exceeding 13
> + bits."
> + "r2 := (rem digitAt: j - 1)."
> + t := ((rem digitAt: j)
> + bitShift: 4)
> + + ((rem digitAt: j
> - 1)
> +
> bitShift: -4).
> + qhi := t // dh.
> + t := (t \\ dh bitShift: 4)
> + + ((rem digitAt: j
> - 1)
> +
> bitAnd: 15).
> + qlo := t // dh.
> + t := t \\ dh.
> + "Next compute (hi,lo) := q*dnh"
> + hi := qhi * dnh.
> + lo := qlo * dnh + ((hi bitAnd: 15)
> + bitShift:
> 4).
> + hi := (hi bitShift: -4)
> + + (lo bitShift:
> -8).
> + lo := lo bitAnd: 255.
> + "Correct overestimate of q.
> + Max of 2 iterations through loop -- see
> Knuth vol. 2"
> + r3 := j < 3
> + ifTrue: [0]
> + ifFalse: [rem
> digitAt: j - 2].
> + [(t < hi
> + or: [t = hi and: [r3 < lo]])
> + and:
> + ["i.e. (t,r3) < (hi,lo)"
> + qlo := qlo - 1.
> + lo := lo - dnh.
> + lo < 0
> + ifTrue:
> + [hi := hi
> - 1.
> + lo := lo +
> 256].
> + hi >= dh]]
> + whileTrue: [hi := hi - dh].
> + qlo < 0
> + ifTrue:
> + [qhi := qhi - 1.
> + qlo := qlo + 16]].
> + "Subtract q*div from rem"
> + l := j - dl.
> + a := 0.
> + 1 to: divDigitLength do:
> + [:i |
> + hi := (div digitAt: i)
> + * qhi.
> + lo := a + (rem digitAt: l) - ((hi bitAnd: 15)
> + bitShift: 4) -
> ((div digitAt: i)
> + * qlo).
> + rem digitAt: l put: lo - (lo // 256 * 256).
> + "sign-tolerant form of (lo bitAnd: 255)"
> + a := lo // 256 - (hi bitShift: -4).
> + l := l + 1].
> + a < 0
> + ifTrue:
> + ["Add div back into rem, decrease q by 1"
> + qlo := qlo - 1.
> + l := j - dl.
> + a := 0.
> + 1 to: divDigitLength do:
> + [:i |
> + a := (a bitShift: -8)
> + + (rem
> digitAt: l) + (div digitAt: i).
> + rem digitAt: l put: (a bitAnd:
> 255).
> + l := l + 1]].
> + quo digitAt: ql + 1 - k put: (qhi bitShift: 4)
> + + qlo].
> + rem := rem
> + digitRshift: d
> + bytes: 0
> + lookfirst: dl.
> + ^ Array with: quo with: rem!
>
> Item was added:
> + ----- Method: LargePositiveInteger>>digitDiv21: (in category 'private')
> -----
> + digitDiv21: anInteger
> + "This is part of the recursive division algorithm from Burnikel -
> Ziegler
> + Divide a two limbs receiver by 1 limb dividend
> + Each limb is decomposed in two halves of p bytes (8*p bits)
> + so as to continue the recursion"
> +
> + | p qr1 qr2 |
> + p := anInteger digitLength + 1 bitShift: -1.
> + p <= 256 ifTrue: [^(self primDigitDiv: anInteger neg: false)
> collect: #normalize].
> + qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
> + qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p)
> digitDiv32: anInteger.
> + qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
> + ^qr2!
>
> Item was added:
> + ----- Method: LargePositiveInteger>>digitDiv32: (in category 'private')
> -----
> + digitDiv32: anInteger
> + "This is part of the recursive division algorithm from Burnikel -
> Ziegler
> + Divide 3 limb (a2,a1,a0) by 2 limb (b1,b0).
> + Each limb is made of p bytes (8*p bits).
> + This step transforms the division problem into multiplication
> + It must use the fastMultiply: to be worth the overhead costs."
> +
> + | a2 b1 d p q qr r |
> + p := anInteger digitLength + 1 bitShift: -1.
> + (a2 := self butLowestNDigits: 2*p)
> + < (b1 := anInteger butLowestNDigits: p)
> + ifTrue:
> + [qr := (self butLowestNDigits: p) digitDiv21: b1.
> + q := qr first.
> + r := qr last]
> + ifFalse:
> + [q := (1 bitShift: 8*p) - 1.
> + r := (self butLowestNDigits: p) - (b1 bitShift:
> 8*p) + b1].
> + d := q fastMultiply: (anInteger lowestNDigits: p).
> + r := (self lowestNDigits: p) + (r bitShift: 8*p) - d.
> + [r < 0]
> + whileTrue:
> + [q := q - 1.
> + r := r + anInteger].
> + ^Array with: q with: r
> + !
>
> Item was added:
> + ----- Method: LargePositiveInteger>>digitDiv:neg: (in category
> 'private') -----
> + digitDiv: anInteger neg: aBoolean
> + "If length is worth, engage a recursive divide and conquer
> strategy"
> + | qr |
> + (anInteger digitLength <= 256
> + or: [self digitLength <= anInteger digitLength])
> + ifTrue: [^ self primDigitDiv: anInteger neg: aBoolean].
> + qr := self abs recursiveDigitDiv: anInteger abs.
> + ^ aBoolean
> + ifTrue: [qr collect: #negated]
> + ifFalse: [qr]!
>
> Item was changed:
> ----- Method: LargePositiveInteger>>printOn:base: (in category
> 'printing') -----
> printOn: aStream base: b
> "Append a representation of this number in base b on aStream.
> In order to reduce cost of LargePositiveInteger ops, split the
> number in approximately two equal parts in number of digits."
>
> + | halfDigits halfPower head tail nDigitsUnderestimate qr |
> - | halfDigits halfPower head tail nDigitsUnderestimate |
> "Don't engage any arithmetic if not normalized"
> (self digitLength = 0 or: [(self digitAt: self digitLength) = 0])
> ifTrue: [^self normalize printOn: aStream base: b].
>
> nDigitsUnderestimate := b = 10
> ifTrue: [((self highBit - 1) * 1233 >> 12) + 1. "This is
> because (2 log)/(10 log)*4096 is slightly greater than 1233"]
> ifFalse: [self highBit quo: b highBit].
>
> "splitting digits with a whole power of two is more efficient"
> halfDigits := 1 bitShift: nDigitsUnderestimate highBit - 2.
>
> halfDigits <= 1
> ifTrue: ["Hmmm, this could happen only in case of a huge
> base b... Let lower level fail"
> ^self printOn: aStream base: b nDigits: (self
> numberOfDigitsInBase: b)].
>
> "Separate in two halves, head and tail"
> halfPower := b raisedToInteger: halfDigits.
> + qr := self digitDiv: halfPower neg: self negative.
> + head := qr first normalize.
> + tail := qr last normalize.
> - head := self quo: halfPower.
> - tail := self - (head * halfPower).
>
> "print head"
> head printOn: aStream base: b.
>
> "print tail without the overhead to count the digits"
> tail printOn: aStream base: b nDigits: halfDigits!
>
> Item was changed:
> ----- Method: LargePositiveInteger>>printOn:base:nDigits: (in category
> 'printing') -----
> printOn: aStream base: b nDigits: n
> "Append a representation of this number in base b on aStream using
> n digits.
> In order to reduce cost of LargePositiveInteger ops, split the
> number of digts approximatily in two
> Should be invoked with: 0 <= self < (b raisedToInteger: n)"
>
> + | halfPower half head tail qr |
> - | halfPower half head tail |
> n <= 1 ifTrue: [
> n <= 0 ifTrue: [self error: 'Number of digits n should be
> > 0'].
>
> "Note: this is to stop an infinite loop if one ever
> attempts to print with a huge base
> This can happen because choice was to not hardcode any
> limit for base b
> We let Character>>#digitValue: fail"
> ^aStream nextPut: (Character digitValue: self) ].
> halfPower := n bitShift: -1.
> half := b raisedToInteger: halfPower.
> + qr := self digitDiv: half neg: self negative.
> + head := qr first normalize.
> + tail := qr last normalize.
> - head := self quo: half.
> - tail := self - (head * half).
> head printOn: aStream base: b nDigits: n - halfPower.
> tail printOn: aStream base: b nDigits: halfPower!
>
> Item was added:
> + ----- Method: LargePositiveInteger>>recursiveDigitDiv: (in category
> 'private') -----
> + recursiveDigitDiv: anInteger
> + "This is the recursive division algorithm from Burnikel - Ziegler
> + See Fast Recursive Division - Christoph Burnikel, Joachim Ziegler
> + Research Report MPI-I-98-1-022, MPI Saarbrucken, Oct 1998
> +
> https://pure.mpg.de/rest/items/item_1819444_4/component/file_2599480/content
> "
> +
> + | s m t a b z qr q i |
> + "round digits up to next power of 2"
> + s := anInteger digitLength.
> + m := 1 bitShift: (s - 1) highBit.
> + "shift so that leading bit of leading byte be 1, and digitLength
> power of two"
> + s := m * 8 - anInteger highBit.
> + a := self bitShift: s.
> + b := anInteger bitShift: s.
> +
> + "Decompose a into t limbs - each limb have m bytes
> + choose t such that leading bit of leading limb of a be 0"
> + t := (a highBit + 1 / (m * 8)) ceiling.
> + z := a butLowestNDigits: t - 2 * m.
> + i := t - 2.
> + q := 0.
> + "and do a division of two limb by 1 limb b for each pair of limb
> of a"
> + [qr := z digitDiv21: b.
> + q := (q bitShift: 8*m) + qr first. "Note: this naive
> recomposition of q cost O(t^2) - it is possible in O(t log(t))"
> + (i := i - 1) >= 0] whileTrue:
> + [z := (qr last bitShift: 8*m) + (a copyDigitsFrom: i * m +
> 1 to: i + 1 * m)].
> + ^Array with: q with: (qr last bitShift: s negated)!
>
> Item was changed:
> ----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical
> functions') -----
> sqrtRem
> "Like super, but use a divide and conquer method to perform this
> operation.
> See Paul Zimmermann. Karatsuba Square Root. [Research Report]
> RR-3805, INRIA. 1999, pp.8. <inria-00072854>
> https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
>
> + | n qr q s r sr high mid low |
> - | n qr s r sr high mid low |
> n := self digitLength bitShift: -2.
> n >= 16 ifFalse: [^super sqrtRem].
> high := self butLowestNDigits: n * 2.
> mid := self copyDigitsFrom: n + 1 to: n * 2.
> low := self lowestNDigits: n.
>
> sr := high sqrtRem.
> qr := (sr last bitShift: 8 * n) + mid digitDiv: (sr first
> bitShift: 1) neg: false.
> + q := qr first normalize.
> + s := (sr first bitShift: 8 * n) + q.
> + r := (qr last normalize bitShift: 8 * n) + low - q squared.
> - s := (sr first bitShift: 8 * n) + qr first.
> - r := (qr last bitShift: 8 * n) + low - qr first squared.
> r negative
> ifTrue:
> [r := (s bitShift: 1) + r - 1.
> s := s - 1].
> sr at: 1 put: s; at: 2 put: r.
> ^sr
> !
>
> Item was changed:
> ----- Method: LargePositiveInteger>>squared (in category 'mathematical
> functions') -----
> squared
> "Eventually use a divide and conquer algorithm to perform the
> multiplication"
>
> (self digitLength >= 400) ifFalse: [^self * self].
> + (self digitLength >= 800) ifFalse: [^self squaredKaratsuba].
> + ^self squaredToom4!
> - (self digitLength >= 1600) ifFalse: [^self squaredKaratsuba].
> - ^self squaredToom3!
>
> Item was added:
> + ----- Method: LargePositiveInteger>>squaredToom4 (in category
> 'mathematical functions') -----
> + squaredToom4
> + "Use a 4-way Toom-Cook divide and conquer algorithm to perform the
> multiplication.
> + See Asymmetric Squaring Formulae Jaewook Chung and M. Anwar Hasan
> + https://www.lirmm.fr/arith18/papers/Chung-Squaring.pdf"
> +
> + | p a0 a1 a2 a3 a02 a13 s0 s1 s2 s3 s4 s5 s6 t2 t3 |
> + "divide in 4 parts"
> + p := (self digitLength + 3 bitShift: -2) bitClear: 2r11.
> + a3 := self butLowestNDigits: p * 3.
> + a2 := self copyDigitsFrom: p * 2 + 1 to: p * 3.
> + a1 := self copyDigitsFrom: p + 1 to: p * 2.
> + a0 := self lowestNDigits: p.
> +
> + "Toom-4 trick: 7 multiplications instead of 16"
> + a02 := a0 - a2.
> + a13 := a1 - a3.
> + s0 := a0 squared.
> + s1 := (a0 fastMultiply: a1) bitShift: 1.
> + s2 := (a02 + a13) fastMultiply: (a02 - a13).
> + s3 := ((a0 + a1) + (a2 + a3)) squared.
> + s4 := (a02 fastMultiply: a13) bitShift: 1.
> + s5 := (a3 fastMultiply: a2) bitShift: 1.
> + s6 := a3 squared.
> +
> + "Interpolation"
> + t2 := s1 + s5.
> + t3 := (s2 + s3 + s4 bitShift: -1) - t2.
> + s3 := t2 - s4.
> + s4 := t3 - s0.
> + s2 := t3 - s2 - s6.
> +
> + "Sum the parts of decomposition"
> + ^s0 + (s1 bitShift: 8*p) + (s2 + (s3 bitShift: 8*p) bitShift: 16*p)
> + +(s4 + (s5 bitShift: 8*p) + (s6 bitShift: 16*p) bitShift: 32*p)
> +
> + "
> + | a |
> + a := 770 factorial-1.
> + a digitLength.
> + [a * a - a squaredToom4 = 0] assert.
> + [Smalltalk garbageCollect.
> + [1000 timesRepeat: [a squaredToom4]] timeToRun] value /
> + [Smalltalk garbageCollect.
> + [1000 timesRepeat: [a squaredKaratsuba]] timeToRun] value asFloat
> + "!
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20190427/81d358da/attachment-0001.html>
More information about the Squeak-dev
mailing list
|