Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1234.mcz
==================== Summary ====================
Name: Kernel-nice.1234
Author: nice
Time: 14 May 2019, 12:40:59.140364 am
UUID: 3c271006-81c5-49e7-8131-b65c154b009f
Ancestors: Kernel-nice.1233
Introduce settable parameters for tuning the Large Integer arithmetic thresholds.
Use these parameters in the accelerated arithmetic.
For Burnikel Ziegler, ensure at least two recursions by using twice the threshold of digitDiv21: before engaging a digitDivSplit: - this somehow mitigates the overhead cost of digitDivSplit:
=============== Diff against Kernel-nice.1233 ===============
Item was changed:
Integer variableByteSubclass: #LargePositiveInteger
instanceVariableNames: ''
+ classVariableNames: 'ThresholdForDiv21 ThresholdForMul22 ThresholdForMul33 ThresholdForSqrt ThresholdForSquaredByFourth ThresholdForSquaredByHalf'
- classVariableNames: ''
poolDictionaries: ''
category: 'Kernel-Numbers'!
!LargePositiveInteger commentStamp: '<historical>' prior: 0!
I represent positive integers of more than 30 bits (ie, >= 1073741824). These values are beyond the range of SmallInteger, and are encoded here as an array of 8-bit digits. Care must be taken, when new values are computed, that any result that COULD BE a SmallInteger IS a SmallInteger (see normalize).
Note that the bit manipulation primitives, bitAnd:, bitShift:, etc., = and ~= run without failure (and therefore fast) if the value fits in 32 bits. This is a great help to the simulator.!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForDiv21 (in category 'accessing') -----
+ thresholdForDiv21
+ <preference: 'Threshold for division by Burnikel-Ziegler recursive split'
+ category: 'Arithmetic'
+ description: 'The number of byte-digit above which recursive division is more efficient than schoolbook division'
+ type: #Number>
+ ^ThresholdForDiv21 ifNil: [256]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForDiv21: (in category 'accessing') -----
+ thresholdForDiv21: anIntegerOrNil
+ "the number of byte-digit above which schoolbook division is more efficient than recursive division"
+ ThresholdForDiv21 := anIntegerOrNil ifNotNil: [:t | t max: Smalltalk wordSize]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForMul22 (in category 'accessing') -----
+ thresholdForMul22
+ <preference: 'Threshold for multiplication by 2-way Karatsuba'
+ category: 'Arithmetic'
+ description: 'The number of byte-digit above which Karatsuba is more efficient than schoolbook'
+ type: #Number>
+ ^ThresholdForMul22 ifNil: [600]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForMul22: (in category 'accessing') -----
+ thresholdForMul22: anIntegerOrNil
+ "the number of byte-digit above which Karatsuba is more efficient than schoolbook"
+ ThresholdForMul22 := anIntegerOrNil ifNotNil: [:t | t max: Smalltalk wordSize + 1]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForMul33 (in category 'accessing') -----
+ thresholdForMul33
+ <preference: 'Threshold for multiplication by 3-way Toom-Cook'
+ category: 'Arithmetic'
+ description: 'The number of byte-digit above which 3-way Toom-Cook is more efficient than Karatsuba'
+ type: #Number>
+ ^ThresholdForMul33 ifNil: [2000]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForMul33: (in category 'accessing') -----
+ thresholdForMul33: anIntegerOrNil
+ "the number of byte-digit above which 3-way Toom-Cook is more efficient than Karatsuba"
+ ThresholdForMul33 := anIntegerOrNil ifNotNil: [:t | t max: Smalltalk wordSize + 1]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForSquaredByFourth (in category 'accessing') -----
+ thresholdForSquaredByFourth
+ <preference: 'Threshold for squaring by 4-way Toom-Cook'
+ category: 'Arithmetic'
+ description: 'The number of byte-digit above which 4-way Toom-Cook squaring is more efficient than Karatsuba'
+ type: #Number>
+ ^ThresholdForSquaredByFourth ifNil: [800]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForSquaredByFourth: (in category 'accessing') -----
+ thresholdForSquaredByFourth: anIntegerOrNil
+ "the number of byte-digit above which 4-way Toom-Cook squaring is more efficient than Karatsuba"
+ ^ThresholdForSquaredByFourth := anIntegerOrNil ifNotNil: [:t | t max: Smalltalk wordSize]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForSquaredByHalf (in category 'accessing') -----
+ thresholdForSquaredByHalf
+ <preference: 'Threshold for squaring by 2-way Karatsuba'
+ category: 'Arithmetic'
+ description: 'The number of byte-digit above which Karatsuba squaring is more efficient than schoolbook multiplication'
+ type: #Number>
+ ^ThresholdForSquaredByHalf ifNil: [400]!
Item was added:
+ ----- Method: LargePositiveInteger class>>thresholdForSquaredByHalf: (in category 'accessing') -----
+ thresholdForSquaredByHalf: anIntegerOrNil
+ "the number of byte-digit above which Karatsuba squaring is more efficient than schoolbook multiplication"
+ ^ThresholdForSquaredByHalf := anIntegerOrNil ifNotNil: [:t | t max: Smalltalk wordSize]!
Item was changed:
----- Method: LargePositiveInteger>>digitDiv21: (in category 'private') -----
digitDiv21: anInteger
"This is part of the recursive division algorithm from Burnikel - Ziegler
Divide a two limbs receiver by 1 limb dividend
Each limb is decomposed in two halves of p bytes (8*p bits)
so as to continue the recursion"
| p qr1 qr2 |
p := (anInteger digitLength + 1 bitShift: -1) bitClear: 2r11.
+ p < self class thresholdForDiv21 ifTrue: [^(self digitDiv: anInteger neg: false) collect: #normalize].
- p <= 256 ifTrue: [^(self digitDiv: anInteger neg: false) collect: #normalize].
qr1 := (self butLowestNDigits: p) digitDiv32: anInteger.
qr2 := (self lowestNDigits: p) + (qr1 last bitShift: 8*p) digitDiv32: anInteger.
qr2 at: 1 put: (qr2 at: 1) + ((qr1 at: 1) bitShift: 8*p).
^qr2!
Item was changed:
----- Method: LargePositiveInteger>>divideByInteger: (in category 'private') -----
divideByInteger: anInteger
"If length is worth, engage a recursive divide and conquer strategy,
more efficient than super"
| qr |
+ (anInteger digitLength < (self class thresholdForDiv21 * 2)
- (anInteger digitLength <= 256
or: [self digitLength <= anInteger digitLength])
ifTrue: [^ self digitDiv: anInteger neg: self negative ~~ anInteger negative].
qr := self abs digitDivSplit: anInteger abs.
self negative ifTrue: [qr at: 2 put: qr second negated].
self negative = anInteger negative ifFalse: [qr at: 1 put: qr first negated].
^qr!
Item was changed:
----- Method: LargePositiveInteger>>multiplyByInteger: (in category 'private') -----
multiplyByInteger: anInteger
"Return the result of multiplying the receiver by the Integer argument.
This method dispatch to the fastest algorithm based on operands length."
| xLen yLen |
"Revert to schoolbook multiplication if short"
+ (xLen := self digitLength) < self class thresholdForMul22
- (xLen := self digitLength) < 600
ifTrue: [^ self digitMultiply: anInteger
neg: self negative ~~ anInteger negative].
"Arrange to have the receiver be the shorter and retry"
xLen > (yLen := anInteger digitLength)
ifTrue: [^ anInteger multiplyByInteger: self ].
"Seek for integers of about the same length, else split the long one"
yLen > (2 * xLen) ifTrue: [^self digitMulSplit: anInteger].
"Choose the fastest divide and conquer algorithm based on another heuristic"
+ xLen >= self class thresholdForMul33 ifTrue: [^self digitMul33: anInteger].
- xLen >= 2000 ifTrue: [^self digitMul33: anInteger].
yLen * 2 >= (3 * xLen) ifTrue: [^self digitMul23: anInteger].
^self digitMul22: anInteger!
Item was changed:
----- Method: LargePositiveInteger>>squared (in category 'mathematical functions') -----
squared
"Eventually use a divide and conquer algorithm to perform the multiplication"
+ (self digitLength >= self class thresholdForSquaredByHalf) ifFalse: [^self * self].
+ (self digitLength >= self class thresholdForSquaredByFourth) ifFalse: [^self squaredByHalf].
- (self digitLength >= 400) ifFalse: [^self * self].
- (self digitLength >= 800) ifFalse: [^self squaredByHalf].
^self squaredByFourth!
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1233.mcz
==================== Summary ====================
Name: Kernel-nice.1233
Author: nice
Time: 13 May 2019, 6:32:30.507955 pm
UUID: 8ba1a42b-88c8-1641-8a1a-bea810163ec2
Ancestors: Kernel-nice.1232
Oups fix creeping of sqrt2, it was only for bench purpose.
Sorry for multiple commits.
I rely too much on (alt shift i) to ignore methods, but sometimes I miss one.
It would be cool to have a way to hide the ignored methods before committing.
=============== Diff against Kernel-nice.1232 ===============
Item was removed:
- ----- Method: LargePositiveInteger>>sqrt2 (in category 'mathematical functions') -----
- sqrt2
- "Answer the square root of the receiver.
- If the square root is exact, answer an Integer, else answer a Float approximation.
- Handle some tricky case of Float overflow or inaccuracy"
-
- | selfAsFloat floatResult integerResult |
- selfAsFloat := self asFloat.
- floatResult := selfAsFloat sqrt.
- floatResult isFinite
- ifTrue:
- [self mightBeASquare ifFalse: ["we know for sure no exact solution exists,
- just answer the cheap float approximation without wasting time."
- ^floatResult].
- "Answer integerResult in case of perfect square"
- integerResult := floatResult truncated.
- integerResult squared = self ifTrue: [^integerResult]].
-
- "In this case, maybe it failed because we are such a big integer that the Float method becomes
- inexact, even if we are a whole square number. So, try the slower but more general method"
- selfAsFloat >= Float maxExactInteger asFloat squared
- ifTrue: [
- integerResult := self sqrtFloor.
- integerResult squared = self ifTrue: [
- ^integerResult ].
-
- "Nothing else can be done. No exact answer means answer must be a Float.
- Answer the best we have which is the rounded sqrt."
- integerResult := (self bitShift: 2) sqrtFloor.
- ^((integerResult bitShift: -1) + (self bitAnd: 1)) asFloat].
-
- "We need an approximate result"
- ^floatResult!
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1232.mcz
==================== Summary ====================
Name: Kernel-nice.1232
Author: nice
Time: 13 May 2019, 5:43:37.253569 pm
UUID: 880030b9-91d0-244b-9198-b7b6f9454584
Ancestors: Kernel-nice.1231
Make sure that LargePositiveInteger sqrt are always correctly rounded to nearest Float if inexact.
Note that I accidentally introduced a rounding bug in LargePositiveInteger sqrt which was not in original Integer>>#sqrt:
(self bitAnd: 1) should have been (integerResult bitAnd: 1).
Apologizes.
But this formulation was subject to double-rounding error anyway (asFloat did perform a second rounding, which may lead to incorrect rounding), so that did not really matter...
Concerning performance, here are the bench results, after correcting the original version (I send twice self asFloat, which was a slip, and the original did not test mightBeASquare either when self asFloat = Float infinity)
- When (self mighBeASquare), new code is equivalent, and becomes more performant for huge values
This is due to sqrtRem which avoids evaluating a #squared for no additional price
Note that 7 large ints out of 256 mightBeASquare
- else it depends on bit range, new code is :
* as performant in range (SmallInteger maxVal highBit + 1 to: Float precision * 2)
that's the case of vast majority of large int in a "normal" image
* less performant in range (Float precision * 2 + 1 to: Float emax) by 3x to 5x
that's because asFloat sqrt is cheaper than sqrtRem
* more performant in range (Float emax + 1 to: infinity) by large
this is due to evaluating sqrtFloor on a much smaller LargePositiveInteger
(but this trick could have been added in inexact-rounding variant too)
So the exact rounding:
- does not require additional code (both are equivalent)
- does not cost any performance penalty in most common cases (small LargePositiveInteger); the penalty is only in medium range (107 to 1023 bits).
In a word, it's worth.
=============== Diff against Kernel-nice.1231 ===============
Item was changed:
----- Method: LargePositiveInteger>>sqrt (in category 'mathematical functions') -----
sqrt
"Answer the square root of the receiver.
If the square root is exact, answer an Integer, else answer a Float approximation.
+ Make sure the result is correctly rounded (i.e. the nearest Float to the exact square root)"
- Handle some tricky case of Float overflow or inaccuracy"
+ | floatResult integerResult guardBit highBit sr |
+ (highBit := self highBit) < (Float precision * 2)
- | selfAsFloat floatResult integerResult |
- selfAsFloat := self asFloat.
- floatResult := self asFloat sqrt.
- floatResult isFinite
ifTrue:
+ ["the sqrt of self asFloat is correctly rounded, so use it"
+ floatResult := self asFloat sqrt.
+ self mightBeASquare ifFalse: [^floatResult].
- [self mightBeASquare ifFalse: ["we know for sure no exact solution exists,
- just answer the cheap float approximation without wasting time."
- ^floatResult].
"Answer integerResult in case of perfect square"
integerResult := floatResult truncated.
+ integerResult squared = self ifTrue: [^integerResult].
+ ^floatResult].
- integerResult squared = self ifTrue: [^integerResult]].
-
- "In this case, maybe it failed because we are such a big integer that the Float method becomes
- inexact, even if we are a whole square number. So, try the slower but more general method"
- selfAsFloat >= Float maxExactInteger asFloat squared
- ifTrue: [
- integerResult := self sqrtFloor.
- integerResult squared = self ifTrue: [
- ^integerResult ].
-
- "Nothing else can be done. No exact answer means answer must be a Float.
- Answer the best we have which is the rounded sqrt."
- integerResult := (self bitShift: 2) sqrtFloor.
- ^((integerResult bitShift: -1) + (self bitAnd: 1)) asFloat].
+ "Eventually use a guard bit for handling correct rounding direction"
+ guardBit := highBit <= (Float precision + 1 * 2)
+ ifTrue:
+ ["Add one guard bit for rounding correctly"
+ 1]
+ ifFalse:
+ [self mightBeASquare
+ ifTrue:
+ ["Keep all the bits in case we are a perfect square"
+ 0]
+ ifFalse:
+ ["Remove superfluous bit that won't change the Float approximation"
+ Float precision + 1 - (highBit // 2)]].
+
+ "Get truncated sqrt and remainder for the same price"
+ sr := (self bitShift: guardBit * 2) sqrtRem.
+
+ "Handle case of perfect square"
+ integerResult := sr first.
+ sr last isZero ifTrue: [^integerResult bitShift: guardBit negated].
+
+ "Answer the best we have which is the sqrt correctly rounded at Float precision."
+ ^((integerResult bitShift: Float precision - integerResult highBit)
+ + (integerResult bitAt: integerResult highBit - Float precision)) asFloat
+ timesTwoPower: integerResult highBit - Float precision - guardBit!
- "We need an approximate result"
- ^floatResult!
Item was added:
+ ----- Method: LargePositiveInteger>>sqrt2 (in category 'mathematical functions') -----
+ sqrt2
+ "Answer the square root of the receiver.
+ If the square root is exact, answer an Integer, else answer a Float approximation.
+ Handle some tricky case of Float overflow or inaccuracy"
+
+ | selfAsFloat floatResult integerResult |
+ selfAsFloat := self asFloat.
+ floatResult := selfAsFloat sqrt.
+ floatResult isFinite
+ ifTrue:
+ [self mightBeASquare ifFalse: ["we know for sure no exact solution exists,
+ just answer the cheap float approximation without wasting time."
+ ^floatResult].
+ "Answer integerResult in case of perfect square"
+ integerResult := floatResult truncated.
+ integerResult squared = self ifTrue: [^integerResult]].
+
+ "In this case, maybe it failed because we are such a big integer that the Float method becomes
+ inexact, even if we are a whole square number. So, try the slower but more general method"
+ selfAsFloat >= Float maxExactInteger asFloat squared
+ ifTrue: [
+ integerResult := self sqrtFloor.
+ integerResult squared = self ifTrue: [
+ ^integerResult ].
+
+ "Nothing else can be done. No exact answer means answer must be a Float.
+ Answer the best we have which is the rounded sqrt."
+ integerResult := (self bitShift: 2) sqrtFloor.
+ ^((integerResult bitShift: -1) + (self bitAnd: 1)) asFloat].
+
+ "We need an approximate result"
+ ^floatResult!
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1231.mcz
==================== Summary ====================
Name: Kernel-nice.1231
Author: nice
Time: 12 May 2019, 10:58:37.936831 pm
UUID: fdae6969-331b-4fa3-83b9-4cadb4d84290
Ancestors: Kernel-nice.1230
Oups, I forgot one germane Float sqrt edge case - sorry to double commit.
-0.0 sqrt should answer -0.0 (don't ask why, read the standard...)
I already corrected original code for inf,nan,gradual underflow (very inaccurate), and incorrectly rounded for a few ETA-floats (ETA is 10^18).
I have maybe 20 copies of this fallback code in my change log, and I still missed one case!
Note entirely my fault, that's one of the 4 original lines of code that I did not change...
=============== Diff against Kernel-nice.1230 ===============
Item was changed:
----- Method: Float>>sqrt (in category 'mathematical functions') -----
sqrt
"Fallback code for absent primitives.
Care to answer a correctly rounded result as mandated by IEEE-754."
| guess selfScaled nextGuess exp secator hi lo remainder maxError |
self <= 0.0
ifTrue: [self = 0.0
+ ifTrue: [^ self]
- ifTrue: [^ 0.0]
ifFalse: [^ DomainError signal: 'sqrt undefined for number less than zero.']].
self isFinite ifFalse: [^self].
"scale to avoid loss of precision in case of gradual underflow
(selfScaled between: 1.0 and: 2.0), so it is a good guess by itself"
exp := self exponent // 2.
guess := selfScaled := self timesTwoPower: exp * -2.
"Use Newton-Raphson iteration - it converges quadratically
(twice many correct bits at each loop)"
[nextGuess := selfScaled - (guess * guess) / (2.0 * guess) + guess.
"test if both guess are within 1 ulp"
(nextGuess + guess) / 2.0 = guess]
whileFalse:
["always round odd upper - this avoids infinite loop with alternate flip of last bit"
guess := nextGuess + (nextGuess ulp/2.0)].
"adjust the rounding - the guess can be 2 ulp up or 1 ulp down
Let u = guess ulp.
if (guess+u/2)^2<self, then guess is under-estimated
if (guess-u/2)^2>self, then guess is over-estimated
Note that they can't be equal (because left term has 55 bits).
(guess+u/2)^2=guess^2 + guess*u + u^2/4 < self
==> self - guess^2 > guess*u
(guess-u/2)^2=guess^2 - guess*u + u^2/4 > self
==> guess^2 - self >= guess*u
(guess^2 - self) is evaluated with an emulated fused-multiply-add"
["Decompose guess in two 26 bits parts hi,lo
the trick is that they do not necessarily have the same sign
If 53 bits are hi,0,lo => (hi,lo) else hi,1,lo=> (hi+1,-lo)"
secator := "1<<27+1" 134217729.0.
hi := guess * secator.
hi :=hi + (guess - hi).
lo := guess - hi.
"The operations below are all exact"
remainder := selfScaled - hi squared - (hi * lo * 2.0) - lo squared.
maxError := guess timesTwoPower: 1 - Float precision.
remainder > maxError or: [remainder negated >= maxError]]
whileTrue: [guess :=remainder > 0.0
ifTrue: [guess successor]
ifFalse: [guess predecessor]].
"undo the scaling"
^ guess timesTwoPower: exp!
Nicolas Cellier uploaded a new version of KernelTests to project The Trunk:
http://source.squeak.org/trunk/KernelTests-nice.363.mcz
==================== Summary ====================
Name: KernelTests-nice.363
Author: nice
Time: 12 May 2019, 10:47:16.307052 pm
UUID: 6b78d1d7-57ac-4173-ad99-442804a134cf
Ancestors: KernelTests-nice.362
Test the fallback code for Float sqrt (used when primitives 55 or 555 are absent)
=============== Diff against KernelTests-nice.362 ===============
Item was changed:
----- Method: FloatTest>>testIsZero (in category 'tests - zero behavior') -----
testIsZero
self assert: 0.0 isZero.
+ self assert: Float negativeZero isZero.
self deny: 0.1 isZero.!
Item was added:
+ ----- Method: FloatTest>>testSqrtFallback (in category 'tests - mathematical functions') -----
+ testSqrtFallback
+ | fallBackMethod |
+ fallBackMethod := Float>>#sqrt.
+ {Float fmin. Float fmin * 2.0. Float fmin * 63.0. Float fmax. Float fmax predecessor predecessor.
+ 1.0. 2.0. 3.0. 4.0. 5.0}
+ do: [:f |
+ | s sm sp |
+ "check against the primitives - if they are absent, it does not test anything..."
+ s := fallBackMethod valueWithReceiver: f arguments: Array new.
+ self assert: s equals: f sqrt.
+
+ "in case we don't have the primitive, use exact arithmetic and a bit of logic"
+ sm := s asTrueFraction - (s ulp asTrueFraction / 2).
+ sp := s asTrueFraction + (s ulp asTrueFraction / 2).
+
+ self assert: s asTrueFraction squared < f ==> [sp squared > f]
+ description: '(s)^2 < (s+ulp/2)^2 <= f => s is more than ulp/2 away from the true square root of f'.
+ self assert: s asTrueFraction squared > f ==> [sm squared < f]
+ description: 'f <= (s-ulp/2)^2 < (s)^2 ==> s is more than ulp/2 away from the true square root of f'].
+ self assertIsNegativeZero: (fallBackMethod valueWithReceiver: Float negativeZero arguments: Array new).
+ self assertIsPositiveZero: (fallBackMethod valueWithReceiver: 0.0 arguments: Array new).
+ self assert: (fallBackMethod valueWithReceiver: Float nan arguments: Array new) isNaN.
+ self assert: (fallBackMethod valueWithReceiver: Float infinity arguments: Array new) equals: Float infinity.
+ self should: [fallBackMethod valueWithReceiver: -2.0 arguments: Array new] raise: DomainError!
Nicolas Cellier uploaded a new version of Kernel to project The Trunk:
http://source.squeak.org/trunk/Kernel-nice.1230.mcz
==================== Summary ====================
Name: Kernel-nice.1230
Author: nice
Time: 12 May 2019, 9:52:59.138604 pm
UUID: c5ccb933-92e3-4969-9bb4-05b6794c16bf
Ancestors: Kernel-nice.1229
Name: Kernel-nice.1230
Author: nice
Time: 12 May 2019, 2:13:29.878804 am
UUID: 56e8a3d6-3899-47b1-b828-cf0525f7dec3
Ancestors: Kernel-nice.1229
Revamp sqrt
1) move falback code for missing primitive 55 or 555 in Float and invoke super if primitive fails
2) handle inf and Nan in this fallback code
3) scale to avoid dramatic loss of precision of original Newton-Raphson loop in case of gradual underflow
4) round the result correctly as mandated by IEEE-754 standard.
Since Newton-Raphson converges quadratically, it's at most one more iteration than before, plus a somehow germane adjustment...
Don't be spare of comments - I re-invented the wheel and this is documented nowhere else. There are many other ways to evaluate a correctly rounded sqrt, but they are all tricky.
Another way would be to rely on LargePositiveInteger sqrtFloor, but it won't be possible anymore, because SmallInteger sqrtFloor will now depend onFloat sqrt, and LargePositiveInteger depends on SmallInteger sqrtFloor (by virtue of divide and conquer) - see below.
5) move sqrt, sqrtFloor and sqrtRem implementation in Integer subclasses, so that each one takes its own responsibilities which are not exactly the same...
6) Accelerate SmallInteger sqrtFloor by invoking Float sqrt (the case of absent primitives apart)
Provide long explanation telling how it works in 64bits image despite inexact asFloat
7) Accelerate SmallInteger sqrt by not testing the isFinite etc...
Provide long explanation why sqrt truncated works in 64bits image despite inexact asFloat
Note that 6) and 7) are dead simple versus former super version, only the comment is not...
8) Accelerate LargePositiveInteger sqrtRem and sqrtFloor by removing the harcoded threshold: divide & conquer was faster than Newton-Raphson unconditionnally and asFloat truncated is not a viable option for Large
9) remove sqrtFloorNewtonRaphson now that neither Small nor Large integers use it. It's a heartbreaking change, I had super-optimized it along the years...
Note: I could have kept a Newton-Raphson for large, accelerated by seeding the first guess with asFloat sqrt truncated.
Up to 106 bits, the single correction of SmallInteger>>sqrtFloor might also work.
10) move the handling of Float overlow/accuracy in LargePositiveInteger sqrt where they belong. It's not about speed-up, it's about putting things at their place.
11) Speed up SmallInteger isAnExactFloat.
It's not sqrt related, but I first thought using it before engaging asFloat sqrt truncated, that's the reason why you find it here
Congratulations, you reached the end of this commit message!
=============== Diff against Kernel-nice.1229 ===============
Item was changed:
----- Method: BoxedFloat64>>sqrt (in category 'mathematical functions') -----
sqrt
"Answer the square root of the receiver.
Optional. See Object documentation whatIsAPrimitive."
- | exp guess eps delta |
<primitive: 55>
+ ^super sqrt!
- #Numeric.
- "Changed 200/01/19 For ANSI <number> support."
- "Newton-Raphson"
- self <= 0.0
- ifTrue: [self = 0.0
- ifTrue: [^ 0.0]
- ifFalse: ["v Chg"
- ^ DomainError signal: 'sqrt undefined for number less than zero.']].
- "first guess is half the exponent"
- exp := self exponent // 2.
- guess := self timesTwoPower: 0 - exp.
- "get eps value"
- eps := guess * Epsilon.
- eps := eps * eps.
- delta := self - (guess * guess) / (guess * 2.0).
- [delta * delta > eps]
- whileTrue:
- [guess := guess + delta.
- delta := self - (guess * guess) / (guess * 2.0)].
- ^ guess!
Item was added:
+ ----- Method: Float>>sqrt (in category 'mathematical functions') -----
+ sqrt
+ "Fallback code for absent primitives.
+ Care to answer a correctly rounded result as mandated by IEEE-754."
+
+ | guess selfScaled nextGuess exp secator hi lo remainder maxError |
+ self <= 0.0
+ ifTrue: [self = 0.0
+ ifTrue: [^ 0.0]
+ ifFalse: [^ DomainError signal: 'sqrt undefined for number less than zero.']].
+ self isFinite ifFalse: [^self].
+
+ "scale to avoid loss of precision in case of gradual underflow
+ (selfScaled between: 1.0 and: 2.0), so it is a good guess by itself"
+ exp := self exponent // 2.
+ guess := selfScaled := self timesTwoPower: exp * -2.
+
+ "Use Newton-Raphson iteration - it converges quadratically
+ (twice many correct bits at each loop)"
+ [nextGuess := selfScaled - (guess * guess) / (2.0 * guess) + guess.
+ "test if both guess are within 1 ulp"
+ (nextGuess + guess) / 2.0 = guess]
+ whileFalse:
+ ["always round odd upper - this avoids infinite loop with alternate flip of last bit"
+ guess := nextGuess + (nextGuess ulp/2.0)].
+
+ "adjust the rounding - the guess can be 2 ulp up or 1 ulp down
+ Let u = guess ulp.
+ if (guess+u/2)^2<self, then guess is under-estimated
+ if (guess-u/2)^2>self, then guess is over-estimated
+ Note that they can't be equal (because left term has 55 bits).
+ (guess+u/2)^2=guess^2 + guess*u + u^2/4 < self
+ ==> self - guess^2 > guess*u
+ (guess-u/2)^2=guess^2 - guess*u + u^2/4 > self
+ ==> guess^2 - self >= guess*u
+ (guess^2 - self) is evaluated with an emulated fused-multiply-add"
+
+ ["Decompose guess in two 26 bits parts hi,lo
+ the trick is that they do not necessarily have the same sign
+ If 53 bits are hi,0,lo => (hi,lo) else hi,1,lo=> (hi+1,-lo)"
+ secator := "1<<27+1" 134217729.0.
+ hi := guess * secator.
+ hi :=hi + (guess - hi).
+ lo := guess - hi.
+
+ "The operations below are all exact"
+ remainder := selfScaled - hi squared - (hi * lo * 2.0) - lo squared.
+ maxError := guess timesTwoPower: 1 - Float precision.
+ remainder > maxError or: [remainder negated >= maxError]]
+ whileTrue: [guess :=remainder > 0.0
+ ifTrue: [guess successor]
+ ifFalse: [guess predecessor]].
+
+ "undo the scaling"
+ ^ guess timesTwoPower: exp!
Item was changed:
----- Method: Integer>>slidingLeftRightRaisedTo:modulo: (in category 'private') -----
slidingLeftRightRaisedTo: n modulo: m
"Private - compute (self raisedTo: n) \\ m,
Note: this method has to be fast because it is generally used with large integers in cryptography.
It thus operate on exponent bits from left to right by packets with a sliding window rather than bit by bit (see below)."
| pow j k w index oddPowersOfSelf square |
"Precompute powers of self for odd bit patterns xxxx1 up to length w + 1.
The width w is chosen with respect to the total bit length of n,
such that each bit pattern will on average be encoutered P times in the whole bit sequence of n.
This costs (2 raisedTo: w) multiplications, but more will be saved later (see below)."
k := n highBit.
w := (k highBit - 1 >> 1 min: 16) max: 1.
oddPowersOfSelf := Array new: 1 << w.
oddPowersOfSelf at: 1 put: (pow := self).
square := self * self \\ m.
2 to: oddPowersOfSelf size do: [:i | pow := oddPowersOfSelf at: i put: pow * square \\ m].
"Now exponentiate by searching precomputed bit patterns with a sliding window"
pow := 1.
[k > 0]
whileTrue:
[pow := pow * pow \\ m.
"Skip bits set to zero (the sliding window)"
(n bitAt: k) = 0
ifFalse:
["Find longest odd bit pattern up to window length (w + 1)"
j := k - w max: 1.
[j < k and: [(n bitAt: j) = 0]] whileTrue: [j := j + 1].
"We found an odd bit pattern of length k-j+1;
perform the square powers for each bit
(same cost as bitwise algorithm);
compute the index of this bit pattern in the precomputed powers."
index := 0.
[k > j] whileTrue:
[pow := pow * pow \\ m.
+ index := index * 2 + (n bitAt: k).
- index := index << 1 + (n bitAt: k).
k := k - 1].
"Perform a single multiplication for the whole bit pattern.
This saves up to (k-j) multiplications versus a naive algorithm operating bit by bit"
pow := pow * (oddPowersOfSelf at: index + 1) \\ m].
k := k - 1].
^pow!
Item was removed:
- ----- Method: Integer>>sqrt (in category 'mathematical functions') -----
- sqrt
- "Answer the square root of the receiver."
-
- | selfAsFloat floatResult guess |
- selfAsFloat := self asFloat.
- floatResult := selfAsFloat sqrt.
-
- floatResult isInfinite ifFalse: [
- guess := floatResult truncated.
-
- "If got an exact answer, answer it. Otherwise answer float approximate answer."
- guess squared = self
- ifTrue: [ ^ guess ]].
-
- "In this case, maybe it failed because we are such a big integer that the Float method becomes
- inexact, even if we are a whole square number. So, try the slower but more general method"
- selfAsFloat >= Float maxExactInteger asFloat squared
- ifTrue: [
- guess := self sqrtFloor.
- guess squared = self ifTrue: [
- ^guess ].
-
- "Nothing else can be done. No exact answer means answer must be a Float.
- Answer the best we have which is the rounded sqrt."
- guess := (self * 4) sqrtFloor.
- ^(guess // 2 + (guess \\ 2)) asFloat].
-
- "We need an approximate result"
- ^floatResult!
Item was changed:
----- Method: Integer>>sqrtFloor (in category 'mathematical functions') -----
sqrtFloor
"Return the integer part of the square root of self
Assume self >= 0
+ The following post-conditions apply:
- The following invariants apply:
1) self sqrtFloor squared <= self
2) (self sqrtFloor + 1) squared > self"
+ self subclassResponsibility!
- ^self sqrtFloorNewtonRaphson!
Item was removed:
- ----- Method: Integer>>sqrtFloorNewtonRaphson (in category 'private') -----
- sqrtFloorNewtonRaphson
- "Return the integer part of the square root of self.
- Use a Newton-Raphson iteration since it converges quadratically
- (twice more bits of precision at each iteration)"
-
- | guess delta |
- guess := 1 bitShift: self highBit + 1 // 2.
- [
- delta := guess squared - self // (guess bitShift: 1).
- delta = 0 ] whileFalse: [
- guess := guess - delta ].
- ^guess - 1!
Item was changed:
----- Method: Integer>>sqrtRem (in category 'mathematical functions') -----
sqrtRem
"Return an array with floor sqrt and sqrt remainder.
Assume self >= 0.
The following invariants apply:
1) self sqrtRem first squared <= self
2) (self sqrtRem first + 1) squared > self
3) self sqrtRem first squared + self sqrtRem last = self"
+ self subclassResponsibility!
- | s |
- s := self sqrtFloorNewtonRaphson.
- ^{ s. self - s squared } !
Item was changed:
----- Method: LargePositiveInteger>>sqrt (in category 'mathematical functions') -----
sqrt
+ "Answer the square root of the receiver.
+ If the square root is exact, answer an Integer, else answer a Float approximation.
+ Handle some tricky case of Float overflow or inaccuracy"
+
+ | selfAsFloat floatResult integerResult |
+ selfAsFloat := self asFloat.
+ floatResult := self asFloat sqrt.
+ floatResult isFinite
+ ifTrue:
+ [self mightBeASquare ifFalse: ["we know for sure no exact solution exists,
+ just answer the cheap float approximation without wasting time."
+ ^floatResult].
+ "Answer integerResult in case of perfect square"
+ integerResult := floatResult truncated.
+ integerResult squared = self ifTrue: [^integerResult]].
+
+ "In this case, maybe it failed because we are such a big integer that the Float method becomes
+ inexact, even if we are a whole square number. So, try the slower but more general method"
+ selfAsFloat >= Float maxExactInteger asFloat squared
+ ifTrue: [
+ integerResult := self sqrtFloor.
+ integerResult squared = self ifTrue: [
+ ^integerResult ].
+
+ "Nothing else can be done. No exact answer means answer must be a Float.
+ Answer the best we have which is the rounded sqrt."
+ integerResult := (self bitShift: 2) sqrtFloor.
+ ^((integerResult bitShift: -1) + (self bitAnd: 1)) asFloat].
- "If we know for sure no exact solution exists, then just answer the cheap float approximation without wasting time."
- | selfAsFloat |
- self mightBeASquare
- ifFalse:
- [selfAsFloat := self asFloat.
- selfAsFloat isFinite ifTrue: [^self asFloat sqrt ]].
+ "We need an approximate result"
+ ^floatResult!
- "If some exact solution might exist, or self asFloat isInfinite, call potentially expensive super"
- ^super sqrt!
Item was changed:
----- Method: LargePositiveInteger>>sqrtFloor (in category 'mathematical functions') -----
sqrtFloor
+ "See super. Use a fast divide and conquer strategy."
+
- "Like super, but use a faster divide and conquer strategy if it's worth"
-
- self digitLength >= 64 ifFalse: [^self sqrtFloorNewtonRaphson].
^self sqrtRem first!
Item was changed:
----- Method: LargePositiveInteger>>sqrtRem (in category 'mathematical functions') -----
sqrtRem
+ "See super. Use a divide and conquer method to perform this operation.
- "Like super, but use a divide and conquer method to perform this operation.
See Paul Zimmermann. Karatsuba Square Root. [Research Report] RR-3805, INRIA. 1999, pp.8. <inria-00072854>
https://hal.inria.fr/inria-00072854/PDF/RR-3805.pdf"
| n qr q s r sr a3a2 a1 a0 |
"Split self in 4 digits a3,a2,a1,a0 in base b,
such that most significant digit a3 >= b/4
It is not a problem to have a3 >= b,
so we can round b down to a whole number of bytes n"
n := self highBit bitShift: -5. "bitShift: -2 divide in 4 parts, bitShift: -3 round down in bytes"
- n >= 16 ifFalse: [^super sqrtRem].
a3a2 := self butLowestNDigits: n * 2.
a1 := self copyDigitsFrom: n + 1 to: n * 2.
a0 := self lowestNDigits: n.
sr := a3a2 sqrtRem.
qr := (sr last bitShift: 8 * n) + a1 divideByInteger: (sr first bitShift: 1).
q := qr first normalize.
s := (sr first bitShift: 8 * n) + q.
r := (qr last normalize bitShift: 8 * n) + a0 - q squared.
r negative
ifTrue:
[r := (s bitShift: 1) + r - 1.
s := s - 1].
sr at: 1 put: s; at: 2 put: r.
^sr
!
Item was changed:
----- Method: SmallFloat64>>sqrt (in category 'mathematical functions') -----
sqrt
"Answer the square root of the receiver.
Optional. See Object documentation whatIsAPrimitive."
- | exp guess eps delta |
<primitive: 555>
+ ^super sqrt!
- #Numeric.
- "Changed 200/01/19 For ANSI <number> support."
- "Newton-Raphson"
- self <= 0.0
- ifTrue: [self = 0.0
- ifTrue: [^ 0.0]
- ifFalse: ["v Chg"
- ^ DomainError signal: 'sqrt undefined for number less than zero.']].
- "first guess is half the exponent"
- exp := self exponent // 2.
- guess := self timesTwoPower: 0 - exp.
- "get eps value"
- eps := guess * Epsilon.
- eps := eps * eps.
- delta := self - (guess * guess) / (guess * 2.0).
- [delta * delta > eps]
- whileTrue:
- [guess := guess + delta.
- delta := self - (guess * guess) / (guess * 2.0)].
- ^ guess!
Item was added:
+ ----- Method: SmallInteger>>isAnExactFloat (in category 'testing') -----
+ isAnExactFloat
+ "See super.
+ When you're small, the fastest way is to try"
+
+ ^self asFloat truncated = self!
Item was changed:
----- Method: SmallInteger>>sqrt (in category 'mathematical functions') -----
sqrt
+ "Answer the square root of the receiver.
+ If the square root is exact, answer an Integer, else answer a Float approximation"
+ | floatResult integerResult |
self negative ifTrue: [
^ DomainError signal: 'sqrt undefined for number less than zero.' ].
+ floatResult := self asFloat sqrt.
+ integerResult := floatResult truncated.
+ "Note: truncated works for 60-bit SmallInteger
+ If self is a square s^2, but asFloat rounds down,
+ f = s^2*(1-e), f^0.5 = s*(1-e)^0.5 = s*(1-0.5*e+O(e^2))
+ since s asFloat is exact, and e <= 0.5*ulp(1),
+ s*(1-0.5*e+O(e^2)) always rounds to s"
+ integerResult * integerResult = self ifTrue: [^integerResult].
+ ^floatResult!
- ^super sqrt!
Item was added:
+ ----- Method: SmallInteger>>sqrtFloor (in category 'mathematical functions') -----
+ sqrtFloor
+ "See super. Use asFloat sqrt which is known to be exactly rounded.
+ Adjust the result in case self asFloat is inexact.
+ An example why it is necessary with 60 bits SmallInteger is:
+ | i |
+ i := (1<<28-1) squared - 1.
+ i asFloat sqrt truncated squared <= i.
+ What happens is that i and and next perfect square above i, s^2
+ are rounded to the same Float f >= s^2.
+ In other words, asFloat did cross the next perfect square boundary.
+ The guess is at most off by 1, because the next next perfect square is:
+ (s + 1) squared = (2*s + s squared + 1)
+ s squared has at most 60 bits, and 2*s has 31 bits in this case,
+ s squared highBit - (2*s) highBit < Float precision,
+ so we are sure that next next perfect square is a different Float."
+
+ | guess |
+ guess := self asFloat sqrt truncated.
+ guess * guess > self ifTrue: [^guess - 1].
+ ^guess!
Item was added:
+ ----- Method: SmallInteger>>sqrtRem (in category 'mathematical functions') -----
+ sqrtRem
+ "See super"
+
+ | s |
+ s := self sqrtFloor.
+ ^{s. self - (s*s)}!