Parsing Numbers

Andreas Raab andreas.raab at gmx.de
Sat Sep 17 22:32:57 UTC 2005


Tom Phoenix wrote:
> Rule 1: In writing numerals in a radix past base ten, capital letters
> must be used to represent the extra digits. Under this rule, 16r1E4 is
> the hexadecimal number 1E4 (484), but 16r1e4 is 65536.

The fact that "16r1E4 ~= 16r1e4" is deeply disturbing if you ever have 
to copy hex constants from somewhere else. This is just what I did and I 
was staring for an hour at that code wondering why the heck it would 
compute total nonsense. The fact that this was crypto code and that 
crypto code often involves lots of magic hex constants makes it even 
more disturbing (just think about what havoc a wrongly spelled hex 
constant might wreak on you).
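
To make the difference concrete, here is a small Python sketch (not Squeak code) of the two readings of the characters "1e4" after a 16r prefix. It assumes, as the numbers in the quoted rule imply, that the exponent is written in decimal and scales by the radix; the variable names are made up for illustration.

```python
# Two readings of the same three characters "1e4" following "16r".

# Reading A: every character is a hex digit.
as_digits = int("1e4", 16)            # 1*16**2 + 14*16 + 4

# Reading B: "e" separates mantissa from exponent (assumed: decimal
# exponent, scaling by the radix, i.e. mantissa * 16**exponent).
mantissa, exponent = "1", "4"
as_exponent = int(mantissa, 16) * 16 ** int(exponent)

print(as_digits, as_exponent)         # 484 65536
```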

> Rule 2: In writing numerals in a radix past base ten, bare exponents
> are disallowed. Under this rule, 16r1E4 and 16r1e4 are equal to 484.
> (Under this rule, 16r1e+4 may still be used to denote 65536, if we
> wish to allow such a thing.)

That would be somewhat better but note that the problem here is that 
"16r1e+4" can be easily interpreted as "16r1e + 4".

> Is there any alternative rule that's any better than either of these?

How about: In order for consistency to prevail, an (upper or lower case) 
character that _could_ be interpreted as a digit under the current base 
_will_ be interpreted as such. This would mean 16r1e4 = 16r1E4 and if 
you do need an exponent you have to compute it, say "16r1e4 * 10e4". 
Which I will admit doesn't look exactly great either but given that I've 
yet to find code which has used bases greater than ten with exponents I 
feel pretty safe that this won't be a major issue.
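
As a rough illustration of that rule, here is a minimal Python sketch, not the actual Squeak number parser. The function name parse_radix_literal is hypothetical, and it assumes a decimal radix from 2 to 36, a non-negative decimal exponent that scales by the radix, and no support for signs or fractions. A letter is consumed as a digit whenever it is valid in the current radix, so "e" only introduces an exponent for radixes up to 14.

```python
import string

# '0'..'9' then 'a'..'z' map to digit values 0..35.
DIGITS = string.digits + string.ascii_lowercase


def parse_radix_literal(text):
    """Parse '<radix>r<digits>[e<exponent>]' under the digit-first rule."""
    radix_part, sep, body = text.partition("r")
    if not sep or not radix_part.isdecimal() or not body:
        raise ValueError(f"not a radix literal: {text!r}")
    radix = int(radix_part)
    if not 2 <= radix <= 36:
        raise ValueError(f"unsupported radix: {radix}")

    # Consume every character that could be a digit in this radix,
    # regardless of case.
    value = 0
    i = 0
    while i < len(body):
        digit = DIGITS.find(body[i].lower())
        if digit < 0 or digit >= radix:
            break
        value = value * radix + digit
        i += 1
    if i == 0:
        raise ValueError(f"no digits in: {text!r}")

    rest = body[i:]
    if not rest:
        return value
    # Only reached when 'e' is NOT a valid digit in this radix.
    if rest[0] in "eE" and rest[1:].isdecimal():
        return value * radix ** int(rest[1:])
    raise ValueError(f"unexpected characters {rest!r} in: {text!r}")


print(parse_radix_literal("16r1e4"), parse_radix_literal("16r1E4"))  # 484 484
print(parse_radix_literal("10r1e4"))                                 # 10000
```

Under this sketch the exponent form still works for base ten and below, which is where it is actually used, while hex constants read the same in either case.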

I guess the fundamental question here is: Is it more important to have 
an easy way to write upper or lower case hex constants or to be able to 
use lower case "e" (and possibly other characters) to denote 
exponentiation for bases greater than ten?

My (obvious) preference is the first - I have never had the need to 
write exponents for bases greater than ten and I don't expect to see 
that need in the future. By contrast, I use (and copy!) hex constants 
all the time and I am used to reading them in mixed upper and lower case.

> As a side issue, we should decide whether 'd' and 'q' can stand in for
> 'e'. I'm sure that allowing that would help somebody, and it's not
> likely to cause many problems, once we decide which of the above rules
> to use.

Like I was saying before, where exactly is this used? Who would be 
helped by it and why would we care given that Squeak doesn't have the 
distinction between single, double, and quad precision?

Cheers,
   - Andreas


