Literal arrays parsing

Wolfgang Helbig helbig at Lehre.BA-Stuttgart.DE
Wed May 24 06:10:02 UTC 2006


You want me to try something out:
>Hi Wolfgang,
>Just try:
>    (Compiler evaluate: '#') inspect.
>and you will see this ascii 30 dangerously leaking from internal...

And I did. And I see. And this is wrong. A symbol constant is the sharp
character followed by an identifier or a binary selector, both of which are
nonempty strings. It looks like the compiler in this case accepts an empty
string, with the record separator, that is ASCII 30, internally marking the
end of the string (not the end of the file). Anyway, the compiler must not
accept '#'.
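
The rule can be sketched as a workspace snippet. This is only an illustration of the rule as stated above, not the compiler's own code: the name isSymbolBody and the set of special characters are assumptions, the exact set differs between dialects, and keyword symbols are left out for brevity.

```smalltalk
| specials isSymbolBody |
"Assumed set of special characters, including minus; dialects differ here."
specials := '+/\*~<>=@%|&?!,-'.
"Answer true if aString is a nonempty identifier
 or a nonempty run of special characters."
isSymbolBody := [:aString |
	aString notEmpty and: [
		(aString first isLetter
			and: [aString allSatisfy: [:c | c isLetter or: [c isDigit]]])
		or: [aString allSatisfy: [:c | specials includes: c]]]].
isSymbolBody value: 'inspect'.	"true: an identifier"
isSymbolBody value: '*'.	"true: a binary selector"
isSymbolBody value: ''.	"false: nothing follows the sharp character"
```

The last line is the case at hand: with nothing after the sharp character there is no symbol constant, so a lone '#' should be rejected.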

>If # alone were really a valid syntax, then:
>    (Compiler evaluate: '# inspect').
>should inspect it...

But it isn't, so it should not be accepted. In this case, the compiler is
right in not accepting it.

>It does not, because space is just ignored:
>    (Compiler evaluate: '# inspect') inspect.

>So as extra sharp signs:
>    (Compiler evaluate: '# # # # inspect') inspect.
>Do you agree with such behavior ?

There seems to be an error in the way the compiler handles empty strings.
They are represented internally by the record separator. Inspecting nonempty
strings reveals that the record separator does not mark the end of the string,
as I originally thought.

Squeak inherited this error from Smalltalk-80 V2, which I tested using the
Hobbes emulator.

Aside: Since zero is not a natural number, you have to handle empty strings
as special cases, making the program unnatural and inviting errors.
Couldn't resist :-)
End of Aside

>> >Behind #, i would expect a letter [a-z][A-Z], a string quote ', or an
>> > opening parenthesis (. Maybe a second # in Dolphin Smalltalk extension...
>> >
>> >What else does make sense according to Smalltalk formal definition?
>> According to the syntax diagrams in the Book (choose the book's color from
>> blue, yellow or purple), the sharp character may occur as the first
>> character of an array constant or a symbol constant. In these positions it
>> is followed by a left parenthesis, if it marks an array constant, otherwise
>> it marks a symbol constant and is followed by a letter, a special character
>> or a minus character. Remember, special characters are the ones that make a
>> binary selector.
>Oh yes, i should not have forgotten... #* #-
>In latest squeak, also work with any number of special characters like #***.
>In VW you can have a ByteArray with #[ 0 0 ]
>> Inside a string or a comment, the sharp character may be followed by any of
>> the 95 graphic characters.
>> And finally, inside a character constant, the sharp character may be
>> followed by any character.
>I do not understand this sentence. Isn't it the dollar that is used in 
>character constants ?

Yes, it is. And $# can be followed by any character.

>Or is it inside a literal array like #( ^x:=y at z ), in which case each 
>character is interpreted as a single character symbol...
>For fun, note that Squeak does not complain when you write
># $a

This is not a valid expression.

>> This holds for the language as defined formally by the syntax diagrams, but
>> not for the Smalltalk programming language as described informally by the
>> Blue Book, where "any character" may occur inside comments, strings and
>> character constants, that is not only the graphic characters but ASCII
>> control characters as well, like carriage return, horizontal tabulator or
>> record separator which is ASCII 30.
>> And this again differs from the language as accepted by the compiler in the
>> V2 image of Smalltalk-80. For example, the ASCII 0 character inside a
>> character constant gets you an index error. But this is another thread :-)
>You mean using ascii value as an index in the scanner character table?
>I started with st-80 v2.3 but just don't remember such details...

Months ago, I stumbled across this error in ST-80 V2 when I tried to read back
a form from the output of 
	aForm storeOn: aStream.
The stream then contained ASCII control characters, like ASCII 0, which
triggered the error on reading back. Here is the report I sent Dan on April 5th:


| f s n|
f _ StandardFileStream oldFileNamed: '' .
s _ String new: f size.
n _ 0.
f do: [ :v | n _ n + 1. s at: n put: v].
TextConstants at: #ST80DefaultTextStyle put: (Compiler evaluate: s)

The above gives me a "subscript is out of bounds: 0".
Debug shows:


	tokenType _ #binary.
	token _ Symbol internCharacter: self step.
	((typeTable at: hereChar asciiValue) = #xBinary and: [hereChar ~= $-])
		ifTrue: [token _ (token , (String with: self step)) asSymbol]

And, of course, "hereChar asciiValue" is zero.

(End of Report)
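
The failure can be reproduced in isolation. This is a minimal sketch, assuming a 256-entry Array as a stand-in for the scanner's typeTable; the size and the fill value #xDefault are made up for illustration. Arrays are indexed from 1, so an ASCII value of 0 is not a valid index.

```smalltalk
| typeTable nul result |
"Stand-in for the scanner's type table; size and #xDefault are assumptions."
typeTable := Array new: 256 withAll: #xDefault.
nul := Character value: 0.
"Index 0 is out of bounds. at:ifAbsent: lets us observe
 that without opening a debugger."
result := typeTable at: nul asciiValue ifAbsent: [#outOfBounds].
result	"#outOfBounds"
```

A plain at: with the same index signals the "subscript is out of bounds" error shown in the report.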

By the way, I still don't have those venerable ST-80 fonts in a Squeak image.


Less, but better.
