Strings

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Apr 22 03:41:10 UTC 2002


jennyw <jennyw at dangerousideas.com> wrote:
	This code:
	
	a := 'squeak'.
	'squeak' at: 6 put: $l.
	
	causes the value of a to change to 'squeal'.

Just like it could in Lisp and C...

	I understand how this is happening (there is only one instance
	of a string literal in squeak, as with symbols;

Not true.  There may be one copy _per method_, but a string literal in
one method will _not_ be #== to a string literal with the same contents
in a different method.  Try it!

Nor is this specific to strings.  Try
    #(1) == #(1)
in a Workspace.  I get 'true'.

	It just seems kind of odd to me.

There are two parts to the problem:
(1) Literals are not copied when referenced, so a change to an object
    that was (ultimately) obtained from a literal changes the original
    literal.
(2) Some types of literals may be merged.
    (So that the literals in a method's literal pool can be accessed
    with short indices, or to save space, or whatever.)

It may be odd, but many programming languages do it this way.
If I say
	a = "squeak";
in C, that doesn't allocate a new copy of the string and make 'a'
point to it.  And C compilers are allowed to merge string literals.

Java _does_ merge string literals (but not as far as you might hope)
and gets away with it because it doesn't let you alter strings.
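
For anyone who wants to poke at the Java case, here is a small sketch.
The class and method names are made up for illustration, and reading
"not as far as you might hope" as "strings built at run time are not
merged" is my assumption about what is meant, not something stated
above.  It compares identity (==) rather than contents (equals).

```java
// Sketch only: LiteralSharing and its method names are hypothetical.
final class LiteralSharing {

    // Two occurrences of the same literal inside one method.
    static boolean sameLiteralSameMethod() {
        String a = "squeak";
        String b = "squeak";
        return a == b;                 // identity test, not equals()
    }

    // The same literal, but written in a different method.
    static String literalFromOtherMethod() {
        return "squeak";
    }

    static boolean sameLiteralAcrossMethods() {
        return "squeak" == literalFromOtherMethod();
    }

    // A string with the same contents, built at run time.
    // 'prefix' is a non-final local, so prefix + "k" is not a
    // compile-time constant and yields a newly created String.
    static boolean literalVersusComputed() {
        String prefix = "squea";
        String computed = prefix + "k";
        return computed == "squeak";
    }

    // Explicitly asking for the shared copy with intern().
    static boolean literalVersusInterned() {
        String prefix = "squea";
        return (prefix + "k").intern() == "squeak";
    }

    public static void main(String[] args) {
        System.out.println("same method:    " + sameLiteralSameMethod());
        System.out.println("across methods: " + sameLiteralAcrossMethods());
        System.out.println("computed:       " + literalVersusComputed());
        System.out.println("interned:       " + literalVersusInterned());
    }
}
```

The literal comparisons come out true and the run-time one false,
which is harmless in Java only because there is no way to do the
equivalent of at:put: on a String.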

Lisp systems may merge literals (but need not); details are in CLtL2.
So the same kind of thing can happen in Lisp.

Eiffel has had to face this issue also.

The thing is, if you want copying, you can easily force it by writing
    a := 'squeak' copy
or
    b := #(3 1 4 1 5 9) copy
but if you have copying as your normal rule, it's hard to get sharing
if you really want it.

Copying only pays off when things are mutated, but very few string
literals in normal Squeak code are ever mutated like that.
	



More information about the Squeak-dev mailing list