Execute the following piece of code:
"-------------------------------------------------------------------"
| stream stream2 |
stream := WriteStream with: 'a test '.
stream reset.
stream nextPutAll: 'to test'.
self assert: [stream contents = 'to test'].

"On the following line, you can remove 'copy': the problem is present without it. It is there to prevent the streams from using the same collection, because the compiler tries to avoid creating two identical strings."
stream2 := WriteStream with: 'a test ' copy.
stream2 nextPutAll: 'to test'.

"This assert passes, but this is abnormal."
self assert: [stream2 contents = 'to testto test'].

"This assert passes too, and this is also abnormal, because the strings MUST be equal!"
self assert: [stream2 contents ~= 'a test to test']
"-------------------------------------------------------------------"
On my image, all three asserts pass. This is completely abnormal in my opinion. In the second assert, where does 'to testto test' come from?
VM: Squeak VM version: 3.9-8 #2 Tue Oct 10 21:41:34 PDT 2006 gcc 4.0.1
Built from: Squeak3.9alpha of 4 July 2005 [latest update: #7021]
Build host: Darwin margaux 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power Macintosh powerpc
default plugin location: /usr/local/lib/squeak/3.9-8/*.so
Image: squeak-dev-76
I will try with other images and the new compiler and let you know.
Can you test with your system please and let us know?
Bye
I will try with other images and the new compiler and let you know.
Same with:
Squeak VM version: 3.7-7 #1 Sat Mar 19 13:23:20 PST 2005 gcc 3.3
Built from: Squeak3.7 of '4 September 2004' [latest update: #5989]
Build host: Darwin emilia.local 7.8.0 Darwin Kernel Version 7.8.0: Wed Dec 22 14:26:17 PST 2004; root:xnu/xnu-517.11.1.obj~1/RELEASE_PPC Power Macintosh powerpc
default plugin location: /usr/local/lib/squeak/3.7-7/*.so
On Feb 14, 2007, at 16:18 , Damien Cassou wrote:
On my image, all three asserts pass. This is completely abnormal in my opinion.
It is normal.
You are modifying the 'a test ' literal into 'to test'. This modified string gets copied in the second test.
Lesson: never modify string literals.
- Bert -
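Bert's explanation rests on where a WriteStream starts writing: `on:` starts at the beginning of the collection it is given, while `with:` starts at the end and therefore appends. A minimal workspace sketch of that difference, using copies so that no literal is modified (the variable names are illustrative only):

```smalltalk
| appended overwritten |
"with: positions the stream at the end of the collection, so writing appends."
appended := (WriteStream with: 'abc' copy) nextPutAll: 'de'; contents.
"on: positions the stream at the start, so writing overwrites from the first character."
overwritten := (WriteStream on: 'abc' copy) nextPutAll: 'de'; contents.
appended = 'abcde'.
overwritten = 'de'
```

In the original snippet, `stream reset` moves a `with:` stream back to the start, so the seven characters of 'to test' overwrite the seven characters of the literal in place. The later `'a test ' copy` then copies the already modified literal, and `with:` appends to that copy.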
All,
This is a really good example of a string literal problem. I'm a fanatic about using copy after any hard coded string in code. So much so that I've had a number of developers make fun of my code because of it. Whenever I'm teaching someone Smalltalk I always include the "just use copy on all string literals" suggestion. But I normally show the problem with a character replacement in a string instead. This is a good example of how someone might make a big mistake and then spend a lot of time trying to figure out why everything is so messed up.
I was indoctrinated into the just-use-copy club by Versant. If you had a string literal in a method Versant stored your code in the DB. Then if you tried to change your code without being connected to the database everything blew up!
So this can be fixed by:
| stream stream2 |
stream := WriteStream with: 'a test ' copy.
stream reset.
stream nextPutAll: 'to test' copy.
self assert: [stream contents = 'to test' copy].
Ok the last copy is not really needed but "JUST USE COPY" works for me!
Ron Teitelbaum
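An alternative to adding `copy` to every literal, offered here as a sketch rather than something proposed in the thread, is to avoid handing a literal to a stream as its buffer at all. Both `WriteStream on: String new` and `String streamContents:` write into a freshly allocated string, so the literal is only read:

```smalltalk
| viaOn viaBlock |
"The buffer is a new, empty String; the literal 'to test' is only read from."
viaOn := (WriteStream on: String new) nextPutAll: 'to test'; contents.
"streamContents: creates the stream, evaluates the block with it, and answers the contents."
viaBlock := String streamContents: [:s | s nextPutAll: 'to test'].
viaOn = viaBlock
```

The last expression should answer true, since both variants produce a string with the same characters.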
Bert
It is normal.
No this is not. You get used to it and accept it.
You are modifying the 'a test ' literal into 'to test'. This modified string gets copied in the second test.
Lesson: never modify string literals.
It shows that the compiler's sharing of certain literals, such as booleans and numbers, is fine for immutable objects but wrong for mutable objects such as strings.
In the semantics of Smalltalk, nothing says that two strings with the same representation in the same method point to the same object. I did not check in which books, but the difference between strings and symbols is really that two strings are two different objects, while equal symbols refer to the same object (and are immutable).
Stef
Lesson: never modify string literals.
Oh god, how much I wish we had immutability on a per-object basis.
Cheers, Lukas
Do you have an example? Because VW introduced immutable objects and I would like to educate my taste on this topic.
I have the impression that the point of the initial question was lost in this thread, spinning into mutability, etc.
Let me try to phrase it in another way:
'foo' = 'foo'    true    "ok"
'foo' == 'foo'   true    "NOT OK"
The underlying reason is that whenever I execute one of these expressions in a workspace, a method is created behind my back and then executed. In the case of comparing two strings, the compiler creates *a single literal for both strings*. This is plain wrong: a string is a collection of characters, and therefore these are two different instances of this collection that happen to have the same contents.
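The distinction at stake is between equality (`=`) and identity (`==`). A small workspace sketch, using explicit copies so that the result does not depend on how the compiler stores literals (the variable names are illustrative):

```smalltalk
| a b |
a := 'foo' copy.
b := 'foo' copy.
a = b.                      "true: same characters"
a == b.                     "false: two distinct String objects"
a asSymbol == b asSymbol    "true: equal Symbols are one unique object"
```

Whether `'foo' == 'foo'` written directly in one expression answers true or false is exactly the compiler-dependent behaviour being debated here.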
Now some more experiments:
Take any class (say a new class A). First add the following two methods:
foo
    ^'foo'

foo2
    ^'foo'

Now execute the following code:
A new foo == A new foo     "true ?!?!?!?!?!?!"
A new foo == A new foo2    "false"

To top it all off, add one more method:

isIdentical: arg1 to: arg2
    ^arg1 == arg2

So consider the following now:
A new isIdentical: A new foo to: A new foo2    "false"
A new isIdentical: 'foo' to: 'foo'             "true"
I can only conclude that this is really not what you want.....
On Feb 14, 2007, at 20:27 , Roel Wuyts wrote:
'foo' = 'foo' true "ok" 'foo' == 'foo' true "NOT OK" [...] I can only conclude that this is really not what you want.....
Why? If you want to test for identity, use a Symbol.
IMHO this is splitting hairs over a non-issue. The issue is mutability of literals.
- Bert -
On Wed, 14 Feb 2007 20:58:21 +0100, Bert Freudenberg wrote:
IMHO this is splitting hairs over a non-issue. The issue is mutability of literals.
... which are not constants but objects created from literal descriptions, hence their name :)
IMHO Lukas had the best suggestion so far, something like a preference which demands that literals be compiled as if they were constants.
/Klaus
On Feb 14, 2007, at 9:08 PM, Klaus D. Witzel wrote:
IMHO Lukas had the best suggestion so far, something like a preference which demands that literals be compiled as if they were constants.
Yes but this will be messy since you don't know when you have to turn it on or off. Isn't it?
Math
Mathieu Suen wrote:
Yes but this will be messy since you don't know when you have to turn it on or off. Isn't it?
Math
I would hate for the same code to lead to different results depending on a global preference set somewhere in the image...
Unless you use:
a) a compiler directive in a pragma (I know, Lukas doesn't like this use of annotations)
<thisCompiler literalAreMutable: false> WriteStream on: 'test'.
b) a message explicitly stating that the literal should be mutable.
WriteStream on: 'test' beMutable.
Nicolas
No, you did not get the point.
Would you say that:
(Collection new add: $f; add: $o; add: $o; yourself) == (Collection new add: $f; add: $o; add: $o; yourself) ?
Besides, the last example in my mail is also worth explaining...
On 14 Feb 2007, at 14 February/20:58, Bert Freudenberg wrote:
On Feb 14, 2007, at 20:27 , Roel Wuyts wrote:
'foo' = 'foo' true "ok" 'foo' == 'foo' true "NOT OK" [...] I can only conclude that this is really not what you want.....
Why? If you want to test for identity, use a Symbol.
:-)
IMHO this is splitting hairs over a non-issue. The issue is mutability of literals.
If Squeak is the only Smalltalk that has this behaviour for Strings, then that shows it definitely is an issue. I ported T-Gen and the ParserCompiler, and suddenly this non-trivial issue became vital. We are still unable to port the logic language Soul to Squeak because of this issue, because, sorry, symbols use a flyweight pattern and are unique, while Strings are collections of characters and should behave as such. It is a simple issue in itself. Besides, if it were only splitting hairs, then why are all beginner's books full of warnings about this issue? Ever tried to teach Smalltalk to a class of newbies? Ever had students come up to you because they found some examples in a book or on the web, tried them in Squeak, and got different results? Think about Smalltalk being this nice and clean language where everything is logical, and then having to remember some stupid rules by heart because I am splitting hairs???
Besides, have a look at the last part of my mail. Would you not consider this wrong ? Depending on whether you call the behaviour from a method or not you get different behaviour ???????????????
[PS: Yes, you hit a sore spot there]
-- Roel
Roel Wuyts Roel.Wuyts@ulb.ac.be writes:
'foo' = 'foo' true "ok" 'foo' == 'foo' true "NOT OK"
The underlying reason is that whenever I execute one of these expressions in a workspace, a method is created behind my back and then executed. In the case of comparing two strings, the compiler creates *a single literal for both strings*. This is plain wrong: a string is a collection of characters, and therefore these are two different instances of this collection that happen to have the same contents.
Indeed it is good to get this straight. The semantics you propose make me think of cons in Scheme, where cons is guaranteed to give you a fresh object every time it is executed. Additionally, the result is guaranteed to be mutable, so it's especially important that separate calls to cons return separate objects!
The other viewpoint also seems to make sense, though: literals describe a *read-only* object, and thus the compiler may reuse them if it likes.
FWIW, the ANSI standard supports read-only literals, but does not require them to be. It says that you get undefined behavior if you try to modify an object created via a literal. See section 3.4.6.3, "String literals". We do not have to conform to ANSI, of course, but other Smalltalks might consider it.
Also, FWIW, if you like the ANSI version, it is easy to implement. Here is a version using a dead forked dialect of Squeak; it should be easy to dust it off should anyone want. This version goes further than discussed in this thread, and even makes floats and large integers be immutable. :) The code is in islands.zip on the following page; look inside the zip for immutLits1.5.cs and immutLits2.2.cs.
http://wiki.squeak.org/squeak/2074
If anyone is passionate about this issue, by all means open up a Mantis entry! Judging from the discussion so far, however, it may be hard to come to a decision....
-Lex
Also, FWIW, if you like the ANSI version, it is easy to implement. Here is a version using a dead forked dialect of Squeak; it should be easy to dust it off should anyone want. This version goes further than discussed in this thread, and even makes floats and large integers be immutable. :) The code is in islands.zip on the following page; look inside the zip for immutLits1.5.cs and immutLits2.2.cs.
In 3.8 and 3.9 there are 5 subclasses of String (Symbol, Byte and Wide) and many more for ArrayedCollection. Your change would introduce many read-only classes and lead to much duplicated code (ok, traits would be a big help here).
Immutability should be an instance-level property and not a special class. Immutability is completely orthogonal to inheritance and therefore should not abuse the inheritance mechanism. I know that a change like this would require some deep changes to the object representation in the VM. I just hope that someday somebody dares to make the step from 3.x to 4.0, where something like that is maybe possible ...
Cheers, Lukas
Lukas Renggli wrote:
In 3.8 and 3.9 there are 5 subclasses of String (Symbol, Byte and Wide) and many more for ArrayedCollection. Your change would introduce many read-only classes and lead to much duplicated code (ok, traits would be a big help here).
Just for the record, for strings there wouldn't be a need for more than a single new class. Here is why:
During the refactoring of the original m17n string hierarchy at one point I *very* seriously considered implementing Symbols differently, namely as a subclass of String with an iVar "string" that simply delegates the (few) actual requirements of the concrete subclasses to that variable. Pretty much the only reason not to do that was that the VM at places assumes that selectors are byte-indexable objects (like for printing debug stacks etc) and I figured that upsetting the internal String hierarchy was enough for one round without requiring additional VM changes (which turned out to be true, there was quite a bit of fallout in the aftermath of these changes and having VM dependencies would have made things unnecessarily harder).
However, it is *utterly* trivial to implement a subclass of String (call it "ImmutableString") that delegates the subclass responsibilities of String to an iVar and simply raises errors when trying to access them via #at:put: and friends. Then you simply implement String>>asImmutableString properly (^ImmutableString on: self) and off ya go!
Cheers, - Andreas
However, it is *utterly* trivial to implement a subclass of String (call it "ImmutableString") that delegates the subclass responsibilities of String to an iVar and simply raises errors when trying to access them via #at:put: and friends. Then you simply implement String>>asImmutableString properly (^ImmutableString on: self) and off ya go!
Certainly that will work for Strings in 3.7. In 3.8 there are also ByteString and WideString. Furthermore, you would probably also need immutability for other instances like LargeIntegers, Floats, Arrays and maybe some other classes.
Cheers, Lukas
"Lukas Renggli" renggli@gmail.com writes:
Immutability should be an instance-level property and not special class.
Good point, there was just one String back then so it was simpler.
Still, there should be a reasonable way to engineer this at the image level. Smalltalk is a pretty cool language. :) And adding immutable strings and arrays, should be easier than adding a generic immutability system.
A generic immutability system would be cool, though. You could catch some bugs with it.
-Lex
On Mar 7, 2007, at 23:45 , Lex Spoon wrote:
"Lukas Renggli" renggli@gmail.com writes:
Immutability should be an instance-level property and not special class.
Good point, there was just one String back then so it was simpler.
What Andreas was saying (IIUC) is that the abstract String class provides default implementations for all String functions that use a very small number of "kernel" methods that are declared subclass-responsibility. That means all the rest of the code in the String subclasses is just an optimization, so adding a new subclass should be straightforward.
This of course assumes that people actually honor that design and not bypass it by adding functionality to String subclasses directly.
- Bert -
On Feb 14, 2007, at 16:56 , stephane ducasse wrote:
In the semantics of Smalltalk, nothing says that two strings with the same representation in the same method point to the same object. I did not check in which books, but the difference between strings and symbols is really that two strings are two different objects, while equal symbols refer to the same object (and are immutable).
The sharing is not the primary problem, the mutability is.
- Bert -
You can do an optimization (having only one string in the literal frame) when this does not affect the semantics of the language. If strings were immutable, there would be no problem with having only one, because we would not see the difference.
Stef,
why should strings be immutable, and why blame the compiler.
The same situation is, for example, with Associations. In order to prevent mutation, someone invented ReadOnlyVariableBinding.
Literals have nothing much to do with compiler optimization, see senders of #encodeLiteral:, just with determining the correct bytecode for pushing them onto the stack.
But of course the compiler could emit code for always copying string literals, if you can afford the performance penalty.
As Bert wrote: it's normal :)
[Okay okay other languages have immutable string, but this is Smalltalk.]
/Klaus
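Klaus's remark that the compiler could emit code that always copies string literals can be imitated by hand. The following workspace sketch (illustrative names, not code from the thread) contrasts a block that answers the stored literal with one that answers a copy on every evaluation:

```smalltalk
| shared fresh |
shared := ['foo'].         "answers the literal stored with the compiled code"
fresh := ['foo' copy].     "answers a new String on every evaluation"
shared value == shared value.    "true: the same literal object each time"
fresh value == fresh value.      "false: two separate copies"
(fresh value at: 1 put: $F; yourself) = 'Foo'.    "mutating a copy is harmless"
fresh value = 'foo'              "true: the literal itself is untouched"
```

The price is one allocation per evaluation, which is the performance penalty Klaus refers to.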
As Bert wrote: it's normal :)
I agree, looks completely normal to me.
[Okay okay other languages have immutable string, but this is Smalltalk.]
If we had an immutability bit, that the compiler would set for objects in the literal array (and with what we could do a lot of other cool stuff), then people would not run into such problems.
Cheers, Lukas
Hi Lukas,
on Wed, 14 Feb 2007 17:48:40 +0100, you wrote:
If we had an immutability bit, that the compiler would set for objects in the literal array (and with what we could do a lot of other cool stuff), then people would not run into such problems.
Since I cannot (stupid me ;-) think of a counterexample which would break existing code, I second you (perhaps in the new compiler? without modifying the VM?).
/Klaus
On Feb 14, 2007, at 5:48 PM, Lukas Renggli wrote:
As Bert wrote: it's normal :)
I agree, looks completely normal to me.
Do you agree with the fact that the compiler merges two different strings into one single variable ?
Then, how do you explain that?
'test' == 'test'            => true
'test' == #test asString    => false
"The compiler optimizes... same variable...", I know the explanation, but I don't think it should be accepted.
On 14 Feb 07, at 17:48, Lukas Renggli wrote:
As Bert wrote: it's normal :)
I agree, looks completely normal to me.
Is it normal for you that the same strings typed in two different methods are different but if they are typed in the same method they are the same?
I never saw that in the books I read on smalltalk but they can be all wrong and squeak right.
Stef
[Okay okay other languages have immutable string, but this is Smalltalk.]
If we had an immutability bit, that the compiler would set for objects in the literal array (and with what we could do a lot of other cool stuff), then people would not run into such problems.
Exactly. Or the compiler could just optimize the immutable objects we have: #symbol, booleans, integers...
Stef
<Lukas Renggli> If we had an immutability bit, that the compiler would set for objects in the literal array (and with what we could do a lot of other cool stuff), then people would not run into such problems. </Lukas Renggli>
As a principle of language design, literals should be immutable. And immutability needs to be independently settable for each named instance variable, and independently settable for the indexable slots (as a group, not for each index.) The reason is because some named instance variables may need to be "caching variables" whose values are lazily computed only when needed--in fact, such variables may need to be weak references so that the garbage collector can set them to nil.
--Alan
On Feb 14, 2007, at 5:37 PM, Klaus D. Witzel wrote:
Stef,
why should strings be immutable, and why blame the compiler.
It's not about making strings immutable; it's that when you use one as a literal, it should be immutable. This shows us that literals that are not immutable cause really bad, tedious side effects. (The same reasoning applies to #(a b c).)
Math
Stef,
why should strings be immutable, and why blame the compiler.
I'm not saying that. I'm saying that since strings are not immutable then the compiler should not optimize the strings in a compiled method.
The same situation is, for example, with Associations. In order to prevent mutation, someone invented ReadOnlyVariableBinding.
Literals have nothing much to do with compiler optimization, see senders of #encodeLiteral:, just with determining the correct bytecode for pushing them onto the stack.
But of course the compiler could emit code for always copying string literals, if you can afford the performance penalty.
As Bert wrote: it's normal :)
[Okay okay other languages have immutable string, but this is Smalltalk.]
This is not my point.
Do you think that, from a language point of view, it is good to say:
OK, two strings are not identical if they are typed in different methods, but if they are typed in the same method they are identical?
This is why some of my slides stopped working when I switched from VisualWorks to Squeak. I do not have an old version of VisualWorks, but it seems that they were consistent in the view of strings they offered to programmers, and consistency is important.
Stef
Hi Stef,
on Wed, 14 Feb 2007 18:40:31 +0100, you wrote:
do you think that from a language point of view this is good to say
ok two strings are not identical if they are typed in different methods but if there are typed in the same methods they are identical?
Ah :) Now you know why I asked :)
[BTW: if you are able to compare strings from different methods, why should anybody suddenly lack this capability when it comes to strings in the same method? The more general (comparison method) subsumes the more specific (comparison method). What else would someone need?]
Any other examples besides strings?
/Klaus
On Feb 14, 2007, at 4:36 PM, Bert Freudenberg wrote:
It is normal.
This is normal for you because you know how the compiler works. But do you think the compiler works normally? Is it normal that a compiler considers two equal strings as identical? I would agree with symbols because symbols are immutable. I think this is a first bug, a bug in the compiler.
In my opinion, there is another bug. When the collection of a stream becomes full, it is replaced by another, bigger collection. So, first, the stream uses the collection you passed to the constructor; then, at a given time, this collection is replaced by a new one. I don't think this is normal behavior. In my opinion, the collection must always be the one you gave at the beginning OR it must always be a copy. I prefer the second solution.
So, what should be done? I can write tests for the compiler and tests for streams to show the behavior. These tests will fail because they show an uncorrected bug.
Lesson: never modify string literals.
Lesson: Use a correct compiler :-)
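The first lesson can be followed in at least two ways, sketched below under the assumption that String class understands #streamContents: (as it does in the Squeak images discussed here). Either stream over an explicit copy of the literal, or let the class create the target collection so that no literal is ever handed to the stream.

```smalltalk
| stream viaCopy viaBlock |
"Stream over a copy, so the literal itself is never written to."
stream := WriteStream with: 'a test ' copy.
stream nextPutAll: 'to test'.
viaCopy := stream contents.          "'a test to test'"

"Or let the class allocate the collection the stream writes into."
viaBlock := String streamContents: [:s |
    s nextPutAll: 'a test '; nextPutAll: 'to test'].

viaCopy = viaBlock                   "true"
```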
Hi Damien,
on Wed, 14 Feb 2007 17:19:13 +0100, you wrote:
In my opinion, there is another bug. When the collection of a stream becomes full, it is replaced by another, bigger collection. So, first, the stream uses the collection you passed to the constructor; then, at some point, this collection is replaced by a new one. I don't think this is normal behavior.
Whatever the stream does with the collection, it is encapsulated. Imagine the stream always uses a highly optimized species for its internal job (or a file on your hard disk!). No code should depend on the internals of (in this case) the stream.
I suggest you use (aStream contents asArray) and then #= for comparing aStream's contents to your expectations.
/Klaus
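A minimal sketch of that advice, using plain #contents rather than the asArray variant: ask the stream for its result instead of looking at the collection that was passed in. The variable names are illustrative only.

```smalltalk
| myCollection myStream |
myCollection := String new: 3.
myStream := WriteStream on: myCollection.
myStream nextPut: $a; nextPut: $b; nextPut: $c; nextPut: $d.

"Ask the stream, not the original collection, for the result."
myStream contents = 'abcd'    "true, whether or not the stream had to grow"
```

Code written this way does not depend on when, or whether, the stream replaces its internal collection.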
On Feb 14, 2007, at 5:48 PM, Klaus D. Witzel wrote:
Whatever the stream does with the collection, it is encapsulated. Imagine the stream always uses a highly optimized species for its internal job (or a file on your hard disk!). No code should depend on the internals of (in this case) the stream.
I really don't want to depend on the implementation. And in my opinion, this is not encapsulated because this is MY String, not a String created internally. What I see is that the String I give to the new Stream is modified. Then at a moment, the String does not reflect the stream anymore. This doesn't sound coherent to me. And if you all agree to the current behavior, then a documentation should be written: "Don't use the collection after having created a stream on it !"
Hi Damien,
on Wed, 14 Feb 2007 17:58:02 +0100, you wrote:
I really don't want to depend on the implementation. And in my opinion, this is not encapsulated because this is MY String, not a String created internally.
Not really. If you pass a boxed object (other than a SmallInteger) the recipient can #become: it to anything he/she likes. This is reality.
What I see is that the String I give to the new Stream is modified. Then at a moment, the String does not reflect the stream anymore. This doesn't sound coherent to me.
It is not coherent because you passed an explicitly written *constant* which, in other languages, is believed to be immutable.
And if you all agree to the current behavior, then a documentation should be written: "Don't use the collection after having created a stream on it !"
Easier: don't pass constant collections to the streamers :)
Another example, which cannot be blamed on any streamer:

    | tmp |
    tmp := 'lowercase'.
    tmp translateToUppercase == tmp
/Klaus
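For contrast, here is a hedged sketch of the copying and the in-place variants side by side. It assumes that String>>asUppercase answers a new string while #translateToUppercase modifies its receiver, which matches the behavior described above; it works on a copy so that no literal is changed.

```smalltalk
| tmp upper |
tmp := 'lowercase' copy.          "work on a copy, not on the literal"

upper := tmp asUppercase.         "assumed to answer a new string"
upper == tmp.                     "false under that assumption"
tmp = 'lowercase'.                "true: tmp itself is unchanged"

tmp translateToUppercase.         "in-place variant from the example above"
tmp = 'LOWERCASE'                 "true: the receiver itself was modified"
```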
Hi Klaus,
Klaus D. Witzel wrote:
I really don't want to depend on the implementation. And in my opinion, this is not encapsulated because this is MY String, not a String created internally.
Not really. If you pass a boxed object (other than a SmallInteger) the recipient can #become: it to anything he/she likes. This is reality.
I'm not talking about security, only about an inconsistent side effect.
Klaus D. Witzel wrote:
What I see is that the String I give to the new Stream is modified. Then at a moment, the String does not reflect the stream anymore. This doesn't sound coherent to me.
It is not coherent because you passed an explicitly written *constant* which, in other languages, is believed to be immutable.
And if you all agree to the current behavior, then a documentation should be written: "Don't use the collection after having created a stream on it !"
Easier: don't pass constant collections to the streamers :)
It's not about constant collections at all here.
    myCollection := String new: 3.
    myStream := WriteStream on: myCollection.
    myStream nextPutAll: 'abcd' copy.

Here, myCollection is left untouched, which sounds normal (it's still an empty string of size 3). Now, let's replace #nextPutAll: by four #nextPut: sends:

    myCollection := String new: 3.
    myStream := WriteStream on: myCollection.
    myStream nextPut: $a; nextPut: $b; nextPut: $c; nextPut: $d.

This should have exactly the same behavior... however, myCollection now equals 'abc'!!! Why the first 3 characters? Why not everything or nothing at all? This is why I think it's not coherent.
I've read the source code and I understand why it happens, but I don't think it's coherent. And this has nothing to do with literals nor with immutability. This is a completely different problem (this is why I changed the thread title).
Hi Damien,
I understand your argument and the fresh light that you've thrown on it.
BTW "having a side effect" is not really what happens; after all, you *want* the streamer to write, don't you? And in your "myStream nextPutAll: 'abcd' copy" the #copy is superfluous (it has no effect).
But anyway, let's not argue about the thread title.
For sure I do like consistency and friends like coherence.
My concern is that #become: would be a gun instead of using a pigeon transporting a peace message (so to speak).
So, what's your solution? Perhaps, as we have heard from the VW folks, the streamer should be adapted to *also* work on an OrderedCollection (which grows automagically), so that people can expect that, if they pass anOrderedCollection, all is fine with its contents and identity (i.e. because of the behavior which is already in OrderedCollection).
/Klaus
I've written unit-tests for the current behavior. I do not agree with what we have currently, but this is not going to change so:
==============
testStreamUseGivenCollection
    "self debug: #testStreamUseGivenCollection"

    "When a stream is created on a collection, it tries to keep using that collection instead of copying. See thread with title 'Very strange bug on Streams and probably compiler' (Feb 14 2007) on the squeak-dev mailing list."

    | string stream |
    string := String withAll: 'erased'.
    stream := WriteStream on: string.
    self assert: string = 'erased'.

    stream nextPutAll: 'test'.
    "Beginning of 'erased' has been replaced by 'test'"
    self assert: string = 'tested'.
==============
==============
testNextPutAllDifferentFromNextPuts
    "self debug: #testNextPutAllDifferentFromNextPuts"

    "When a stream is created on a collection, it tries to keep using that collection instead of copying. See thread with title 'Very strange bug on Streams and probably compiler' (Feb 14 2007) on the squeak-dev mailing list."

    "nextPutAll verifies the size of the parameter and directly grows the underlying collection of the required size."

    | string stream |
    string := String withAll: 'z'.
    stream := WriteStream on: string.
    stream nextPutAll: 'abc'.
    "string hasn't been modified because #nextPutAll: detects that 'abc' is bigger than the underlying collection. Thus, it starts by creating a new collection and doesn't modify our variable."
    self assert: string = 'z'.

    string := String withAll: 'z'.
    stream := WriteStream on: string.
    stream nextPut: $a; nextPut: $b; nextPut: $c.
    "The first #nextPut: has no problem and replaces $z by $a in the string. Others will detect that string is too small."
    self assert: string = 'a'.
==============
I've written unit tests for this compiler "optimizations". I think they show a bug. See attached file.
Hi Damien,
I still think that something is not correct with your analysis. In your comments you write:
"Current compiler uses only one variable for both strings. I think this is a bug."
But you do not code a variable; instead you code *literals* which, as was mentioned earlier, are not constants either.
?
/Klaus
On Sat, 24 Feb 2007 19:14:52 +0100, Damien Cassou wrote:
I've written unit tests for this compiler "optimizations". I think they show a bug. See attached file.
Ok, I may not have used the right word. What comment would you write?
Hi Damien,
on Sat, 24 Feb 2007 23:29:18 +0100, you wrote:
Ok, I may not have used the right word. What comment would you write?
I cannot write you that comment. Smalltalk was built with a minimum set of unchangeable parts, see "Design Principles Behind Smalltalk" just the sentence after "Good Design"
- http://users.ipa.net/~dwighth/smalltalk/byte_aug81/design_principles_behind_...
Moreover, the design principle does not say minimal, it says minimum.
It turned out the only unchangeable parts are instances of SmallInteger (they are their own oop).
Every object (but the SmallIntegers) can be changed and I disagree with your blaming of the compiler and streams.
/Klaus
FWIW, Objective-C solves this by making literals instances of a special subclass of NSString, because they reference data compiled into the static data segment of the executable. Also, NSStrings are immutable; instances of the subclass NSMutableString are mutable. So modification of literals doesn't come up, as it produces a DNU-type error.
-Todd Blanchard
Thank you Todd,
I meant the same when mentioning literals made with read-only associations earlier.
But Damien is also concerned about passing a subinstance of ArrayedCollection (no other type works at the moment) to a WriteStream, which, on size overflow, allocates a suitable sized new subinstance ... So this also means, it does that for strings.
/Klaus
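The overflow behavior described here is not specific to strings. A small sketch with an Array, assuming the WriteStream implementation discussed in this thread (writes go into the given collection until it is full, then into a newly allocated, larger one):

```smalltalk
| array stream |
array := Array new: 2.
stream := WriteStream on: array.
stream nextPut: 1; nextPut: 2; nextPut: 3.

array.             "#(1 2): the third element went into a new, larger Array"
stream contents    "#(1 2 3)"
```

After the overflow, the original Array and the stream's contents are two different objects, which is the inconsistency Damien objects to.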
squeak-dev@lists.squeakfoundation.org