String hierarchy

Marcel Weiher marcel at metaobject.com
Mon Mar 20 22:13:53 UTC 2000


> From: "Andrew C. Greenberg" <werdna at gate.net>
> I am not certain I agree that mutable strings has the advantage of  
> efficient use of memory.  With mutable strings, the substring and  
> slice operations require a copy.  With immutable strings, it might  
> suffice to generate a Decorator with offset and size data.  Of
> course, much depends upon the particular applications, as keeping a  
> pointer to a large string can be inefficient if all that needs to  
> remain is the slice.
>
> How are these operations implemented in NextStep?

This is not generally known, since the source code is not available
to the public and, with the class-cluster concept, even most of the
subclasses are actually private (though class-dump brings those to
light).

First of all, there is a distinction between "byte oriented data"  
and "strings", with the former being NSData (ByteArray?) and the  
latter NSString (String).  There are subclasses that reference parts  
of an NSData object, though I am pretty sure that is only used for  
immutable data.

NSData also uses Mach's VM-copy facility to virtually copy large  
amounts of data with write access.   What happens is that the same  
physical memory is mapped into a different region and both the  
original and the copy marked copy-on-write, so any writes to the  
memory will result in a real copy being made.  This is great if you  
have very large data sets that you *might* need to modify.  Making  
real copies "just in case" can be very (prohibitively) expensive,  
while keeping track of diffs makes code very complicated.   There is  
an internal cutoff where a real copy is done instead.  Of course,  
both of these are implementation details that are hidden from   
clients, who just see that a copy has been made.
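
To make the idea concrete, here is a minimal user-space sketch in C of
a lazy copy with a size cutoff.  It is only an analogy: Mach does this
at the page level inside the VM system, whereas this sketch uses a
reference count, and it is not how NSData is actually written.  All
names (Data, Storage, data_copy, EAGER_COPY_CUTOFF) and the cutoff
value are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cutoff: below this size a real copy is made at once. */
enum { EAGER_COPY_CUTOFF = 64 };

/* Backing store that several Data handles may share. */
typedef struct {
    size_t refcount;
    size_t length;
    unsigned char *bytes;
} Storage;

/* What a client holds; it cannot tell whether the storage is shared. */
typedef struct {
    Storage *storage;
} Data;

/* Allocate private storage holding a real copy of src (length > 0). */
static Storage *storage_new(const void *src, size_t length) {
    Storage *s = malloc(sizeof *s);
    assert(s != NULL);
    s->bytes = malloc(length ? length : 1);
    assert(s->bytes != NULL);
    memcpy(s->bytes, src, length);
    s->refcount = 1;
    s->length = length;
    return s;
}

static Data data_new(const void *src, size_t length) {
    Data d = { storage_new(src, length) };
    return d;
}

/* "Copy": small data is really copied, large data is only shared. */
static Data data_copy(Data original) {
    Data copy;
    if (original.storage->length < EAGER_COPY_CUTOFF) {
        copy.storage = storage_new(original.storage->bytes,
                                   original.storage->length);
    } else {
        original.storage->refcount++;
        copy.storage = original.storage;
    }
    return copy;
}

/* Write one byte; the real copy happens here if storage is shared. */
static void data_write(Data *d, size_t index, unsigned char value) {
    assert(index < d->storage->length);
    if (d->storage->refcount > 1) {
        Storage *own = storage_new(d->storage->bytes, d->storage->length);
        d->storage->refcount--;
        d->storage = own;
    }
    d->storage->bytes[index] = value;
}

/* Drop one handle; the storage is freed with its last handle. */
static void data_free(Data *d) {
    if (--d->storage->refcount == 0) {
        free(d->storage->bytes);
        free(d->storage);
    }
    d->storage = NULL;
}
```

A caller only ever goes through data_copy and data_write, so whether
the bytes were duplicated immediately or on the first write stays an
implementation detail, as described above.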

For string operations, I think that usually a copy is made without  
much in terms of optimization, the reasoning being that actual  
strings tend to be somewhat shorter than raw binary data (heck, Moby  
Dick is only about a megabyte, IIRC).  Another issue to keep track of  
with a decorator sub-string is that it is an extra indirection.  One  
level may not seem expensive, but what happens if you get a  
substring of a substring of a substring... ?   [ A sub-data class I  
wrote for Objective-C keeps a pointer straight into the original  
data; I don't know how that would work with Smalltalk ]
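
The "pointer straight into the original data" approach might look
roughly like the following C sketch.  It is not the Objective-C class
mentioned above, whose source is not shown here; the Slice type and
the slice_* functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable view: a pointer straight into the original
   bytes plus a length.  No reference to a parent slice is kept. */
typedef struct {
    const char *bytes;   /* points directly into the original buffer */
    size_t      length;
} Slice;

/* View over a whole NUL-terminated C string. */
static Slice slice_of(const char *s) {
    Slice v = { s, strlen(s) };
    return v;
}

/* Sub-view of a view.  The result points into the original buffer,
   so a slice of a slice of a slice is still one indirection. */
static Slice slice_sub(Slice parent, size_t offset, size_t length) {
    assert(offset <= parent.length && length <= parent.length - offset);
    Slice v = { parent.bytes + offset, length };
    return v;
}

/* Compare a view's contents with a NUL-terminated C string. */
static int slice_equals(Slice v, const char *s) {
    size_t n = strlen(s);
    return v.length == n && memcmp(v.bytes, s, n) == 0;
}
```

Taking slice_sub of "Call me Ishmael." at offset 5 with length 10
gives "me Ishmael", and slicing that again at offset 3 with length 7
gives "Ishmael", whose pointer is simply the original text plus 8.
The cost raised in the quoted message remains: the original buffer
has to stay alive for as long as any slice of it does.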

One thing that is optimized for immutable objects is the #copy  
method.  It just returns the immutable object itself.  There is also  
a #mutableCopy message that, well, returns a mutable (usually  
somewhat deep) copy of the receiver.
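
As a rough C analogy of the same optimization (not the Foundation
implementation; ImmutableText and the immutable_* functions are
made-up names, and reference counting is assumed for ownership):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable, reference-counted text object. */
typedef struct {
    size_t refcount;
    size_t length;
    char   chars[];      /* never modified after creation */
} ImmutableText;

static ImmutableText *immutable_new(const char *s) {
    size_t n = strlen(s);
    ImmutableText *t = malloc(sizeof *t + n + 1);
    assert(t != NULL);
    t->refcount = 1;
    t->length = n;
    memcpy(t->chars, s, n + 1);
    return t;
}

/* "copy" of an immutable object: hand back the same object with one
   more owner.  No bytes are duplicated. */
static ImmutableText *immutable_copy(ImmutableText *t) {
    t->refcount++;
    return t;
}

/* "mutableCopy": a fresh, writable buffer that the caller must free. */
static char *immutable_mutable_copy(const ImmutableText *t) {
    char *m = malloc(t->length + 1);
    assert(m != NULL);
    memcpy(m, t->chars, t->length + 1);
    return m;
}

/* Give up one ownership; the object is freed with its last owner. */
static void immutable_release(ImmutableText *t) {
    if (--t->refcount == 0) {
        free(t);
    }
}
```

Sharing is safe here only because nobody can change the characters;
the mutable copy gets its own storage, so writing to it leaves the
original untouched.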

Marcel





More information about the Squeak-dev mailing list