String hierarchy
Marcel Weiher
marcel at metaobject.com
Mon Mar 20 22:13:53 UTC 2000
> From: "Andrew C. Greenberg" <werdna at gate.net>
> I am not certain I agree that mutable strings has the advantage of
> efficient use of memory. With mutable strings, the substring and
> slice operations require a copy. With immutable strings, it might
> suffice to generate a Decorator with offset and size data. Of
> course, much depends upon the particular applications, as keeping a
> pointer to a large string can be inefficient if all that needs to
> remain is the slice.
>
> How are these operations implemented in NextStep?
This is not generally known, as the source code is not available to
the public, and with the class-cluster concept even most of the
sub-classes are actually private (though class-dump brings those to
light).
First of all, there is a distinction between "byte oriented data"
and "strings", with the former being NSData (ByteArray?) and the
latter NSString (String). There are subclasses that reference parts
of an NSData object, though I am pretty sure that is only used for
immutable data.
NSData also uses Mach's VM-copy facility to virtually copy large
amounts of data with write access. What happens is that the same
physical memory is mapped into a different region and both the
original and the copy are marked copy-on-write, so any writes to the
memory will result in a real copy being made. This is great if you
have very large data sets that you *might* need to modify. Making
real copies "just in case" can be very (prohibitively) expensive,
while keeping track of diffs makes code very complicated. There is
an internal size cutoff below which a real copy is done instead. Of course,
both of these are implementation details that are hidden from
clients, who just see that a copy has been made.
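To make the copy-on-write idea concrete, here is a minimal sketch in Python. It only imitates the semantics at object level; it does not use Mach's VM facilities, and the class name CowBytes and its methods are made up for this illustration. A copy shares the buffer, and the bytes are duplicated only when one side is first written to.

```python
class CowBytes:
    """Byte buffer whose copies share storage until one of them is written to.

    Hypothetical illustration class, not an NSData API.
    """

    def __init__(self, data=b""):
        self._buf = bytearray(data)  # storage, possibly shared with copies
        self._shared = False         # True while another object may use _buf

    def copy(self):
        # "Virtual" copy: no bytes are moved, both sides are marked shared.
        other = CowBytes.__new__(CowBytes)
        other._buf = self._buf
        other._shared = True
        self._shared = True
        return other

    def __getitem__(self, index):
        return self._buf[index]

    def __setitem__(self, index, value):
        if self._shared:
            # First write after a copy: only now pay for the real copy.
            # (Conservative: the other side stays flagged and will also
            # copy on its first write, even if it is the last user.)
            self._buf = bytearray(self._buf)
            self._shared = False
        self._buf[index] = value

    def tobytes(self):
        return bytes(self._buf)


original = CowBytes(b"hello")
duplicate = original.copy()      # cheap, shares the buffer
duplicate[0] = ord("j")          # triggers the real copy
```

As with NSData, a client of this class cannot tell when the real copy happens; it just sees two independent values.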
For string operations, I think that usually a copy is made without
much in the way of optimization, the reasoning being that actual
strings tend to be somewhat shorter than raw binary data (heck, Moby
Dick is only about a megabyte, IIRC). Another issue to keep track of
with a decorator sub-string is that it is an extra indirection. One
level may not seem expensive, but what happens if you get a
substring of a substring of a substring... ? [ A sub-data class I
wrote for Objective-C keeps a pointer straight into the original
data; I don't know how that would work with Smalltalk ]
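One way to avoid the chain of indirections can be sketched in Python as follows. This is an assumption-laden illustration, not the Objective-C class mentioned above: the name SubData and its fields are hypothetical. The view stores (base, offset, size), and when it is built on top of another view it adds the offsets and points straight at the underlying bytes.

```python
class SubData:
    """Read-only view of part of a bytes object, kept as (base, offset, size).

    Hypothetical illustration class.
    """

    def __init__(self, base, offset, size):
        if offset < 0 or size < 0 or offset + size > len(base):
            raise ValueError("range outside the underlying data")
        if isinstance(base, SubData):
            # Flatten: refer to the original bytes directly, so a
            # substring of a substring costs no extra indirection.
            offset += base.offset
            base = base.base
        self.base = base
        self.offset = offset
        self.size = size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        return self.base[self.offset + index]

    def tobytes(self):
        # Only here is a real copy of the viewed range made.
        return self.base[self.offset:self.offset + self.size]


text = b"Call me Ishmael."
first = SubData(text, 5, 10)     # views b"me Ishmael"
second = SubData(first, 3, 7)    # views b"Ishmael"
third = SubData(second, 0, 3)    # views b"Ish", still one hop from text
```

The trade-off raised in the quoted message remains: even the smallest view keeps the whole original alive.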
One thing that is optimized for immutable objects is the #copy
method. It just returns the immutable object itself. There is also
a #mutableCopy message that, well, returns a mutable (usually
somewhat deep) copy of the receiver.
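The #copy / #mutableCopy split can be sketched in Python along these lines. The class names ImmutableText and MutableText and the method names are invented for the illustration; they only mirror the behaviour described above and are not the NSString interface.

```python
class ImmutableText:
    """Text that never changes after creation (hypothetical example class)."""

    def __init__(self, chars=""):
        self._chars = str(chars)

    def copy(self):
        # Nobody can modify the receiver, so sharing it is safe:
        # the "copy" is the object itself.
        return self

    def mutable_copy(self):
        # A caller who wants to edit gets fresh, private storage.
        return MutableText(self._chars)

    def __str__(self):
        return self._chars


class MutableText:
    """Editable text backed by a list of characters (hypothetical)."""

    def __init__(self, chars=""):
        self._chars = list(chars)

    def append(self, more):
        self._chars.extend(more)

    def copy(self):
        # A mutable receiver cannot hand out itself; it must snapshot.
        return ImmutableText("".join(self._chars))

    def mutable_copy(self):
        return MutableText(self._chars)

    def __str__(self):
        return "".join(self._chars)


whale = ImmutableText("whale")
same = whale.copy()              # no allocation, same object
editable = whale.mutable_copy()  # independent storage
editable.append("s")
```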
Marcel