Simple "solution" to editing: treat (encoded) Strings as immutable.
For editing/stringbuilding, use a WideString or a special kind of stream (MultiByteBinaryOrTextStream or how is it called) plus additional support for inserting in the middle if desired. I remember somebody proposing Ropes when discussing a reformation of strings previously. Could be interesting at "edit-time".
For (in-memory) storage, encoded Strings should maybe just be ByteArrays paired with some TextConverter-like thing or at least a spec of the encoding so you can fetch Characters or configure streams from it on demand.
If licensing permits it one could also have a look at how the OpenJDK deals with UTF-16 in StringBuilder.
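The bytes-plus-encoding-spec idea above can be sketched quickly. This is a minimal illustration in Python rather than Smalltalk, and the class and method names (`EncodedString`, `character_at`, `reading_stream`) are hypothetical, not any actual Squeak API:

```python
import io

class EncodedString:
    """An immutable encoded string: raw bytes plus a spec of their encoding.
    Characters and streams are produced on demand, never stored decoded."""

    def __init__(self, data: bytes, encoding: str = "utf-8"):
        self._data = data          # the encoded byte content, treated as immutable
        self._encoding = encoding  # the TextConverter-like spec

    def byte_array(self) -> bytes:
        return self._data

    def character_at(self, index: int) -> str:
        # Decode on demand; O(n) for variable-width encodings like UTF-8.
        return self._data.decode(self._encoding)[index]

    def reading_stream(self):
        # Configure a character stream over the stored bytes on demand.
        return io.TextIOWrapper(io.BytesIO(self._data), encoding=self._encoding)

s = EncodedString("héllo".encode("utf-8"))
print(s.character_at(1))          # é
print(s.reading_stream().read())  # héllo
```

Any editing would then go through a separate builder rather than mutating the bytes in place.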
On 30.07.2017 03:20, "tim Rowledge" <tim@rowledge.org> wrote:
On 29-07-2017, at 12:48 PM, Nicolas Cellier <nicolas.cellier.aka.nice@gmail.com> wrote:
Absolutely, to me a String is a sequence of characters. squeakToUtf8 is a hack that makes us consider a String as a sequence of
codePoints whose encoding is in the eye of the beholder (or implicitly in the Context - the Smalltalk one).
It's not very object oriented and quite fragile. We started to clean Multilingual but never finished the job…
Yes, that’s pretty much how I see it. Currently the utf8 ‘string’ is just kept as a byte string and the user is expected to understand that it is in a rather dangerous state.
It's difficult to finish it, because we value backward compatibility. So maybe the ByteArray change was a bit radical in this respect.
Backward compatibility can sometimes drive you to loud swearing!
Maybe a new message to return the bytearray of the utf8 data could be added, leaving the old one alone. We should probably consider making an actual UTF8String class, though I did try to work out the best thing to do for that several years ago for NuScratch and got lost in the tangles. Editing the damn things is a pain, to say the least, so you get to thinking about having the canonical string as an instvar and a byte array, with edits working on the String, which gets converted at the end of the edit to update the bytearray. Or the other way around… or… aaargh!
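The "canonical string instvar plus byte array" scheme can be sketched as a lazily synchronized cache. A Python illustration with hypothetical names (`UTF8String`, `replace`, `utf8_bytes`), not a proposal for the actual class:

```python
class UTF8String:
    """Edits go to the canonical decoded string; the UTF-8 byte array is
    regenerated lazily, only when someone asks for it after an edit."""

    def __init__(self, text: str):
        self._text = text   # canonical, editable representation
        self._utf8 = None   # cached encoded form; None means stale

    def replace(self, start: int, stop: int, replacement: str):
        # Edit the canonical string and invalidate the cached bytes.
        self._text = self._text[:start] + replacement + self._text[stop:]
        self._utf8 = None

    def utf8_bytes(self) -> bytes:
        # Re-encode only when the cache is stale.
        if self._utf8 is None:
            self._utf8 = self._text.encode("utf-8")
        return self._utf8

s = UTF8String("naïve")
s.replace(0, 3, "waï")
print(s.utf8_bytes())   # b'wa\xc3\xafve'
```

Going "the other way around" would keep the bytes canonical and decode lazily instead; the trade-off is which side pays the conversion cost.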
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Immune from any serious head injury.
On 30.07.2017, at 11:02, Jakob Reschke <jakob.reschke@student.hpi.de> wrote:
If licensing permits it one could also have a look at how the OpenJDK deals with UTF-16 in StringBuilder.
While I agree in principle, don't come near me with utf16 ;)
On 30-07-2017, at 6:34 AM, Tobias Pape <Das.Linux@gmx.de> wrote:
On 30.07.2017, at 11:02, Jakob Reschke jakob.reschke@student.hpi.de wrote:
If licensing permits it one could also have a look at how the OpenJDK deals with UTF-16 in StringBuilder.
While I agree in principle, don't come near me with utf16 ;)
I was about to say something similar :-)
I think it’s reasonably clear that nobody wants to have UTF-X as the main representation of text within their system if any sort of editing might be involved. It’s just too painful. However, there seem to be quite a lot of places where UTF-8 has been chosen as a sort of interface coding, I imagine for some sort of space-saving reasons in general. It does seem like a bit of an early-90s “oh my gosh, all the furrin letters take up so much space, what can we do, we can’t ask people to install an entire megabyte of memory on their PCs!” thing.
For the NuScratch stuff I used Cairo/Pango to render text nicely and thus had to convert everything to UTF-8 in order to pass it to the renderer. No editing was done to any of that, so no backward conversions or complex parsing were required. To my surprise the general performance on the Pis was not noticeably impacted; when I did my first experiments I thought I would have to render the full fonts out to make my own glyph bitmaps and so on, but in fact it worked nicely. Which meant that the languages with complex layout and kerning rules could be dealt with by somebody else’s code, which I like.
Jakob mentioned pairing encoded bytes with converters of some kind and that made me think of Text, where we pretty much do that already. I wonder if using a runarray paired with the bytearray of UTF-8 (or even, dog help us, UTF-16) to call out where non-byte characters lurk would work? Think about behaving as if the text attribute were ‘this one needs 3 bytes’ rather than ‘this one is in flashing red sparkles with rotating underlines and winking quotes’. Given that we are able to handle editing Text pretty well, maybe, just maybe, that would make editing UTF-X work decently? Sounds like a good student project to me ;-)
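The runarray-of-byte-widths idea can be sketched concretely. A Python illustration, with the run format and function names (`run_array`, `character_at`) invented for the sketch; the widths come from UTF-8's leading-byte rules:

```python
def byte_width(first_byte: int) -> int:
    # Number of bytes in a UTF-8 sequence, read off its first byte.
    if first_byte < 0x80: return 1
    if first_byte < 0xE0: return 2
    if first_byte < 0xF0: return 3
    return 4

def run_array(data: bytes):
    # Compress per-character byte widths into (width, count) runs,
    # much like a RunArray of Text attributes.
    runs, i = [], 0
    while i < len(data):
        w = byte_width(data[i])
        if runs and runs[-1][0] == w:
            runs[-1][1] += 1
        else:
            runs.append([w, 1])
        i += w
    return runs

def character_at(data: bytes, runs, index: int) -> str:
    # Walk the runs to find the byte offset of character `index`,
    # skipping whole runs instead of decoding byte by byte.
    offset = 0
    for width, count in runs:
        if index < count:
            start = offset + index * width
            return data[start:start + width].decode("utf-8")
        offset += count * width
        index -= count
    raise IndexError(index)

text = "abédef".encode("utf-8")
print(run_array(text))                         # [[1, 2], [2, 1], [1, 3]]
print(character_at(text, run_array(text), 2))  # é
```

For mostly-ASCII text the runarray stays tiny, and indexing becomes a walk over a handful of runs rather than a scan of every byte, which is what makes the Text analogy attractive.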
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- Calls people to ask them their phone number.
squeak-dev@lists.squeakfoundation.org