[squeak-dev] String & Text

Mon Jul 18 01:38:38 UTC 2022

*Text* is unique in the image in being the only class that is
simultaneously a *byte* *and* *pointer* object.  Thanks to the magic of
TextAnchors, any character of a Text is allowed to be a pointer object.
This is implemented by (Character value: 1) being yet another special
substitution performed dynamically during layout.  Since String can't know
that (Character value: 1) is special, it's hard to predict whether all those
categories of behavior Jakob identified on String would behave properly
with Anchor characters, or, in some cases, what the proper behavior should
even be.

It seems like Text is meant to be considered an output, and not an
input.  Just as we once separated the domain part of "Smalltalk" from the
Image responsibilities (SmalltalkImage) and now encourage client code to
access that domain via its #environment, IMO, it's worth having client code
treat Text similarly.  As much as possible, simply access *through* the
Text to its domain via the core API like #string, to minimize the API added
to Text.  This may already have been the intention, as String already
understands #string...
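
A minimal workspace sketch of what accessing *through* the Text might look
like.  The sample strings and variable names are only illustrative, and this
is one possible reading of the idea, not an agreed-upon convention:

```smalltalk
| text plain |
"Build a Text and give part of it a presentation attribute."
text := 'hello world' asText.
text addAttribute: TextEmphasis bold from: 1 to: 5.

"Content-level work goes through #string instead of new Text API."
plain := text string.
plain asUppercase.
plain substrings.

"String answers #string as well, so the same client code
 works whether it was handed a String or a Text."
'hello world' string.
```

The results here are plain Strings, so the attributes are not carried along;
code that needs to keep them would still have to work on the Text itself.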

 - Chris

On Sat, Jul 16, 2022 at 5:13 AM Jakob Reschke <jakres+squeak at gmail.com>
wrote:

> I scrolled through the protocols of String and Text two days ago. The main
> categories of methods (which don't overlap well with the classification in
> the image) on String that I remember are:
> - general string parts enumeration (finding, lines)
> - general string manipulation (trimming, splitting, slicing, character
> substitution, substring substitution, case changes, padding, ...)
> - tokenization and parsing (e. g. csv/tsv), some of it Smalltalk
> language-specific (getters, setters, arguments/keywords count)
>    - natural language conversions (e. g. camelCase <-> words)
>    - MIME
> - formatting (interpolation, line wrapping or joining, indentation)
> - collation and (equality) comparison
> - pattern matching
> - similarity computations
> - character classification (digits, letters, whitespace, ...)
> - graphics display
> - arithmetic with numbers
> - multilingual stuff (among others: dealing with leadingChar)
> - character set conversions
> - VM paths <-> Squeak paths conversion
> - HTML and HTTP conversions (e. g. URL %-decoding)
> - crc16
> - extension methods from various packages like Etoys, JSON, Monticello,
> Morphic, Regex, Network, ...
>    - chronology conversions
>
> Next to "format:" there seems to be yet another interpolation mechanism in
> "expandMacrosWithArguments:".
>
> Overall, some methods treat strings as text, others treat them as
> technical data (markup, file paths, URLs, encoding---ByteArrays with
> characters). The leadingChar stuff is strangely in the middle of the two.
> Text functionality like interpolation can also be used to produce technical
> data, of course.
>
> Which selectors Text also understands is at times chaotic. Of some groups
> of related selectors, Text may just understand one of them. When it comes
> to Text's "core domain" of displaying, it does not understand any of the
> displayAt:/displayOn:* selectors...
>
>
>
> On Sat, Jul 16, 2022 at 03:28, Chris Muller <ma.chris.m at gmail.com>
> wrote:
>
>> Hi Marcel,
>>
>> > > But until we do that, and whole hog like Eliot suggested,
>> > > what we will have are *some* domain things that String
>> > > can do that Text can't -- a partial overlap.
>> >
>> > It's rather easy. Once we have a CharacterCollection, we can
>> > finally see the special cases on String. The common stuff can be
>> > moved up to then benefit both String and Text.
>>
>> I understand.  CharacterCollection would make it so we "could" do it, but
>> it's still worth scrutinizing heavily first whether we should.
>>
>> > > In other words, an incomplete mess for an indefinite period of time.
>> >
>> > Disagree. This kind of refactoring does not look too difficult.
>>
>> I was speaking about the state of the system until such time as that
>> refactoring was completed, which is indefinite.  Until then, the API would
>> be, by definition, incomplete.  We can disagree about it being a "mess",
>> though.   :)
>>
>> > > By removing all the domain stuff [...]
>> >
>> > I think that we have a different understanding of the term "domain"
>> > here. Maybe you are worried about Magma. If so, please
>> > elaborate your concerns from that perspective.
>>
>> No, I'm speaking strictly in terms of good OO design, where too many
>> responsibilities for a class is considered not good design.
>>
>> In the other thread, Tim just wrote this about formatting comments.  It
>> stuck out to me as an example relevant to the question of this thread.
>>
>> (Tim wrote:)
>> > Oh, I'm not wanting to have any tabs or spaces inserted - I want the
>> > formatting to be live and use the left indent.
>> > Shout does all that work to colour (etc) the text so why not use the
>> > fact that it detects comments.
>>
>> I agree with him 100%, and this continues to remind us of the numerous
>> responsibilities that can be considered "presentation" only.  Before I had
>> only mentioned fonts, colors, and attributes (bold, italic, center, right
>> justify, left justify, etc.), but Tim reminds us that indentation and layout
>> are in there, too!
>>
>> This is already a nice collection of behaviors that are completely distinct
>> from the ones concerned with the _contents_ of the Text (e.g., its
>> 'string'), which is what I mean when I refer to the "domain" vs.
>> presentation responsibilities of Text.
>>
>> Maybe bloating up Text with such a huge API (domain + presentation) MIGHT
>> be the best way out.  I don't know for sure, and I trust this brilliant
>> community will come down on the right choice.  I'm only saying that some of
>> the usual OO design quality metrics (e.g., number of methods per class,
>> among others) will get blown out of the water by this, and that this is a
>> sign that it's really worth being cautious.  It also looks to be a one-way
>> ticket -- eliminating this delineation of responsibility and piling on
>> hundreds of domain accessing / mutating methods onto Text's API will be a
>> lot easier than going the other way.  Once we have 5 years of accumulated
>> dependency on its domain-accessing responsibilities, it would be a lot
>> harder to untangle that in 2027 (in case it became unmanageable) than it
>> was to mash them together in 2022.  I'm not necessarily against mashing
>> them, I just think we should give it some heavy scrutiny first.
>>
>> > > #format: was introduced to Text in 2019.
>> >
>> > And long overdue since at least 2015. ;-P Thanks again,
>> > Christoph (ct) for adding it! It made GUI programming much
>> > easier. I had that one in mind for many years now.
>> >...
>> > > I don't think updates to Text will or should occur except
>> > > when driven by specific need.
>>
>> Christoph chose to add that one, #format:, and not hundreds of others
>> that day.  His decision was based on _something_ which could be considered
>> akin to a "need".  Maybe the lack of need until that "overdue" time
>> suggests that Text, at the least, *didn't* in fact need that
>> responsibility, if not that it "doesn't" going forward.
>>
>> In summary, IMO, if there's any way to keep Text's responsibilities
>> separate, it's at least worth considering.
>>
>>  - Chris
>>
>>
>>
>