[squeak-dev] String & Text

Mon Jul 18 07:10:55 UTC 2022

Hi all --

The existence of Text shows a need for a composite of string+attributes. The current use of Text may look specific to graphics and output, yes. But the underlying benefits of having a generic OO representation for string+attributes are not specific to graphics and output. Also, just because some text attributes hold on to arbitrary objects instead of primitive values does not imply that all text attributes have to.

A CharacterCollection could help extract the common things from String and Text. Yet, I am not so sure that we can afford an extra "self string" in all string operations just because ByteString/WideString happen to be their own #string. Hmm.... Maybe it is not so easy after all. Or maybe it is just a single operation that needs to be refined in String to bypass that #string lookup.
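To make that trade-off concrete, here is a minimal, language-neutral sketch in Python rather than Smalltalk. Everything in it is an assumption for illustration: CharacterCollection does not exist in Squeak today, and the names PlainString, RichText, includes_substring and line_count are hypothetical stand-ins. Shared behavior is written once against string(), and the plain-string class refines one operation to skip that indirection.

```python
from abc import ABC, abstractmethod


class CharacterCollection(ABC):
    """Hypothetical common superclass; not an existing Squeak class."""

    @abstractmethod
    def string(self):
        """Answer the plain characters, in the spirit of #string."""

    def includes_substring(self, part):
        # Shared behavior, written once, at the cost of one string() call.
        return part in self.string()

    def line_count(self):
        # Another shared operation that only needs the characters.
        return len(self.string().splitlines())


class PlainString(CharacterCollection):
    """Stands in for ByteString/WideString: it is its own string."""

    def __init__(self, chars):
        self.chars = chars

    def string(self):
        return self.chars

    def includes_substring(self, part):
        # Refined here to bypass the string() indirection.
        return part in self.chars


class RichText(CharacterCollection):
    """Stands in for Text: characters plus attribute runs."""

    def __init__(self, chars, runs):
        self.chars = chars
        self.runs = runs  # list of (start, stop, attribute name)

    def string(self):
        return self.chars


plain = PlainString("one\ntwo")
rich = RichText("one\ntwo", [(0, 3, "bold")])
```

Both objects answer the shared operations; only PlainString avoids the extra call, and only where it was refined by hand.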

Here is an excerpt from VisualWorks:

Best,
Marcel
On 18.07.2022 at 03:39:27, Chris Muller <asqueaker at gmail.com> wrote:
Text is unique in the image in that it is the only class that is simultaneously a byte and pointer object.  Thanks to the magic of TextAnchors, any character of a Text is allowed to be a pointer object.  It's implemented by (Character value: 1) being yet another special substitution performed dynamically during layout.  Since String can't know that (Character value: 1) is special, it's hard to predict whether all those categories of behaviors Jakob identified on String would behave properly with anchor characters, or what the proper behavior should even be, in some cases.
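One such case can be sketched in Python under loud assumptions: AnchoredText, its index-to-object dictionary, and is_consistent are hypothetical and are not how Squeak's Text or TextAnchor are implemented. The sketch only shows that an operation working on the characters alone can leave the placeholder and its anchored object pointing at different positions.

```python
ANCHOR = "\x01"  # placeholder character, as described above


class AnchoredText:
    """Hypothetical model: characters plus objects anchored at indices."""

    def __init__(self, chars, anchors):
        self.chars = chars
        self.anchors = anchors  # dict: character index -> anchored object

    def is_consistent(self):
        # Every placeholder needs an anchor at its index, and vice versa.
        placeholders = {i for i, c in enumerate(self.chars) if c == ANCHOR}
        return placeholders == set(self.anchors)


text = AnchoredText("ab" + ANCHOR, {2: "<some morph>"})

# A string-level operation knows nothing about the anchor bookkeeping.
reversed_chars = text.chars[::-1]
naive = AnchoredText(reversed_chars, text.anchors)
```

Here text.is_consistent() holds, while naive.is_consistent() does not: the placeholder moved to index 0 but the anchor still claims index 2.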

It seems like Text is meant to be considered as an output, and not an input.  Just as we once extruded the domain part of "Smalltalk" from the Image responsibilities (SmalltalkImage) and now encourage client code to access that domain via its #environment, IMO, it's worth client code employing Text similarly.  As much as possible, simply access the Text's domain through the core API like #string, to minimize the API added to Text.  This may already have been the intention, as String already understands #string...

 - Chris

On Sat, Jul 16, 2022 at 5:13 AM Jakob Reschke <jakres+squeak at gmail.com> wrote:

I scrolled through the protocols of String and Text two days ago. The main categories of methods (which don't overlap well with the classification in the image) on String that I remember are:
- general string parts enumeration (finding, lines)
- general string manipulation (trimming, splitting, slicing, character substitution, substring substitution, case changes, padding, ...)
- tokenization and parsing (e. g. csv/tsv), some of it Smalltalk language-specific (getters, setters, arguments/keywords count)
   - natural language conversions (e. g. camelCase <-> words)
   - MIME
- formatting (interpolation, line wrapping or joining, indentation)
- collation and (equality) comparison
- pattern matching
- similarity computations
- character classification (digits, letters, whitespace, ...)
- graphics display

- arithmetic with numbers
- multilingual stuff (among others: dealing with leadingChar)
- character set conversions

- VM paths <-> Squeak paths conversion
- HTML and HTTP conversions (e. g. URL %-decoding)

- crc16
- extension methods from various packages like Etoys, JSON, Monticello, Morphic, Regex, Network, ...
   - chronology conversions

Next to "format:" there seems to be yet another interpolation mechanism in "expandMacrosWithArguments:".

Overall, some methods treat strings as text, others treat them as technical data (markup, file paths, URLs, encoding---ByteArrays with characters). The leadingChar stuff is strangely in the middle of the two. Text functionality like interpolation can also be used to produce technical data, of course.

Which selectors Text also understands is at times chaotic. Of some groups of related selectors, Text may just understand one of them. When it comes to Text's "core domain" of displaying, it does not understand any of the displayAt:/displayOn:* selectors...

On Sat, Jul 16, 2022 at 03:28, Chris Muller <ma.chris.m at gmail.com> wrote:

Hi Marcel,

> > But until we do that, and whole hog like Eliot suggested,
> > what we will have are *some* domain things that String
> > can do that Text can't -- a partial overlap.
>
> It's rather easy. Once we have a CharacterCollection, we can
> finally see the special cases on String. The common stuff can be
> moved up to then benefit both String and Text.

I understand.  CharacterCollection would make it so we "could" do it, but it's still worth scrutinizing heavily first whether we should.

> > In other words, an incomplete mess for an indefinite period of time.
>
> Disagree. This kind of refactoring does not look too difficult.

I was speaking about the state of the system until such time as that refactoring was completed, which is indefinite.  Until then, the API would be, by definition, incomplete.  We can disagree about it being a "mess", though.   :)

> > By removing all the domain stuff [...]
>
> I think that we have a different understanding of the term "domain"
> here. Maybe you are worried about Magma. If so, please
> elaborate your concerns from that perspective.

No, I'm speaking strictly in terms of good OO design, where too many responsibilities for a class is considered not good design.

In the other thread, Tim just wrote this about formatting comments.  It stuck out to me as an example relevant to the question of this thread.

(Tim wrote:)
> Oh, I'm not wanting to have any tabs or spaces inserted - I want the formatting to be live and use the left indent. 
> Shout does all that work to colour (etc) the text so why not use the fact that it detects comments.

I agree with him 100%, and this continues to remind us of the numerous responsibilities that can be considered "presentation" only.  Before I had only mentioned fonts, colors, and attributes (bold, italic, center, right justify, left justify, etc.), but Tim reminds us that indentation and layout are in there, too!

This is already a nice collection of behaviors that are completely distinct from the ones concerned with the _contents_ of the Text (e.g., its 'string'), which is what I mean when I refer to the "domain" vs. presentation responsibilities of Text.

Maybe bloating up Text with such a huge API (domain + presentation) MIGHT be the best way out.  I don't know for sure, and I trust this brilliant community will come down on the right choice.  I'm only saying that some of the usual OO design quality metrics (e.g., number of methods per class, among others) will get blown out of the water by this, and that this is a sign that it's really worth being cautious.  It also looks to be a one-way ticket: eliminating this delineation of responsibility and piling hundreds of domain accessing / mutating methods onto Text's API will be a lot easier than going the other way.  Once we have 5 years of accumulated dependency on its domain-accessing responsibilities, it would be a lot harder to untangle that in 2027 (in case it became unmanageable) than it was to mash them together in 2022.  I'm not necessarily against mashing them, I just think we should give it some heavy scrutiny first.

> > #format: was introduced to Text in 2019.
>
> And long overdue since at least 2015. ;-P Thanks again,
> Christoph (ct) for adding it! It made GUI programming much
> easier. I had that one in mind for many years now.
>...
> > I don't think updates to Text will or should occur except
> > when driven by specific need.

Christoph chose to add that one, #format:, and not hundreds of others that day.  His decision was based on _something_ which could be considered akin to a "need".  Maybe the lack of need until that "overdue" time suggests that Text did not, in fact, need that responsibility before then, and perhaps does not need it going forward, either.

In summary, IMO, if there's any way to keep Text's responsibilities separate, it's at least worth considering.

 - Chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 13385 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220718/e55fff22/attachment.png>