FileStream and TextConverters etc (reposting from another thread)

Yoshiki Ohshima yoshiki at squeakland.org
Tue Apr 11 08:31:39 UTC 2006


  Göran,

> Btw, yesterday I was staring at the MultiByteFileStream stuff and...
> well, IMHO it would have been better *for me* (other users may have
> other stories to tell) if the default was binary and not ascii. The
> principle of least surprise. If I open a filestream and don't tell it
> *anything*, then I would expect it to just feed me the bits and bytes -
> as Strings or ByteArrays, but not doing any conversions or line end
> mumbo jumbo or any other non expected "nice things". An example of this
> is inspecting a file in the file list - I really appreciated the fact
> that filelist didn't do *any* conversion on the stuff it showed me - now
> it does. And I also wonder where the hex view went... anyway:

  Again, "Strings" now include WideStrings, so "no conversion" would not
work for the users of such strings.

> What I ended up doing was creating NullTextConverter (which does no
> conversion at all, trivial to write) and then it worked fine.

  Sorry about that, but we actually have it.... It is just called
Latin1TextConverter.  (There was some argument for intention-revealing
names and we were almost about to add an empty subclass of
Latin1TextConverter, but we didn't get around to it.)
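
  To illustrate why a Latin-1 converter can stand in for a "null"
converter, here is a small sketch in Python rather than Squeak.  It uses
Python's "latin-1" codec only as an analogy; that Latin1TextConverter
behaves the same way byte for byte is an assumption based on the
description above, not something checked against the Squeak source.

```python
# Latin-1 maps every byte value 0..255 to the code point with the same
# number, so decoding and re-encoding never changes the data.
data = bytes(range(256))

text = data.decode("latin-1")
code_points = [ord(ch) for ch in text]
round_trip = text.encode("latin-1")

assert code_points == list(data)  # each character equals its byte value
assert round_trip == data         # writing it back gives the same bytes
```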

> It seems to me that a cleaner approach here would be to:
> 
> 1. Do line end conversions or not regardless of the 2 choices below..
> 2. Binary or ascii - only decides if we use ByteArrays or Strings,
> doesn't concern conversions or line ends.
> 3. Selection of converter where we also have a NullConverter that does
> nothing.
> 
> IMHO (having not dissected this in total detail) the above three options
> should be combinable. So for example, in our case we have utf8 strings
> that we want to write out *as is* and use #cr to get platform specific
> line endings.

  Mostly I agree, as we already have almost independent choices for 1.
and 2., as well as a NullConverter under the name of Latin1TextConverter.
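
  As a rough sketch of how the line-end choice and the converter choice
can be combined independently, consider the following Python (not
Squeak) code.  The helper encode_for_file and its parameters are
hypothetical names made up for this illustration, and it assumes the
in-image line terminator is CR.

```python
def encode_for_file(text, encoding="latin-1", line_end=None):
    """Hypothetical helper: apply line-end conversion and character
    encoding as two separate, independently chosen steps."""
    if line_end is not None:
        # Assumption: strings in the image use CR as the line terminator.
        text = text.replace("\r", line_end)
    return text.encode(encoding)


# The case from the quoted message: a string that already holds UTF-8
# bytes (one byte per character), written out "as is" through the
# Latin-1 ("null") converter, with only the line end converted.
utf8_in_string = "é\r".encode("utf-8").decode("latin-1")
written = encode_for_file(utf8_in_string, encoding="latin-1", line_end="\n")

assert written == "é\n".encode("utf-8")
```

With line_end left as None the text passes through without any line-end
change, and swapping the encoding argument changes only the character
conversion.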

  But, isn't the combination of #binary and a line end conversion confusing?

> I also think that a default FileStream should not do any line end
> conversions or conversions at all by default (but still use Strings
> instead of ByteArrays). In other words - I would like the "least
> surprise" principle to hold. Am I alone in this idea? I love the work of
> Yoshiki and friends in this area - I just want to iron out the small
> "gotchas" with it.
>
> Now, Yoshiki and all the rest of you - feel free to correct me with the
> real facts. :)

  I wrote a reply to you in this regard last week.  For the least
surprise principle, I would say using UTF8 conversion for text would
make sense.
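
  A small Python (not Squeak) sketch of the reasoning: once strings can
hold characters above 255, as WideStrings do, a byte-for-byte Latin-1
pass-through cannot write them, while UTF-8 can.  The sample text is
just an illustration.

```python
wide = "日本語"  # every character here has a code point above 255

# A "no conversion" (Latin-1) write has no byte for these characters.
try:
    wide.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent them and round-trips without loss.
utf8_bytes = wide.encode("utf-8")

assert latin1_ok is False
assert utf8_bytes.decode("utf-8") == wide
```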

  And, as Andreas wrote, the best thing is to separate the concerns.
If somebody manages to separate the fileOut and fileIn aspect from
FileStream (there were discussions about moving to an XML-based external
format...), it would be a great advance on that front.

-- Yoshiki


