[Newbies] Decomposing Binary Data by CR/LF

Chris Cunningham cunningham.cb at gmail.com
Mon Jul 24 20:55:44 UTC 2017


Hi JRM,

I think MultiByteFileStream is where you want to work on this.  Since you
said it is specifically a file with Cr/Lf line endings, this is the place.

There are tricks to making it work, which aren't clearly documented
(unfortunately).

This looks like how the MultiByteFileStream is supposed to work:

1. Open the file.
2. Send #wantsLineEndConversion: true to the file.
3. Send #ascii to the file (to tell it that it is a text file, and to
   determine whether the line endings are Cr/Lf, Cr, or Lf).
4. Read data from the file. It should convert Cr/Lf to just Cr, and all
   things are happy.
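
Roughly, the steps above might look like this in a workspace. This is an
untested sketch: 'sample.pdf' is a placeholder file name, and it assumes
that FileStream hands back a MultiByteFileStream (which I believe is the
default in Squeak 5.1).

```smalltalk
"Sketch only: 'sample.pdf' is a placeholder, not a real file."
| stream chunks |
chunks := OrderedCollection new.
stream := FileStream readOnlyFileNamed: 'sample.pdf'.        "1. open the file"
[stream wantsLineEndConversion: true.                        "2. ask for line-end conversion"
 stream ascii.                                               "3. text mode, detect line-end convention"
 [stream atEnd] whileFalse: [
	"4. with conversion on, each Cr/Lf should arrive as a single Cr"
	chunks add: (stream upTo: Character cr)]]
	ensure: [stream close].
chunks
```

If the conversion works as described, chunks should end up holding one
string per Cr/Lf-terminated line, without the terminators.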

One exception: if you send something like #next: 20 and the last character
read isn't a Cr, it looks like it would be buggy.
But, please try this and see if it works.  If so, please let me know.

An alternative seems to be that you could just open it without any of those
changes, and go through the file line by line (sending #nextLine to the
file), and the implementation of #nextLine in PositionableStream should
also take care of the Cr/Lf issues.
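
A minimal sketch of that line-by-line route, again untested and again
using 'sample.pdf' as a placeholder file name:

```smalltalk
"Sketch only: collect every line of the file using #nextLine."
| stream lines |
lines := OrderedCollection new.
stream := FileStream readOnlyFileNamed: 'sample.pdf'.
[[stream atEnd] whileFalse: [
	"#nextLine is expected to strip the Cr, Lf, or Cr/Lf terminator"
	lines add: stream nextLine]]
	ensure: [stream close].
lines
```

The #ensure: block is only there so the file gets closed even if reading
fails partway through.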

If you try this route, please let me know how it goes as well.

Thanks,
cbc


On Mon, Jul 24, 2017 at 11:00 AM, John-Reed Maffeo <jrmaffeo at gmail.com>
wrote:

> Is there an existing method that will tokenize/chunk(?) data from a file
> using  CR/LF? The use case is to decompose a file into PDF objects defined
> as strings terminated by CR/LF. (If there is an existing
> framework/project available, I have not found it, just dead ends :-( )
>
> I have been exploring in #String and #ByteString and this is all I have
> found that is close to what I need.
>
> "Finds first occurrence of the string"
> self findString: ( Character cr  asString,  Character lf asString).
> "Breaks at either token value"
> self findTokens: ( Character cr  asString,  Character lf asString)
>
> I have tried poking around in #MultiByteFileStream, but  keep running into
> errors.
>
> If there is no existing method, any suggestions how to write a new one? My
> naive approach is to scan for CR and then peek for LF keeping track of my
> pointers and using them to identify the CR/LF delimited substrings; or
> iterate through contents using #findString:
>
> TIA, jrm
>
> -----
> Image
> -----
> C:\Smalltalk\Squeak5.1-16549-64bit-201608180858-Windows\
> Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit.1.image
> Squeak5.1
> latest update: #16549
> Current Change Set: PDFPlayground
> Image format 68021 (64 bit)
>
> Operating System Details
> ------------------------
> Operating System: Windows 7 Professional (Build 7601 Service Pack 1)
> Registered Owner: T530
> Registered Company:
> SP major version: 1
> SP minor version: 0
> Suite mask: 100
> Product type: 1