[Newbies] Decomposing Binary Data by CR/LF

John-Reed Maffeo jrmaffeo at gmail.com
Tue Jul 25 17:31:07 UTC 2017


Chris, Lou,
Thanks. After more research on the web, I think I need to rethink my
approach to the problem: "PDF's are actually designed to be read
'backwards' starting at the end." My question is still valid, and I am
working on a solution. I will post something if it is useful.

-jrm

On Mon, Jul 24, 2017 at 5:25 PM, Chris Cunningham <cunningham.cb at gmail.com>
wrote:

> Hi JRM,
>
> I think MultiByteFileStream is where you want to work on this.  Since you
> said it is, specifically, a file that has Cr/Lf line endings, then this is
> the place.
>
> There are tricks to making it work, which aren't clearly documented
> (unfortunately).
>
> This looks like how the MultiByteFileStream is supposed to work:
>
> 1. Open the file.
> 2. Send
>           #wantsLineEndConversion: true
>     to the file.
> 3. Send #ascii to the file (to tell it that it is a text file, and to
> detect whether the line endings are Cr/Lf, Cr, or Lf).
> 4. Read data from the file. It should convert Cr/Lf to just Cr, and all
> things are happy.
>
> Except if you send something like #next: 20, and the last character isn't
> a #Cr, then it looks like it would be buggy.
> But, please try this and see if it works.  If so, please let me know.
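
For reference, a minimal sketch of steps 1 to 4 above, not a tested
recipe. It assumes that FileStream hands back a MultiByteFileStream (as
it appears to in Squeak 5.1), and the file name 'crlf-demo.txt' is made
up for illustration. The snippet writes its own small sample file first
so that it can be run in a workspace.

```smalltalk
| crlf fileName out in text |
crlf := String with: Character cr with: Character lf.
fileName := 'crlf-demo.txt'.	"hypothetical name, created in the default directory"

"Write a small CR/LF-terminated sample so the sketch is self-contained."
out := FileStream forceNewFileNamed: fileName.
[out nextPutAll: 'first', crlf, 'second', crlf]
	ensure: [out close].

"Step 1: open the file."
in := FileStream readOnlyFileNamed: fileName.
["Step 2: ask for line end conversion."
 in wantsLineEndConversion: true.
 "Step 3: treat the file as text, which also detects the convention."
 in ascii.
 "Step 4: read; CR/LF pairs should come back as a single CR."
 text := in upToEnd]
	ensure: [in close].
text
```

Reading everything with #upToEnd sidesteps the #next: concern raised
above, at the cost of holding the whole file in memory.
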
>
> An alternative seems to be that you could just open it without any of
> those changes, and go through the file line by line (sending #nextLine to
> the file), and the implementation of #nextLine in PositionableStream should
> also take care of the Cr/Lf issues.
>
> If you try this route, please let me know how it goes as well.
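
The line-by-line route might look like the sketch below. It uses an
in-memory ReadStream over made-up sample data so that it can be run
without a file; since #nextLine lives in PositionableStream, the same
loop should apply to a file stream.

```smalltalk
| crlf source in lines |
crlf := String with: Character cr with: Character lf.
source := 'first', crlf, 'second', crlf, 'third'.	"made-up sample data"
in := ReadStream on: source.
lines := OrderedCollection new.
"#nextLine answers each line without its delimiter and skips past it."
[in atEnd] whileFalse: [lines add: in nextLine].
lines
```

Note that #nextLine also breaks on a lone CR or a lone LF, which may
matter if the data between the CR/LF pairs is binary.
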
>
> Thanks,
> cbc
>
>
> On Mon, Jul 24, 2017 at 11:00 AM, John-Reed Maffeo <jrmaffeo at gmail.com>
> wrote:
>
>> Is there an existing method that will tokenize/chunk(?) data from a file
>> using CR/LF? The use case is to decompose a file into PDF objects, defined
>> as strings terminated by CR/LF. (If there is an existing
>> framework/project available, I have not found it, just dead ends :-( )
>>
>> I have been exploring in #String and #ByteString and this is all I have
>> found that is close to what I need.
>>
>> "Finds first occurrence of the string"
>> self findString: ( Character cr  asString,  Character lf asString).
>> "Breaks at either token value"
>> self findTokens: ( Character cr  asString,  Character lf asString)
>>
>> I have tried poking around in #MultiByteFileStream, but I keep running
>> into errors.
>>
>> If there is no existing method, any suggestions how to write a new one?
>> My naive approach is to scan for CR and then peek for LF keeping track of
>> my pointers and using them to identify the CR/LF delimited substrings; or
>> iterate through contents using #findString:
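
One way to sketch the "split only on the CR/LF pair" idea without
tracking pointers by hand is #upToAll: on a ReadStream. This is a sketch
over made-up sample data, not an existing PDF framework; a lone CR inside
a chunk is deliberately left alone.

```smalltalk
| crlf source in chunks |
crlf := String with: Character cr with: Character lf.
"Made-up sample: the second chunk contains a lone CR that must not split it."
source := 'one', crlf,
	'two', (String with: Character cr), 'still two', crlf,
	'three'.
in := ReadStream on: source.
chunks := OrderedCollection new.
"#upToAll: answers everything before the next CR/LF pair and skips the pair."
[in atEnd] whileFalse: [chunks add: (in upToAll: crlf)].
chunks
```

Unlike #findTokens:, which breaks at either character, this only breaks
where CR is immediately followed by LF.
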
>>
>> TIA, jrm
>>
>> -----
>> Image
>> -----
>> C:\Smalltalk\Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit.1.image
>> Squeak5.1
>> latest update: #16549
>> Current Change Set: PDFPlayground
>> Image format 68021 (64 bit)
>>
>> Operating System Details
>> ------------------------
>> Operating System: Windows 7 Professional (Build 7601 Service Pack 1)
>> Registered Owner: T530
>> Registered Company:
>> SP major version: 1
>> SP minor version: 0
>> Suite mask: 100
>> Product type: 1
>>
>>
>> _______________________________________________
>> Beginners mailing list
>> Beginners at lists.squeakfoundation.org
>> http://lists.squeakfoundation.org/mailman/listinfo/beginners
>>
>>
>