[Newbies] Decomposing Binary Data by CR/LF - Solved (Sort of)

John-Reed Maffeo jrmaffeo at gmail.com
Mon Sep 4 18:01:04 UTC 2017


It turns out that there is no easy answer to my question. A binary file
cannot be reliably chunked on line-ending characters, because arbitrary
binary data can contain the CR and LF byte values (16r0D and 16r0A) as
part of the stream itself. PDF files do contain an xref table that
locates all of the objects in the file, and I have managed to create
classes in my framework which read that table into a usable object that
I can use to extract the data. Using "self findTokens: (Character cr
asString, Character lf asString)" is still useful in the areas of a PDF
file which do not contain binary data, and it is necessary there because
the line-end values used in a PDF depend on the defaults of the
operating system the file was created on.
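
A minimal workspace sketch of both points, assuming a Squeak image. The
sample strings and the variable names (eol, text, payload) are made up
for illustration and are not taken from my framework or from a real PDF.

```smalltalk
| eol text payload |
"Delimiter set: CR and LF. findTokens: breaks at either character."
eol := String with: Character cr with: Character lf.

"Text-like region mixing CR/LF, LF and CR line endings (made-up sample)."
text := 'xref', eol, '0 6', (String with: Character lf),
	'trailer', (String with: Character cr), 'startxref'.
text findTokens: eol.
"answers four tokens: 'xref' '0 6' 'trailer' 'startxref'"

"Binary-like data: three bytes that happen to include 16r0D (CR)."
payload := String
	with: (Character value: 120)
	with: Character cr
	with: (Character value: 156).
(payload findTokens: eol) size.
"answers 2: the single three-byte unit was wrongly split at the CR byte"
```

Note that findTokens: treats a run of delimiters as one break, so empty
lines are dropped. That is fine for the text regions, but it is another
reason the result cannot be used to reconstruct byte offsets.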

Thanks again for your interest in my question.

Jrm

On Tue, Jul 25, 2017 at 3:00 AM, John-Reed Maffeo <jrmaffeo at gmail.com>
wrote:

> Is there an existing method that will tokenize/chunk(?) data from a file
> using CR/LF? The use case is to decompose a file into PDF objects, defined
> as strings terminated by CR/LF. (If there is an existing
> framework/project available, I have not found it, just dead ends :-( )
>
> I have been exploring in #String and #ByteString and this is all I have
> found that is close to what I need.
>
> "Finds first occurrence of the string"
> self findString: ( Character cr  asString,  Character lf asString).
> "Breaks at either token value"
> self findTokens: ( Character cr  asString,  Character lf asString)
>
> I have tried poking around in #MultiByteFileStream, but I keep running
> into errors.
>
> If there is no existing method, any suggestions how to write a new one? My
> naive approach is to scan for CR and then peek for LF keeping track of my
> pointers and using them to identify the CR/LF delimited substrings; or
> iterate through contents using #findString:
>
> TIA, jrm
>
> -----
> Image
> -----
> C:\Smalltalk\Squeak5.1-16549-64bit-201608180858-Windows\
> Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit.1.image
> Squeak5.1
> latest update: #16549
> Current Change Set: PDFPlayground
> Image format 68021 (64 bit)
>
> Operating System Details
> ------------------------
> Operating System: Windows 7 Professional (Build 7601 Service Pack 1)
> Registered Owner: T530
> Registered Company:
> SP major version: 1
> SP minor version: 0
> Suite mask: 100
> Product type: 1
>
>