It turns out that there is no easy answer to my question. A binary file cannot be chunked reliably on line-ending characters, because binary data can contain the CR and LF byte values (hex 0D and 0A) anywhere in its otherwise arbitrary stream. PDF files contain an xref table that points to all of the objects in the file. I have managed to create classes in my framework which read that table into a usable object, which I can then use to extract the objects. Using "self findTokens: (Character cr asString, Character lf asString)" is still useful in the areas of a PDF file which do not contain binary data, and it is necessary because the line end values used in a PDF depend on the default values of the operating system the file was created on.
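For anyone following along, here is a minimal workspace sketch of how the xref table can be located. It is not the framework code mentioned above. It assumes the usual PDF layout, in which the file ends with the keyword startxref, then the byte offset of the xref table, then %%EOF. The sample trailer and the offset in it are made up for illustration.

```smalltalk
| lf tail keyword index rest offset |
lf := String with: Character lf.

"Made-up tail of a PDF file; a real file would be read in binary mode."
tail := 'trailer', lf, '<< /Size 22 /Root 1 0 R >>', lf,
	'startxref', lf, '18799', lf, '%%EOF'.

keyword := 'startxref'.
"findString: answers the first match, which is enough for this sample.
A file with incremental updates has several, and the last one counts."
index := tail findString: keyword.

"Everything after the keyword, split at CR or LF."
rest := tail copyFrom: index + keyword size to: tail size.
offset := (rest findTokens: (String with: Character cr with: Character lf)) first asInteger.
offset
```

For this sample, offset is 18799. The point is that objects are then reached by byte offset rather than by splitting the whole file at line ends, so CR or LF bytes inside binary streams do no harm.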
Thanks again for your interest in my question.
Jrm
On Tue, Jul 25, 2017 at 3:00 AM, John-Reed Maffeo jrmaffeo@gmail.com wrote:
Is there an existing method that will tokenize/chunk(?) data from a file using CR/LF? The use case is to decompose a file into PDF objects, defined as strings terminated by CR/LF. (If there is an existing framework/project available, I have not found it, just dead ends. :-( )
I have been exploring in #String and #ByteString and this is all I have found that is close to what I need:
"Finds first occurrence of #String"
self findString: (Character cr asString, Character lf asString).
"Breaks at either token value"
self findTokens: (Character cr asString, Character lf asString)
I have tried poking around in #MultiByteFileStream, but keep running into errors.
If there is no existing method, any suggestions on how to write a new one? My naive approach is to scan for CR and then peek for LF, keeping track of my pointers and using them to identify the CR/LF-delimited substrings; or to iterate through the contents using #findString:
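The second idea (iterating with #findString:) could be sketched roughly as follows in a workspace. This is only an illustration on made-up sample data, using #findString:startingAt: to treat the CR/LF pair as a single delimiter; it has not been tried against a real PDF.

```smalltalk
| crlf data pieces start index |
"The two-character CR/LF delimiter, treated as a single unit."
crlf := String with: Character cr with: Character lf.

"Made-up sample data; note the lone CR inside the second piece."
data := 'first', crlf, 'sec', (String with: Character cr), 'ond', crlf, 'third'.

pieces := OrderedCollection new.
start := 1.
"findString:startingAt: answers 0 when the delimiter is not found."
[(index := data findString: crlf startingAt: start) > 0] whileTrue: [
	pieces add: (data copyFrom: start to: index - 1).
	start := index + crlf size].
"Whatever follows the last delimiter is the final piece."
pieces add: (data copyFrom: start to: data size).
pieces
```

Here pieces ends up with three strings, and the lone CR stays inside the second one, whereas #findTokens: with the same two characters would break there as well. If the data ends with CR/LF, the final piece is an empty string.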
TIA, jrm
Image
C:\Smalltalk\Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit-201608180858-Windows\Squeak5.1-16549-64bit.1.image
Squeak5.1
latest update: #16549
Current Change Set: PDFPlayground
Image format 68021 (64 bit)
Operating System Details
Operating System: Windows 7 Professional (Build 7601 Service Pack 1)
Registered Owner: T530
Registered Company:
SP major version: 1
SP minor version: 0
Suite mask: 100
Product type: 1
beginners@lists.squeakfoundation.org