[squeak-dev] readFrom: vs readFromString:

Sven Van Caekenberghe sven at beta9.be
Mon Oct 31 14:33:35 UTC 2011


On 30 Oct 2011, at 20:42, Nicolas Cellier wrote:

> 2011/10/30 Sven Van Caekenberghe <sven at beta9.be>:
>> Nicolas,
>> 
>> On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
>> 
>>> There is a huge mess in #readFrom: implementation.
>>> Some classes will signal trailing characters as a bug, some others
>>> won't and will simply leave the stream positioned after the valid
>>> part.
>>> I propose to change this behaviour uniformly:
>>> - readFrom: aStream will never fail on trailing chars (hey, it's a
>>> stream, it's up to sender to interpret the tail)
>>> - readFromString: aString will always forbid trailing char (it's not a
>>> stream, so this garbage is most probably an error and cannot be
>>> ignored silently)
>>> 
>>> What do you think ?
>> 
>> Will that not break the idiom (lazy implementation)
>> 
>> readFromString: string
>>        ^ self readFrom: string readStream
>> 
>> ?
> 
> That's it, I would change for something like
> 
> readFromString: aString
>    "Answer an instance read from aString; any trailing characters are an error."
>    | aStream newInstance |
>    aStream := aString readStream.
>    newInstance := self readFrom: aStream.
>    aStream atEnd ifFalse: [FormatError raise].
>    ^newInstance

Yeah, that is clean and clear as well.
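
To spell out the contract as I understand it, here is a small workspace sketch. The comments describe what the proposal intends, not necessarily what every class in a current image does:

	| stream |
	stream := '42abc' readStream.
	Integer readFrom: stream.	"intended: answers 42 and stops before the tail"
	stream upToEnd.	"intended: answers 'abc', left for the sender to interpret"
	Integer readFromString: '42abc'.	"intended: signals an error because of the trailing 'abc'"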

>> 
>> BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
>> 
> 
> There is ReadStream class>>on:from:to: but I recommend testing and
> testing again, because Stream implementation is ... not crystal clear.

Thanks for the pointer, Nicolas, apparently I forgot what I did myself, earlier this year ;-)

parseTimeStamp2: aString
	"self parseTimeStamp2: '19670807T110343'."
	"self parseTimeStamp2: '19670807'. "
	
	| year month day hour minute second |
	year := Integer readFrom: (ReadStream on: aString from: 1 to: 4).
	month := Integer readFrom: (ReadStream on: aString from: 5 to: 6).
	day := Integer readFrom: (ReadStream on: aString from: 7 to: 8).
	aString size > 8
		ifTrue: [  
			hour := Integer readFrom: (ReadStream on: aString from: 10 to: 11).
			minute := Integer readFrom: (ReadStream on: aString from: 12 to: 13).
			second := Integer readFrom: (ReadStream on: aString from: 14 to: 15) ]
		ifFalse: [
			hour := minute := second := 0 ].
	^ TimeStamp 
		year: year 
		month: month 
		day: day 
		hour: hour 
		minute: minute 
		second: second 
		offset: Duration zero

Still, this creates ReadStreams and SqNumberParsers like hell; the following version reuses a single stream and parser, and seems faster and more efficient:

parseTimeStamp: aString
	"self parseTimeStamp: '19670807T110343'."
	"self parseTimeStamp: '19670807'. "
	
	| year month day hour minute second stream parser parseInteger |
	stream := ReadStream on: aString.
	parser := SqNumberParser on: stream.
	parseInteger := [ :from :to | 
		stream setFrom: from to: to. 
		parser nextUnsignedIntegerBase: 10 ].
	year := parseInteger value: 1 value: 4.
	month := parseInteger value: 5 value: 6.
	day := parseInteger value: 7 value: 8.
	aString size > 8
		ifTrue: [  
			hour := parseInteger value: 10 value: 11.
			minute := parseInteger value: 12 value: 13.
			second := parseInteger value: 14 value: 15 ]
		ifFalse: [
			hour := minute := second := 0 ].
	^ TimeStamp 
		year: year 
		month: month 
		day: day 
		hour: hour 
		minute: minute 
		second: second 
		offset: Duration zero

[ Dummy parseTimeStamp: '19670807T110343' ] bench '496,000 per second.'
[ Dummy parseTimeStamp2: '19670807T110343' ] bench '298,000 per second.'

Obviously, a little helper class would be prettier.

In certain cases, the readstream+parser could be shared over all conversion calls, 
but then again, the source of characters would probably be a stream itself,
and the intermediate string should be avoided.

>> Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
>> 
> 
> Don't know... Some pattern matching, maybe with a simple regexp.
> PEG might be simple too.
> Qualifying as "efficient" however raises the bar a bit high ;)

I meant converting a known, fixed format as quickly as possible. This is not really a parsing problem, it is just a conversion.
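
For the '2011-10-30T17:17:47+01:00' example, the same shared stream+parser trick could look roughly like this. It is an untested sketch: the selector name is made up, and the field positions assume exactly this 25 character layout with an explicit +hh:mm or -hh:mm offset.

parseISOTimeStamp: aString
	"self parseISOTimeStamp: '2011-10-30T17:17:47+01:00'."
	
	| year month day hour minute second offset stream parser parseInteger |
	stream := ReadStream on: aString.
	parser := SqNumberParser on: stream.
	"Restrict the shared stream to one fixed field, then read an unsigned integer from it"
	parseInteger := [ :from :to | 
		stream setFrom: from to: to. 
		parser nextUnsignedIntegerBase: 10 ].
	year := parseInteger value: 1 value: 4.
	month := parseInteger value: 6 value: 7.
	day := parseInteger value: 9 value: 10.
	hour := parseInteger value: 12 value: 13.
	minute := parseInteger value: 15 value: 16.
	second := parseInteger value: 18 value: 19.
	"Offset hours and minutes sit at 21-22 and 24-25, the sign at position 20"
	offset := Duration 
		days: 0 
		hours: (parseInteger value: 21 value: 22) 
		minutes: (parseInteger value: 24 value: 25) 
		seconds: 0.
	(aString at: 20) = $- ifTrue: [ offset := offset negated ].
	^ TimeStamp 
		year: year 
		month: month 
		day: day 
		hour: hour 
		minute: minute 
		second: second 
		offset: offset

The separators are never looked at, only skipped by position, which is what makes this a conversion rather than a parse. It also means malformed input is not detected.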

Sven



