On 30 Oct 2011, at 20:42, Nicolas Cellier wrote:
2011/10/30 Sven Van Caekenberghe sven@beta9.be:
Nicolas,
On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Will that not break the idiom (lazy implementation)
readFromString: string ^ self readFrom: string readStream
?
That's it, I would change for something like
readFromString: aString | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [FormatError raise]. ^newInstance
Yeah, that is clean and clear as well.
BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
There is ReadStream class>>on:from:to: but I recommend testing and testing again, because Stream implementation is ... not crystal clear.
Thanks for the pointer, Nicolas, apparently I forget what I did myself, earlier this year ;-)
parseTimeStamp2: aString "self parseTimeStamp2: '19670807T110343'." "self parseTimeStamp2: '19670807'. " | year month day hour minute second | year := Integer readFrom: (ReadStream on: aString from: 1 to: 4). month := Integer readFrom: (ReadStream on: aString from: 5 to: 6). day := Integer readFrom: (ReadStream on: aString from: 7 to: 8). aString size > 8 ifTrue: [ hour := Integer readFrom: (ReadStream on: aString from: 10 to: 11). minute := Integer readFrom: (ReadStream on: aString from: 12 to: 13). second := Integer readFrom: (ReadStream on: aString from: 14 to: 15) ] ifFalse: [ hour := minute := second := 0 ]. ^ TimeStamp year: year month: month day: day hour: hour minute: minute second: second offset: Duration zero
Still, this is creating ReadStreams and SqNumberParsers like hell, this seems faster and more efficient:
parseTimeStamp: aString "self parseTimeStamp: '19670807T110343'." "self parseTimeStamp: '19670807'. " | year month day hour minute second stream parser parseInteger | stream := ReadStream on: aString. parser := SqNumberParser on: stream. parseInteger := [ :from :to | stream setFrom: from to: to. parser nextUnsignedIntegerBase: 10 ]. year := parseInteger value: 1 value: 4. month := parseInteger value: 5 value: 6. day := parseInteger value: 7 value: 8. aString size > 8 ifTrue: [ hour := parseInteger value: 10 value: 11. minute := parseInteger value: 12 value: 13. second := parseInteger value: 14 value: 15 ] ifFalse: [ hour := minute := second := 0 ]. ^ TimeStamp year: year month: month day: day hour: hour minute: minute second: second offset: Duration zero
[ Dummy parseTimeStamp: '19670807T110343' ] bench '496,000 per second.' [ Dummy parseTimeStamp2: '19670807T110343' ] bench '298,000 per second.'
Obviously, a little helper class would be prettier.
In certain cases, the readstream+parser could be shared over all conversion calls, but then again, the source of characters would probably be a stream itself, and the the intermediate string should be avoided.
Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
Don't know... Some pattern matching, maybe with a simple regexp. PEG might be simple too. Qualifying as "efficient" however raise the bar a bit high ;)
I meant converting a known, fixed format as quickly as possible, this is not really a parsing problem, it is just a conversion.
Sven