There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly: - readFrom: aStream will never fail on trailing chars (hey, it's a stream, it's up to sender to interpret the tail) - readFromString: aString will always forbid trailing char (it's not a stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Nicolas
Nicolas,
On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Will that not break the idiom (lazy implementation)
readFromString: string ^ self readFrom: string readStream
?
BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
Sven
2011/10/30 Sven Van Caekenberghe sven@beta9.be:
Nicolas,
On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Will that not break the idiom (lazy implementation)
readFromString: string ^ self readFrom: string readStream
?
That's it, I would change for something like
readFromString: aString | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [FormatError raise]. ^newInstance
BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
There is ReadStream class>>on:from:to: but I recommend testing and testing again, because Stream implementation is ... not crystal clear.
Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
Don't know... Some pattern matching, maybe with a simple regexp. PEG might be simple too. Qualifying as "efficient" however raise the bar a bit high ;)
Nicolas
Sven
On 30 Oct 2011, at 20:42, Nicolas Cellier wrote:
2011/10/30 Sven Van Caekenberghe sven@beta9.be:
Nicolas,
On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Will that not break the idiom (lazy implementation)
readFromString: string ^ self readFrom: string readStream
?
That's it, I would change for something like
readFromString: aString | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [FormatError raise]. ^newInstance
Yeah, that is clean and clear as well.
BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
There is ReadStream class>>on:from:to: but I recommend testing and testing again, because Stream implementation is ... not crystal clear.
Thanks for the pointer, Nicolas, apparently I forget what I did myself, earlier this year ;-)
parseTimeStamp2: aString "self parseTimeStamp2: '19670807T110343'." "self parseTimeStamp2: '19670807'. " | year month day hour minute second | year := Integer readFrom: (ReadStream on: aString from: 1 to: 4). month := Integer readFrom: (ReadStream on: aString from: 5 to: 6). day := Integer readFrom: (ReadStream on: aString from: 7 to: 8). aString size > 8 ifTrue: [ hour := Integer readFrom: (ReadStream on: aString from: 10 to: 11). minute := Integer readFrom: (ReadStream on: aString from: 12 to: 13). second := Integer readFrom: (ReadStream on: aString from: 14 to: 15) ] ifFalse: [ hour := minute := second := 0 ]. ^ TimeStamp year: year month: month day: day hour: hour minute: minute second: second offset: Duration zero
Still, this is creating ReadStreams and SqNumberParsers like hell, this seems faster and more efficient:
parseTimeStamp: aString "self parseTimeStamp: '19670807T110343'." "self parseTimeStamp: '19670807'. " | year month day hour minute second stream parser parseInteger | stream := ReadStream on: aString. parser := SqNumberParser on: stream. parseInteger := [ :from :to | stream setFrom: from to: to. parser nextUnsignedIntegerBase: 10 ]. year := parseInteger value: 1 value: 4. month := parseInteger value: 5 value: 6. day := parseInteger value: 7 value: 8. aString size > 8 ifTrue: [ hour := parseInteger value: 10 value: 11. minute := parseInteger value: 12 value: 13. second := parseInteger value: 14 value: 15 ] ifFalse: [ hour := minute := second := 0 ]. ^ TimeStamp year: year month: month day: day hour: hour minute: minute second: second offset: Duration zero
[ Dummy parseTimeStamp: '19670807T110343' ] bench '496,000 per second.' [ Dummy parseTimeStamp2: '19670807T110343' ] bench '298,000 per second.'
Obviously, a little helper class would be prettier.
In certain cases, the readstream+parser could be shared over all conversion calls, but then again, the source of characters would probably be a stream itself, and the the intermediate string should be avoided.
Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
Don't know... Some pattern matching, maybe with a simple regexp. PEG might be simple too. Qualifying as "efficient" however raise the bar a bit high ;)
I meant converting a known, fixed format as quickly as possible, this is not really a parsing problem, it is just a conversion.
Sven
Nicolas
We are interested in the rationalization of the code you propose. Send it and we will integrate it. May be this is because of Bach music but I like regularly in code :)
Stef
On Oct 30, 2011, at 8:42 PM, Nicolas Cellier wrote:
2011/10/30 Sven Van Caekenberghe sven@beta9.be:
Nicolas,
On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
Will that not break the idiom (lazy implementation)
readFromString: string ^ self readFrom: string readStream
?
That's it, I would change for something like
readFromString: aString | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [FormatError raise]. ^newInstance
BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
There is ReadStream class>>on:from:to: but I recommend testing and testing again, because Stream implementation is ... not crystal clear.
Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
Don't know... Some pattern matching, maybe with a simple regexp. PEG might be simple too. Qualifying as "efficient" however raise the bar a bit high ;)
Nicolas
Sven
That's it, I would change for something like
readFromString: aString | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [FormatError raise]. ^newInstance
I would propose to break this in two methods:
readFromString: aString onLeftOver: aBlock | aStream newInstance | aStream := aString readString. newInstance := self readFrom: aStream. aStream atEnd ifFalse: [aBlock value]. ^newInstance
and
readFromString: aString
^ readFromString: aString onLeftOver: [FormatError raise]
This way we can handle cases where we do not care about trailing stuff, without having to modify the string (which could be expensive)
Stef
On Sun, Oct 30, 2011 at 05:07:48PM +0100, Nicolas Cellier wrote:
There is a huge mess in #readFrom: implementation. Some classes will signal trailing characters as a bug, some other won't and will simply leave the stream positioned after the valid part. I propose to change this behaviour uniformly:
- readFrom: aStream will never fail on trailing chars (hey, it's a
stream, it's up to sender to interpret the tail)
- readFromString: aString will always forbid trailing char (it's not a
stream, so this garbage is most probably an error and cannot be ignored silently)
What do you think ?
This sounds right to me. I suspect that it might be difficult to do this without breaking some external packages, but it does seem like the right thing to do.
Dave
squeak-dev@lists.squeakfoundation.org