[squeak-dev] readFrom: vs readFromString:
Sven Van Caekenberghe
sven at beta9.be
Mon Oct 31 14:33:35 UTC 2011
On 30 Oct 2011, at 20:42, Nicolas Cellier wrote:
> 2011/10/30 Sven Van Caekenberghe <sven at beta9.be>:
>> Nicolas,
>>
>> On 30 Oct 2011, at 17:07, Nicolas Cellier wrote:
>>
>>> There is a huge mess in #readFrom: implementation.
>>> Some classes will signal trailing characters as a bug, some other
>>> won't and will simply leave the stream positioned after the valid
>>> part.
>>> I propose to change this behaviour uniformly:
>>> - readFrom: aStream will never fail on trailing chars (hey, it's a
>>> stream, it's up to sender to interpret the tail)
>>> - readFromString: aString will always forbid trailing char (it's not a
>>> stream, so this garbage is most probably an error and cannot be
>>> ignored silently)
>>>
>>> What do you think ?
>>
>> Will that not break the idiom (lazy implementation)
>>
>> readFromString: string
>> ^ self readFrom: string readStream
>>
>> ?
>
> That's it, I would change for something like
>
> readFromString: aString
> | aStream newInstance |
> aStream := aString readStream.
> newInstance := self readFrom: aStream.
> aStream atEnd ifFalse: [FormatError raise].
> ^newInstance
Yeah, that is clean and clear as well.
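To make the proposed split concrete, here is a minimal workspace sketch. It describes the contract Nicolas proposes, not necessarily what every class does today, and the sample string is made up for illustration.

```smalltalk
"Sketch of the proposed contract; actual behaviour may differ per class."
| stream number rest |
stream := '42 apples' readStream.
"readFrom: parses the valid prefix and leaves the stream positioned after it."
number := Integer readFrom: stream.
"The sender decides what to do with whatever is left on the stream."
rest := stream upToEnd.
"Under the proposal, Integer readFromString: '42 apples' would instead
 signal an error, because the whole string has to be consumed."
```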
>>
>> BTW, what I miss in Smalltalk is a way to read from some position to another without all the terrible copying, either from a stream or from a string.
>>
>
> There is ReadStream class>>on:from:to: but I recommend testing and
> testing again, because Stream implementation is ... not crystal clear.
Thanks for the pointer, Nicolas; apparently I forgot what I did myself earlier this year ;-)
parseTimeStamp2: aString
    "self parseTimeStamp2: '19670807T110343'."
    "self parseTimeStamp2: '19670807'. "
    | year month day hour minute second |
    year := Integer readFrom: (ReadStream on: aString from: 1 to: 4).
    month := Integer readFrom: (ReadStream on: aString from: 5 to: 6).
    day := Integer readFrom: (ReadStream on: aString from: 7 to: 8).
    aString size > 8
        ifTrue: [
            hour := Integer readFrom: (ReadStream on: aString from: 10 to: 11).
            minute := Integer readFrom: (ReadStream on: aString from: 12 to: 13).
            second := Integer readFrom: (ReadStream on: aString from: 14 to: 15) ]
        ifFalse: [
            hour := minute := second := 0 ].
    ^ TimeStamp
        year: year
        month: month
        day: day
        hour: hour
        minute: minute
        second: second
        offset: Duration zero
Still, this creates ReadStreams and SqNumberParsers like hell; the following seems faster and more efficient:
parseTimeStamp: aString
    "self parseTimeStamp: '19670807T110343'."
    "self parseTimeStamp: '19670807'. "
    | year month day hour minute second stream parser parseInteger |
    stream := ReadStream on: aString.
    parser := SqNumberParser on: stream.
    parseInteger := [ :from :to |
        stream setFrom: from to: to.
        parser nextUnsignedIntegerBase: 10 ].
    year := parseInteger value: 1 value: 4.
    month := parseInteger value: 5 value: 6.
    day := parseInteger value: 7 value: 8.
    aString size > 8
        ifTrue: [
            hour := parseInteger value: 10 value: 11.
            minute := parseInteger value: 12 value: 13.
            second := parseInteger value: 14 value: 15 ]
        ifFalse: [
            hour := minute := second := 0 ].
    ^ TimeStamp
        year: year
        month: month
        day: day
        hour: hour
        minute: minute
        second: second
        offset: Duration zero
[ Dummy parseTimeStamp: '19670807T110343' ] bench '496,000 per second.'
[ Dummy parseTimeStamp2: '19670807T110343' ] bench '298,000 per second.'
Obviously, a little helper class would be prettier.
In certain cases, the readstream+parser could be shared over all conversion calls,
but then again, the source of characters would probably be a stream itself,
and the intermediate string should be avoided.
>> Consider parsing something like this '2011-10-30T17:17:47+01:00', the fields are fixed and pretty simple, but I can't think of an efficient way to do it, can you ?
>>
>
> Don't know... Some pattern matching, maybe with a simple regexp.
> PEG might be simple too.
> Qualifying as "efficient" however raises the bar a bit high ;)
I meant converting a known, fixed format as quickly as possible, this is not really a parsing problem, it is just a conversion.
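As a rough illustration of that kind of conversion, the sketch below reads the fixed-width fields of '2011-10-30T17:17:47+01:00' straight out of the string by index, using plain digit arithmetic instead of streams or parsers. The selector parseISOTimeStamp: is hypothetical, and the input is assumed to be exactly 25 characters in this layout with a numeric offset. Nothing is validated, and whether it is actually faster would have to be benchmarked.

```smalltalk
parseISOTimeStamp: aString
    "self parseISOTimeStamp: '2011-10-30T17:17:47+01:00'."
    "Hypothetical helper: assumes the fixed layout above and does no validation."
    | digits sign offset |
    "Accumulate the decimal digits between two indexes into an integer."
    digits := [ :from :to |
        (from to: to)
            inject: 0
            into: [ :sum :index | sum * 10 + (aString at: index) digitValue ] ].
    "Position 20 holds the sign of the time zone offset."
    sign := (aString at: 20) = $- ifTrue: [ -1 ] ifFalse: [ 1 ].
    offset := Duration
        days: 0
        hours: sign * (digits value: 21 value: 22)
        minutes: sign * (digits value: 24 value: 25)
        seconds: 0.
    ^ TimeStamp
        year: (digits value: 1 value: 4)
        month: (digits value: 6 value: 7)
        day: (digits value: 9 value: 10)
        hour: (digits value: 12 value: 13)
        minute: (digits value: 15 value: 16)
        second: (digits value: 18 value: 19)
        offset: offset
```

The separators at positions 5, 8, 11, 14, 17 and 23 are simply skipped, which is only safe when the format is known in advance.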
Sven