[squeak-dev] The Inbox: ShoutCore-ct.78.mcz

Mon Jan 13 09:42:01 UTC 2020

Hi Levente,

many thanks for your review!

> Reading the source of a method - especially if you read the source of the same method multiple times, which is what you're trying to fix - should not take "several hundred milliseconds", no matter how large your image is.

That's indeed interesting.

> Besides the obvious things, the runtime depends on a few things like:
> - the amount of free RAM there is,
> - how well your OS caches files,
> - whether the method's source has been cached or not,
> - the speed of the drive the source file is stored on,
> - whether you have too many open files and are close to the maxExternalSemaphores limit (which should have been bumped to 8k long ago),
but, in general, it should be in the submillisecond range.

80% - 90% of 16 RAM.
Windows 10, SSD ... Setup should be rather state of the start.
No additional antivirus software. Disabling OneDrive, which synces image & changes file in the background, accelerated the issue by around 30%, but it's still *much* slower than on your system.
I tested it on a fresh Trunk image, so there shouldn't be a lot of files opened.

> but, in general, it should be in the submillisecond range.
> Here's an example reading the source of a rather large method (11kB):

[ (MethodFinder >> #initialize) getSource ] bench.

'11.4 per second. 87.7 milliseconds per run. 0.01214 % GC time.'

Is this a typical workload?

[cid:ec96ec35-ab09-440a-b023-3210b7c34e7b]

> >        | styler |

> >        styler := SHTextStylerST80 new.
> >        styler context: (thisContext stack at: 4). "to get a not-so-recently modified method"
> >        [styler styledTextFor: 'foo' asText] bench
> Here's the result of the benchmark executed on my machine without your changes:
>   '4,700 per second. 213 microseconds per run. 1.04 % GC time.'

How fast is the example if you do load my example?

In general, do you think my proposal makes any sense for "normal" systems?

> I know I sound like a broken record, but if you care about performance, don't use symbols in place of blocks, especially as the arguments of methods like #ifNotNil:, which would optimize the blocks away.

Optimization is a good point (although I have the impression that this might be rather premature optimization). Also, we could optimize symbols in the same way as blocks, couldn't we? But I will try to keep it in mind, also in terms of readability :)

> While #take: works, I would keep using #first: here. We know that the receiver is String (or at least it should be) and we need to get the first N characters. We can also assume that contextSourcePcIndex is within the range of the source. If it's not, then it's very likely a bug that would be hidden by #take:.

Actually, I got a "subscript out of bounds" a few times, so there might be indeed a bug somewhere. Do we really want Shout to be the bug-checking instance here? We also could raise a warning if the source is too short, without irritating all kind of users with a non-working toolset (because this would likely crash the debugger as well). I don't know what's the typical approach in Squeak to balance robustness vs error detection ...

Best,
Christoph
________________________________
Von: Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im Auftrag von Levente Uzonyi <leves at caesar.elte.hu>
Gesendet: Sonntag, 12. Januar 2020 19:04:35
An: squeak-dev at lists.squeakfoundation.org
Betreff: Re: [squeak-dev] The Inbox: ShoutCore-ct.78.mcz

Hi Christoph,

On Sun, 12 Jan 2020, commits at source.squeak.org wrote:

> Christoph Thiede uploaded a new version of ShoutCore to project The Inbox:
> http://source.squeak.org/inbox/ShoutCore-ct.78.mcz
>
> ==================== Summary ====================
>
> Name: ShoutCore-ct.78
> Author: ct
> Time: 12 January 2020, 3:00:21.215555 am
> UUID: c04a2f92-7a12-1743-a4d7-5d555c8ee634
> Ancestors: ShoutCore-mt.77
>
> Optimize context-dependent styling of multiple texts in the same context.
>
> Reading the context's method from the sources file can be *really* slow (up to several hundred milliseconds, depending on the size of your image). Thus we don't need to request the sources file again each time we handle a subsequent styling request for the same context.

Reading the source of a method - especially if you read the source of
the same method multiple times, which is what you're trying to fix -
should not take "several hundred milliseconds", no matter how large
your image is. Besides the obvious things, the runtime depends on a few
things like:
- the amount of free RAM there is,
- how well your OS caches files,
- whether the method's source has been cached or not,
- the speed of the drive the source file is stored on,
- whether you have too many open files and are close to the maxExternalSemaphores limit (which should have been bumped to 8k long ago),
but, in general, it should be in the submillisecond range.
Here's an example reading the source of a rather large method (11kB):

[ (MethodFinder >> #initialize) getSource ] bench.
'11,800 per second. 84.5 microseconds per run. 5.95762 % GC time.'

>
> Measurements from a fresh Trunk image:
>
>        | styler |
>        styler := SHTextStylerST80 new.
>        styler context: (thisContext stack at: 4). "to get a not-so-recently modified method"
>        [styler styledTextFor: 'foo' asText] bench
>
> Before: 36 milliseconds/run
> After: 161 microseconds/run

Here's the result of the benchmark executed on my machine without your
changes:

  '4,700 per second. 213 microseconds per run. 1.04 % GC time.'

So, perhaps there's some other thing causing the slowdown, e.g. one of the
reasons I mentioned above or some anti-virus software.

>
> Cache is fast:
>
>        [| styler |
>        styler := SHTextStylerST80 new.
>        styler context: (thisContext stack at: 4). "to get a not-so-recently modified method"
>        styler styledTextFor: 'foo' asText] bench
>
> Before: 276 milliseconds/run
> After: 276 milliseconds/run "sic!"
>
> =============== Diff against ShoutCore-mt.77 ===============
>
> Item was changed:
>  Object subclass: #SHParserST80
> +      instanceVariableNames: 'source parseAMethod classOrMetaClass workspace environment context arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition bracketDepth ranges allowUnderscoreAssignments allowUnderscoreSelectors allowBlockArgumentAssignment currentTokenType contextMethodCache'
> -      instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition bracketDepth ranges environment allowUnderscoreAssignments allowUnderscoreSelectors allowBlockArgumentAssignment parseAMethod currentTokenType context'

The new instance variable should be documented in the class comment.

>        classVariableNames: ''
>        poolDictionaries: ''
>        category: 'ShoutCore-Parsing'!
>
>  !SHParserST80 commentStamp: 'ul 7/30/2019 00:31' prior: 0!
>  I am a Smalltalk method / expression parser.
>
>  Rather than creating an Abstract Syntax Tree, I create a sequence of SHRanges (in my 'ranges' instance variable), which represent the tokens within the String I am parsing.
>
>  I am used by a SHTextStylerST80 to parse method source strings.
>  I am able to parse incomplete / incorrect methods, and so can be used to parse methods that are being edited.
>
>  Instance Variables
>        allowBlockArgumentAssignment:           <Boolean>
>        allowUnderscoreAssignments:             <Boolean>
>        allowUnderscoreSelectors:               <Boolean>
>        arguments:              <OrderedCollection<OrderedCollection<String>|nil>
>        bracketDepth:           <Integer>
>        classOrMetaClass:               <Class|nil>
>        currentToken:           <String|nil>
>        currentTokenFirst:              <Character>
>        currentTokenSourcePosition:             <Integer|nil>
>        currentTokenType:               <Symbol|nil>
>        environment:            <Environment>
>        errorBlock:             <Block>
>        instanceVariables:              <Array>
>        parseAMethod:           <Boolean>
>        ranges:         <OrderedCollection<SHRange>>
>        source:         <String>
>        sourcePosition:         <Integer>
>        temporaries:            <OrderedCollection<OrderedCollection<String>|nil>
>        workspace:              <Workspace|nil>
>        context:                <Context|nil>
>
>  allowBlockArgumentAssignment
>        The value cached at the beginning of parsing of Scanner allowBlockArgumentAssignment.
>
>  allowUnderscoreAssignments
>        The value cached at the beginning of parsing of Scanner allowUnderscoreAsAssignment.
>
>  allowUnderscoreSelectors
>        The value cached at the beginning of parsing of Scanner prefAllowUnderscoreSelectors.
>
>  arguments
>        This OrderedCollection has an element for each scope encapsulating the current scope.
>        The current scope's arguments are stored in the last element. The first element holds the outermost scope's arguments.
>        Each element is nil when the corresponding scope doesn't have any arguments, and the element is an OrderedCollection with the names of the arguments declared at the given scope when there's at least one.
>        The size of this variable is the same as the size of temporaries.
>
>  bracketDepth
>        Stores the number of unclosed brackets "("  and parentheses "[" before the current sourcePosition.
>
>  classOrMetaClass
>        The Class or MetaClass instance, class and pool variables should be looked up during parsing or nil when not parsing code in the context of a class (e.g. when parsing code written in a Workspace). Having this set doesn't mean a method is being parsed.
>
>  currentToken
>        The token being analyzed for which the next range should be created for.
>
>  currentTokenFirst
>        The first character of currentToken cached for quick access or a space character when there are no more tokens to parse.
>        Being always a Character helps avoiding extra checks.
>
>  currentTokenSourcePosition
>        The position of source the current token starts at or nil when there are no more tokens to process.
>
>  currentTokenType
>        The type of the current token calculated lazily by #currentTokenType. When it has been calculated, Its value is one of #keyword, #assignment, #ansiAssignment, #binary, #name, #other and occasionally #invalid.
>
>  environment
>        The Environment globals and classes should be looked up at during parsing when classOrMetaClass is nil. Its value is Smalltalk globals by default.
>
>  errorBlock
>        A block used to quickly stop parsing in case of an unrecoverable parse error.
>
>  instanceVariables
>        An Array with the instance variable names of classOrMetaClass or an empty Array when classOrMetaClass is nil.
>
>  parseAMethod
>        A way to tell the parser to parse source as a code snippet instead of a method. Mainly used by inspectors.
>
>  ranges
>        The SHRanges parsed by the parser.
>
>  source
>        The source code as a String to be parsed.
>
>  sourcePosition
>        souce is treated as a stream by the parser. This variable stores the stream position.
>
>  temporaries
>        This OrderedCollection has an element for each scope encapsulating the current scope.
>        The current scope's temporaries are stored in the last element. The first element holds the outermost scope's temporaries.
>        Each element is nil when the corresponding scope doesn't have any temporary variables, and the element is an OrderedCollection with the names of the temporaries declared at the given scope when there's at least one.
>        The size of this variable is the same as the size of arguments.
>
>  workspace
>        The Workspace in whose context variables should be looked up during parsing or nil when not parsing code in a workspace.
>
>  context
>        The Context in which variables should be looked up during parsing or nil when not parsing within a context.
>
>  Example (explore it):
>
>        ranges := SHParserST80 new
>                classOrMetaClass: Object;
>                source: 'testMethod ^self';
>                parse;
>                ranges
>
>  Benchmark (print it):
>
>        SHParserST80 benchmark!
>
> Item was changed:
>  ----- Method: SHParserST80>>initializeVariablesFromContext (in category 'parse support') -----
>  initializeVariablesFromContext
>
> +      | contextMethod contextSource contextSourcePcIndex contextSourceParser |
> +      contextMethod := context method.
> +      (contextMethodCache ifNotNil: #key) = contextMethod "for optimization"

I know I sound like a broken record, but if you care about performance,
don't use symbols in place of blocks, especially as the arguments of
methods like #ifNotNil:, which would optimize the blocks away.
Also, using a symbol with #ifNotNil: may not legible to the average
Smalltalker. I'm sure some would expect [ x ifNotNil: #y ] to yield #y
when x is not nil.

> +              ifFalse: [contextMethodCache := contextMethod -> contextMethod getSource].
> +      contextSource := contextMethodCache value.
> +
> -      | contextSourcePcIndex contextSourceParser |
>        contextSourcePcIndex := (context debuggerMap
>                rangeForPC: (context isDead ifTrue: [context endPC] ifFalse: [context pc])
>                in: context method
>                contextIsActiveContext: true "... to really use the context's pc.")
>                        start.
>        contextSourceParser := self class new
>                classOrMetaClass: context method methodClass;
> +              environment: context receiver environment;
> +              source: (contextSource take: contextSourcePcIndex);

We should stop passing Text obejcts as source to SHParserST80. It only
works on Strings. Eliot added a guard recently because of the
introduction of texts, and I wanted to fixed that, but it seems I never
got to do that.
CompiledMethod >> #getSource returns a Text, so the best place to convert
to String is right after sending #getSource. That would also save us a few
CPU cycles.

While #take: works, I would keep using #first: here. We know that the
receiver is String (or at least it should be) and we need to get the first
N characters. We can also assume that contextSourcePcIndex is within the
range of the source. If it's not, then it's very likely a bug that would
be hidden by #take:.

Levente

> -              environment: self environment;
> -              source: (context method getSource first: contextSourcePcIndex);
>                yourself.
>        contextSourceParser parse.
>        arguments := contextSourceParser activeArguments.
>        temporaries  := contextSourceParser activeTemporaries.!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200113/0de990ca/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pastedImage.png
Type: image/png
Size: 127183 bytes
Desc: pastedImage.png
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20200113/0de990ca/attachment-0001.png>