[squeak-dev] The Inbox: ShoutCore-ct.78.mcz

Levente Uzonyi leves at caesar.elte.hu
Sun Jan 12 18:04:35 UTC 2020


Hi Christoph,

On Sun, 12 Jan 2020, commits at source.squeak.org wrote:

> Christoph Thiede uploaded a new version of ShoutCore to project The Inbox:
> http://source.squeak.org/inbox/ShoutCore-ct.78.mcz
>
> ==================== Summary ====================
>
> Name: ShoutCore-ct.78
> Author: ct
> Time: 12 January 2020, 3:00:21.215555 am
> UUID: c04a2f92-7a12-1743-a4d7-5d555c8ee634
> Ancestors: ShoutCore-mt.77
>
> Optimize context-dependent styling of multiple texts in the same context.
>
> Reading the context's method from the sources file can be *really* slow (up to several hundred milliseconds, depending on the size of your image). Thus we don't need to request the sources file again each time we handle a subsequent styling request for the same context.

Reading the source of a method - especially if you read the source of 
the same method multiple times, which is what you're trying to fix - 
should not take "several hundred milliseconds", no matter how large 
your image is. Besides the obvious things, the runtime depends on a few 
things like:
- the amount of free RAM there is,
- how well your OS caches files,
- whether the method's source has been cached or not,
- the speed of the drive the source file is stored on,
- whether you have too many open files and are close to the maxExternalSemaphores limit (which should have been bumped to 8k long ago),
but, in general, it should be in the submillisecond range.
Here's an example reading the source of a rather large method (11kB):

[ (MethodFinder >> #initialize) getSource ] bench.
'11,800 per second. 84.5 microseconds per run. 5.95762 % GC time.'

>
> Measurements from a fresh Trunk image:
>
> 	| styler |
> 	styler := SHTextStylerST80 new.
> 	styler context: (thisContext stack at: 4). "to get a not-so-recently modified method"
> 	[styler styledTextFor: 'foo' asText] bench
>
> Before: 36 milliseconds/run
> After: 161 microseconds/run

Here's the result of the benchmark executed on my machine without your 
changes:

  '4,700 per second. 213 microseconds per run. 1.04 % GC time.'

So, perhaps there's some other thing causing the slowdown, e.g. one of the 
reasons I mentioned above or some anti-virus software.

>
> Cache is fast:
>
> 	[| styler |
> 	styler := SHTextStylerST80 new.
> 	styler context: (thisContext stack at: 4). "to get a not-so-recently modified method"
> 	styler styledTextFor: 'foo' asText] bench
>
> Before: 276 milliseconds/run
> After: 276 milliseconds/run "sic!"
>
> =============== Diff against ShoutCore-mt.77 ===============
>
> Item was changed:
>  Object subclass: #SHParserST80
> + 	instanceVariableNames: 'source parseAMethod classOrMetaClass workspace environment context arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition bracketDepth ranges allowUnderscoreAssignments allowUnderscoreSelectors allowBlockArgumentAssignment currentTokenType contextMethodCache'
> - 	instanceVariableNames: 'classOrMetaClass source workspace arguments sourcePosition currentToken currentTokenFirst temporaries instanceVariables errorBlock currentTokenSourcePosition bracketDepth ranges environment allowUnderscoreAssignments allowUnderscoreSelectors allowBlockArgumentAssignment parseAMethod currentTokenType context'

The new instance variable should be documented in the class comment.

>  	classVariableNames: ''
>  	poolDictionaries: ''
>  	category: 'ShoutCore-Parsing'!
>
>  !SHParserST80 commentStamp: 'ul 7/30/2019 00:31' prior: 0!
>  I am a Smalltalk method / expression parser.
>
>  Rather than creating an Abstract Syntax Tree, I create a sequence of SHRanges (in my 'ranges' instance variable), which represent the tokens within the String I am parsing.
>
>  I am used by a SHTextStylerST80 to parse method source strings.
>  I am able to parse incomplete / incorrect methods, and so can be used to parse methods that are being edited.
>
>  Instance Variables
>  	allowBlockArgumentAssignment:		<Boolean>
>  	allowUnderscoreAssignments:		<Boolean>
>  	allowUnderscoreSelectors:		<Boolean>
>  	arguments:		<OrderedCollection<OrderedCollection<String>|nil>
>  	bracketDepth:		<Integer>
>  	classOrMetaClass:		<Class|nil>
>  	currentToken:		<String|nil>
>  	currentTokenFirst:		<Character>
>  	currentTokenSourcePosition:		<Integer|nil>
>  	currentTokenType:		<Symbol|nil>
>  	environment:		<Environment>
>  	errorBlock:		<Block>
>  	instanceVariables:		<Array>
>  	parseAMethod:		<Boolean>
>  	ranges:		<OrderedCollection<SHRange>>
>  	source:		<String>
>  	sourcePosition:		<Integer>
>  	temporaries:		<OrderedCollection<OrderedCollection<String>|nil>
>  	workspace:		<Workspace|nil>
>  	context:		<Context|nil>
>
>  allowBlockArgumentAssignment
>  	The value cached at the beginning of parsing of Scanner allowBlockArgumentAssignment.
>
>  allowUnderscoreAssignments
>  	The value cached at the beginning of parsing of Scanner allowUnderscoreAsAssignment.
>
>  allowUnderscoreSelectors
>  	The value cached at the beginning of parsing of Scanner prefAllowUnderscoreSelectors.
>
>  arguments
>  	This OrderedCollection has an element for each scope encapsulating the current scope.
>  	The current scope's arguments are stored in the last element. The first element holds the outermost scope's arguments.
>  	Each element is nil when the corresponding scope doesn't have any arguments, and the element is an OrderedCollection with the names of the arguments declared at the given scope when there's at least one.
>  	The size of this variable is the same as the size of temporaries.
>
>  bracketDepth
>  	Stores the number of unclosed brackets "("  and parentheses "[" before the current sourcePosition.
>
>  classOrMetaClass
>  	The Class or MetaClass instance, class and pool variables should be looked up during parsing or nil when not parsing code in the context of a class (e.g. when parsing code written in a Workspace). Having this set doesn't mean a method is being parsed.
>
>  currentToken
>  	The token being analyzed for which the next range should be created for.
>
>  currentTokenFirst
>  	The first character of currentToken cached for quick access or a space character when there are no more tokens to parse.
>  	Being always a Character helps avoiding extra checks.
>
>  currentTokenSourcePosition
>  	The position of source the current token starts at or nil when there are no more tokens to process.
>
>  currentTokenType
>  	The type of the current token calculated lazily by #currentTokenType. When it has been calculated, Its value is one of #keyword, #assignment, #ansiAssignment, #binary, #name, #other and occasionally #invalid.
>
>  environment
>  	The Environment globals and classes should be looked up at during parsing when classOrMetaClass is nil. Its value is Smalltalk globals by default.
>
>  errorBlock
>  	A block used to quickly stop parsing in case of an unrecoverable parse error.
>
>  instanceVariables
>  	An Array with the instance variable names of classOrMetaClass or an empty Array when classOrMetaClass is nil.
>
>  parseAMethod
>  	A way to tell the parser to parse source as a code snippet instead of a method. Mainly used by inspectors.
>
>  ranges
>  	The SHRanges parsed by the parser.
>
>  source
>  	The source code as a String to be parsed.
>
>  sourcePosition
>  	souce is treated as a stream by the parser. This variable stores the stream position.
>
>  temporaries
>  	This OrderedCollection has an element for each scope encapsulating the current scope.
>  	The current scope's temporaries are stored in the last element. The first element holds the outermost scope's temporaries.
>  	Each element is nil when the corresponding scope doesn't have any temporary variables, and the element is an OrderedCollection with the names of the temporaries declared at the given scope when there's at least one.
>  	The size of this variable is the same as the size of arguments.
>
>  workspace
>  	The Workspace in whose context variables should be looked up during parsing or nil when not parsing code in a workspace.
>
>  context
>  	The Context in which variables should be looked up during parsing or nil when not parsing within a context.
>
>  Example (explore it):
>
>  	ranges := SHParserST80 new
>  		classOrMetaClass: Object;
>  		source: 'testMethod ^self';
>  		parse;
>  		ranges
>
>  Benchmark (print it):
>
>  	SHParserST80 benchmark!
>
> Item was changed:
>  ----- Method: SHParserST80>>initializeVariablesFromContext (in category 'parse support') -----
>  initializeVariablesFromContext
> 
> + 	| contextMethod contextSource contextSourcePcIndex contextSourceParser |
> + 	contextMethod := context method.
> + 	(contextMethodCache ifNotNil: #key) = contextMethod "for optimization"

I know I sound like a broken record, but if you care about performance, 
don't use symbols in place of blocks, especially as the arguments of 
methods like #ifNotNil:, which would optimize the blocks away.
Also, using a symbol with #ifNotNil: may not legible to the average 
Smalltalker. I'm sure some would expect [ x ifNotNil: #y ] to yield #y 
when x is not nil.

> + 		ifFalse: [contextMethodCache := contextMethod -> contextMethod getSource].
> + 	contextSource := contextMethodCache value.
> + 
> - 	| contextSourcePcIndex contextSourceParser |
>  	contextSourcePcIndex := (context debuggerMap
>  		rangeForPC: (context isDead ifTrue: [context endPC] ifFalse: [context pc])
>  		in: context method
>  		contextIsActiveContext: true "... to really use the context's pc.")
>  			start.
>  	contextSourceParser := self class new
>  		classOrMetaClass: context method methodClass;
> + 		environment: context receiver environment;
> + 		source: (contextSource take: contextSourcePcIndex);

We should stop passing Text obejcts as source to SHParserST80. It only 
works on Strings. Eliot added a guard recently because of the 
introduction of texts, and I wanted to fixed that, but it seems I never 
got to do that.
CompiledMethod >> #getSource returns a Text, so the best place to convert 
to String is right after sending #getSource. That would also save us a few 
CPU cycles.

While #take: works, I would keep using #first: here. We know that the 
receiver is String (or at least it should be) and we need to get the first 
N characters. We can also assume that contextSourcePcIndex is within the 
range of the source. If it's not, then it's very likely a bug that would 
be hidden by #take:.


Levente

> - 		environment: self environment;
> - 		source: (context method getSource first: contextSourcePcIndex);
>  		yourself.
>  	contextSourceParser parse.
>  	arguments := contextSourceParser activeArguments.
>  	temporaries  := contextSourceParser activeTemporaries.!


More information about the Squeak-dev mailing list