Hi Eliot, hi all!
I am having issues with the inlining of #to:do: in combination with decompiling a block and accessing the temps.
Here is a piece of code that works as expected:
| result context | result := OrderedCollection new.
(1 to: 5) do: [:each | result add: [:object | object + each]].
context := (result at: 3) outerContext. context namedTempAt: (context tempNames indexOf: 'each'). " == 3"
HOWEVER, using #to:do: the result changes because code gets inlined:
| result context | result := OrderedCollection new.
1 to: 5 do: [:each | result add: [:object | object + each]].
context := (result at: 3) outerContext. context namedTempAt: (context tempNames indexOf: 'each'). " == 6"
What is my goal here? I want to unpack the block statement and fill in the temps. For all those [:object | object + each] in "result" I want to collect the strings:
-> 'object + 1'
-> 'object + 2'
-> 'object + 3'
-> 'object + 4'
-> 'object + 5'
Is there a way to access the values of temps when the block closures are created this way?
Best, Marcel
Hi Marcel,
On Mar 26, 2020, at 5:15 AM, Marcel Taeumel marcel.taeumel@hpi.de wrote:
[...]
Nice one. So the thing to realize is that, because of inlining, the second example is in fact equivalent to
| result context each | result := OrderedCollection new.
each := 1. [ each <= 5 ] whileTrue: [ result add: [:object | object + each]].
context := (result at: 3) outerContext. context namedTempAt: (context tempNames indexOf: 'each') "= 6"
and written like this it is clear that the lookup will find the value as it stands after the loop, 6.
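The same capture behaviour can be sketched in Python (an analogy, not Squeak code; all names are illustrative). Python closures capture the variable, not its value, so a shared loop variable behaves like the inlined #to:do: above, while binding the value at block-creation time behaves like the non-inlined (1 to: 5) do: version:

```python
def shared_variable():
    # one 'each' cell, reused across iterations -- like the inlined #to:do:
    result = []
    for each in range(1, 6):
        result.append(lambda obj: obj + each)
    return result

def fresh_binding():
    # bind the current value at closure-creation time -- like the
    # non-inlined (1 to: 5) do: version, where each block gets its own each
    result = []
    for each in range(1, 6):
        result.append(lambda obj, each=each: obj + each)
    return result

print(shared_variable()[2](0))  # 5: every closure sees the loop's final value
print(fresh_binding()[2](0))    # 3: each closure kept the value it was created with
```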
The next thing to realize is that the inlining decision is wrong: closing over each (the index variable of the inlined to:do:) within an uninlined inner block means that the outer block cannot be inlined without breaking the inner block.
So the next thing to determine is whether the compiler can easily be modified to identify this case and hence prevent inlining in cases like these.
After that we have to assess how many cases in the system (e.g. in trunk) would no longer be inlined and decide whether the cure is worse than the disease. The issue here is twofold: the loss of performance due to not inlining, and the slowdown to compilation due to the check(s) needed to prevent inlining in this case.
Why would adding the check potentially cause many to:[by:]do: invocations to no longer be inlined? The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument. Because this is a dynamic language we cannot assume anything about the message the block is an argument to; the programmer could redefine an implementation of a method with that selector to capture the block argument just as your example does. So we have to assume the block could be captured, and hence that the to:[by:]do: in which it occurs cannot be inlined.
Other Smalltalks have decided to live with the disease. I think that the right decision is *not* to live with the disease, but to fix the inliner, and wait for Scorch to “do the right thing” and inline dynamically. Scorch can inline precisely because it does indeed find out about the caller, and inline if the caller does not capture the block, and adds the dependency information to enable the optimized version to be discarded if redefinition occurs.
The first thing to do is write a test asserting the correct behaviour.
The second thing to do is learn how and where the bytecode compiler decides whether or not to inline to:[by:]do:.
The third thing to do is to modify that inlining decision to generate correct code for your case.
The fourth thing to do is then to find out how many methods are affected.
Finally we can then assess whether we want to live with the disease or not.
Best, Eliot _,,,^..^,,,_ (phone)
Hi Eliot,
thanks for the elaborate explanation! Wow, even more things are getting blocked while we are longing for Scorch. :D Could you tell us, even if only very roughly, when you expect to start making your blessing available for Squeak? Are we speaking of months? Years? Decades? ;-)
Here're my two cents on the story:
The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.
Hm, what will happen with this:
1 to: 5 do: [:each |
[:object | object + each] in: [:block | result add: block]].
The left block is not an argument, but it would still be relevant to disable the inlining of the outer block. So your rule has to respect both receiver and arguments, am I right? This will increase the "collateral damage" of unnecessary slowdown even further; we would also have to forgo inlining for things like {[:object | object + each] cull: 42} etc.
Or this one:
1 to: 5 do: [:each |
[result add: thisContext sender top] value; yourself].
And finally:
1 to: 5 do: [:each |
result add: [thisContext tempVarNamed: 'each']].
You can (almost) always crack the system, so if we decide to turn off inlining sometimes, however we do it, there will be a logical break at some position in each case. In my humble opinion, this would cause more confusion than it would solve problems. There has never been a guarantee that blocks will not be inlined. If you wish to prevent this, the simplest and clearest solution IMO is still
1 to: 5 do: [:each | result add: [:object | object + each] ] yourself.
Best,
Christoph
________________________________
From: Squeak-dev squeak-dev-bounces@lists.squeakfoundation.org on behalf of Eliot Miranda eliot.miranda@gmail.com
Sent: Thursday, 26 March 2020 17:20:08
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] Question about inlining | How to access named temps in FullBlockClosure?
[...]
Hi Christoph,
On Mar 26, 2020, at 10:42 AM, Thiede, Christoph Christoph.Thiede@student.hpi.uni-potsdam.de wrote:
Hi Eliot,
thanks for the elaborate explanation! Wow, even more things are getting blocked while we are longing for Scorch. :D Could you tell us, even if only very roughly, when you expect to start making your blessing available for Squeak? Are we speaking of months? Years? Decades? ;-)
That will depend on whether I can find a few skilled collaborators or not (too many and communication costs will overwhelm). Certainly not decades. Clément has things working. So, on the order of six to twelve person-months of effort.
I encourage you to look at Clément’s presentations and see if it interests you. You might be one of those people. This is a great PhD topic.
Here're my two cents on the story:
The check is to find anywhere we see a (non-optimized) block created that closes over an index variable and is passed as an argument.
Hm, what will happen with this:
1 to: 5 do: [:each |
[:object | object + each] in: [:block | result add: block]].
The left block is not an argument, but it would still be relevant to disable the inlining of the outer block. So your rule has to respect both receiver and arguments, am I right? This will increase the "collateral damage" of unnecessary slowdown even further; we would also have to forgo inlining for things like {[:object | object + each] cull: 42} etc.
Quite right. Can you find out how common cases like these are? I bet you there will be five methods or less in trunk that are affected.
Or this one:
1 to: 5 do: [:each |
[result add: thisContext sender top] value; yourself].
And finally:
1 to: 5 do: [:each |
result add: [thisContext tempVarNamed: 'each']].
I find these last two not compelling. If one is dealing with thisContext one is close to the implementation so I would argue you get the unvarnished reality, not the ideal specification.
You can (almost) always crack the system, so if we decide to turn off inlining sometimes, however we do it, there will be a logical break at some position in each case. In my humble opinion, this would cause more confusion than it would solve problems. There has never been a guarantee that blocks will not be inlined.
Agreed.
If you wish to prevent this, the simplest and clearest solution IMO is still
1 to: 5 do: [:each | result add: [:object | object + each] ] yourself.
Hi Eliot,
Quite right. Can you find out how common cases like these are? I bet you there will be five methods or less in trunk that are affected.
I did a full search in my image:
pattern := '.*to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[.*' asRegex.
self systemNavigation allMethods select: [:m | pattern matches: m getSource].
My source files are very slow (http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html), so I interrupted the script after one hour, but I still found more than 250 matches. I took some samples and they indeed looked relevant: nested #to:do: loops, for example ... Or do I misunderstand your idea of a rule?
However, if we can agree on leaving it as it is, it's irrelevant ;-)
Best, Christoph
-- Sent from: http://forum.world.st/Squeak-Dev-f45488.html
Hi Christoph,
Le ven. 27 mars 2020 à 17:10, Christoph Thiede < christoph.thiede@student.hpi.uni-potsdam.de> a écrit :
Hi Eliot,
Quite right. Can you find out how common cases like these are? I bet you there will be five methods or less in trunk that are affected.
I did a full search in my image:
pattern := '.*to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[.*' asRegex. self systemNavigation allMethods select: [:m | pattern matches: m getSource].
My Source Files are very slow,
You might want to use:
CurrentReadOnlySourceFiles cacheDuring: [ ... ]
http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html
so I interrupted the script after one hour, but I still found more than 250 matches. I took some samples and they looked indeed relevant. Nested #to:do: loops, for example ... Or do I misunderstand your idea of a rule?
The problem is not so much evaluating a nested block; it's returning a nested block for later evaluation.
Hi Nicolas,
CurrentReadOnlySourceFiles cacheDuring: [ ... ]
thanks, but this still kept my image busy for more than ten minutes!
The problem is not so when evaluating nested block.
It's when returning a nested block for later evaluation.
I know, but unless the nested block is inlined, you cannot tell whether any message sent to it, such as #cull:, will store it for later. The compiler cannot know that #cull: does not look like this, for example:
cull: anObject
	Project current addDeferredUIMessage: [self value].
	^ self value: anObject
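Christoph's point can be transposed to Python (a hypothetical stand-in for the redefinable #cull:, not the real one): from the call site alone, a compiler cannot tell whether a higher-order function lets its block argument escape for later evaluation.

```python
deferred = []  # plays the role of addDeferredUIMessage:'s queue

def cull(block, arg):
    # looks like a plain "evaluate with one argument", but the block escapes
    deferred.append(block)
    return block(arg)

# the call site is indistinguishable from a harmless immediate evaluation
result = cull(lambda x: x + 1, 41)
print(result)          # 42
print(len(deferred))   # 1: the closure was retained for later
```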
Best, Christoph
[...]
Hi Christoph,
On Fri, 27 Mar 2020, Christoph Thiede wrote:
Hi Eliot,
Quite right. Can you find out how common cases like these are? I bet you there will be five methods or less in trunk that are affected.
I did a full search in my image:
pattern := '.*to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[.*' asRegex. self systemNavigation allMethods select: [:m | pattern matches: m getSource].
My Source Files are very slow, http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html so I interrupted the script after one hour, but I still found more than 250 matches. I took some samples and they looked indeed relevant. Nested #to:do: loops, for example ...
Try this instead:
| regex |
regex := 'to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[' asRegex.
CurrentReadOnlySourceFiles cacheDuring: [
	SystemNavigation default browseAllSelect: [ :method |
		regex search: method getSource asString ] ].
The key differences are:
- caching source files (as Nicolas suggested)
- converting the source to String from Text (regex is intended to work with Strings, not Texts)
- using #search: instead of #matches:
This brings down the runtime to 2.5 seconds on my machine.
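The #matches: vs. #search: difference can be illustrated with Python's re module (a rough translation of the idea, not the Squeak regex dialect): a matches-style test requires the whole string to match, so the pattern must be padded with .* on both sides, while a search-style test finds the core pattern anywhere without the extra wildcard backtracking.

```python
import re

# a method source containing an inlined-loop candidate
source = "foo\n1 to: 5 do: [:each | result add: [:object | object + each]].\nbar"

# matches-style: whole-string match, so the pattern needs .* padding
# ((?s) lets . cross newlines, as a multi-line method source requires)
wrapped = re.compile(r'(?s).*to:\s*\w+\s*do:\s*\[[^\]]*\[.*')

# search-style: find the core pattern anywhere, no padding needed
plain = re.compile(r'to:\s*\w+\s*do:\s*\[[^\]]*\[')

print(wrapped.fullmatch(source) is not None)  # True
print(plain.search(source) is not None)       # True
```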
Levente
Or do I misunderstand your idea of a rule?
However, if we can agree on leaving it as it is, it's irrelevant ;-)
Best, Christoph
-- Sent from: http://forum.world.st/Squeak-Dev-f45488.html
On 2020-03-30, at 10:36 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
| regex | regex := 'to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[' asRegex. CurrentReadOnlySourceFiles cacheDuring: [ SystemNavigation default browseAllSelect: [ :method | regex search: method getSource asString ] ].
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Porting is such sweet sorrow
On 2020-03-30, at 10:36 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
Try this instead:
| regex | regex := 'to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[' asRegex. CurrentReadOnlySourceFiles cacheDuring: [ SystemNavigation default browseAllSelect: [ :method | regex search: method getSource asString ] ].
The key differences are:
- caching source files (as Nicolas suggested)
- converting the source to String from Text (regex is intended to work with Strings not Texts)
- using #search: instead of #matches:
This brings down the runtime to 2.5 seconds on my machine.
And a smidge over 10 sec on my Pi 4 :-)
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim A computer program does what you tell it to do, not what you want it to do.
| regex | regex := 'to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[' asRegex. CurrentReadOnlySourceFiles cacheDuring: [ SystemNavigation default browseAllSelect: [ :method | regex search: method getSource asString ] ].
This still timed out my patience after 10 minutes :(
You should fire up the VM with some profiler (I don't know what's available on Windows) to see why file access is so slow.
Levente
On Mon, 30 Mar 2020, Thiede, Christoph wrote:
| regex | regex := 'to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[' asRegex. CurrentReadOnlySourceFiles cacheDuring: [ SystemNavigation default browseAllSelect: [ :method | regex search: method getSource asString ] ].
This still timed out my patience after 10 minutes :(
Hi Christoph,
apologies for the late reply. Busy weekend trying to get the VM to run when compiled with MSVC. Still not out of the woods...
On Fri, Mar 27, 2020 at 9:10 AM Christoph Thiede < christoph.thiede@student.hpi.uni-potsdam.de> wrote:
Hi Eliot,
Quite right. Can you find out how common cases like these are? I bet you there will be five methods or less in trunk that are affected.
I did a full search in my image:
pattern := '.*to:[\s\r\n]*\w+[\s\r\n]*do:[\s\r\n]*[[^]]*[.*' asRegex. self systemNavigation allMethods select: [:m | pattern matches: m getSource].
Well, that's not what I meant by a search. However, as Levente pointed out, textual searches should be surrounded with CurrentReadOnlySourceFiles cacheDuring:. I think this is an awful implementation and would implement it very differently, but that's the work-around we have in place now.
What I meant by a search was to look at MessageNode>>transform:, which is sent in MessageNode>>receiver:selector:arguments:precedence:from:. This makes it ideal. First of all, it is bottom-up; the to:[by:]do: message node is only created and transformed once all its sub-components have been parsed. So one is in a position in MessageNode>>transformToDo: to search the parse tree of the last block argument looking for accesses in non-optimized blocks (*) to its argument. (*) remember, it's *any* reference in an unoptimized block, no matter how deeply nested. So a reference to the block argument within an optimized block that is within an unoptimized block is still a reference. e.g.
1 to: 5 do: [:arg| arg even ifTrue: [evens add: [arg]] ifFalse: [odds add: [arg]]]
are still references even though they're both in optimized blocks. That's not a good example. I should say something like
1 to: 5 do: [:arg| them add: [arg even ifTrue: [arg] ifFalse: [arg+arg]]]
whereas here it doesn't matter:
1 to: 5 do: [:arg| them add: (arg even ifTrue: [arg] ifFalse: [arg+arg])]
you get the idea. So the search is tricky. Basically as the tree is descended we need to either set a flag that says the search is now in an unoptimized block, or the search needs to look up towards the root block arg to to:[by:]do: to find an intervening unoptimized block.
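The descend-with-a-flag search Eliot describes can be sketched with Python's ast module (a hypothetical analogy; Python lambdas/defs stand in for unoptimized blocks, and the names are illustrative): flag any reference to the loop's index variable that sits inside a nested closure, however deeply, while ignoring plain references in the loop body.

```python
import ast

def loop_var_escapes(tree, var):
    """Return True if `var` is referenced inside a nested function or
    lambda (an 'unoptimized block' in the analogy) anywhere in `tree`."""
    class Finder(ast.NodeVisitor):
        def __init__(self):
            self.in_closure = False  # the flag set while descending into a closure
            self.found = False
        def visit_Lambda(self, node):
            prev, self.in_closure = self.in_closure, True
            self.generic_visit(node)
            self.in_closure = prev
        visit_FunctionDef = visit_Lambda  # treat defs like lambdas
        def visit_Name(self, node):
            if self.in_closure and node.id == var:
                self.found = True
    finder = Finder()
    finder.visit(tree)
    return finder.found

captured = ast.parse("for each in r:\n    result.append(lambda o: o + each)")
plain    = ast.parse("for each in r:\n    result.append(each + 1)")
print(loop_var_escapes(captured, 'each'))  # True: 'each' escapes into a closure
print(loop_var_escapes(plain, 'each'))     # False: only used directly in the body
```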
My Source Files are very slow, <http://forum.world.st/The-Inbox-ShoutCore-ct-78-mcz-tp5109909p5110050.html>
so I interrupted the script after one hour, but I still found more than 250 matches. I took some samples and they looked indeed relevant. Nested #to:do: loops, for example ... Or do I misunderstand your idea of a rule?
Yes. It has to be something in the parse tree hierarchy, actually in transformToDo:. And it's the cost of applying this rule during compilation that is one thing to be concerned about.
Now, in order to accelerate a search for candidates, remember that Nicolas wonderfully changed the compiler to add as literals the selectors of even optimized message sends. So, for example, all uses of #to:[by:]do: have the selectors in their literal frame. So to find all the methods that you want to analyze further, simply start from
((self systemNavigation allCallsOn: #to:do:), (self systemNavigation allCallsOn: #to:by:do:)) asSet asArray sorted
BTW, in my VMMaker image, ((self systemNavigation allCallsOn: #to:do:), (self systemNavigation allCallsOn: #to:by:do:)) asSet asArray sorted size answers 2628.
So you could enumerate over these, grab the parse trees, and then search the parse trees to try and find how many methods are affected and to get a rough idea of how expensive the search is.
_,,,^..^,,,_ best, Eliot
Hi Eliot,
On Mon, 30 Mar 2020, Eliot Miranda wrote:
Well, that's not what I meant by a search. However, as Levente pointed out, textual searches should be surrounded with CurrentReadOnlySourceFiles cacheDuring:. I think this is an awful implementation and would implement it very differently, but that's the work-around we have in place now.
How would you implement it?
<history>
When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. I had the following options:
1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
2) same as 1) but with multiple copies pooled
3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is and sprinkle CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance. It's obviously not a perfect solution, but I still think it was the best available at the time.
Later, ProcessLocalVariables were added to Trunk, which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them once they are not needed any more. Perhaps a high-priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems. </history>
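The process-local idea can be sketched with Python's threading.local standing in for a process-local variable (an analogy with illustrative names, not the Squeak API): each thread lazily gets its own read-only handle, so readers never contend on a shared copy.

```python
import threading

class SourceFiles:
    """Each thread (standing in for a Smalltalk process) lazily gets
    its own read-only handle; no locking is needed on reads."""
    _local = threading.local()

    @classmethod
    def current(cls):
        if not hasattr(cls._local, 'copy'):
            cls._local.copy = object()  # pretend: a read-only file handle
        return cls._local.copy

a = SourceFiles.current()
b = SourceFiles.current()
print(a is b)  # True: the same thread reuses its copy

other = []
t = threading.Thread(target=lambda: other.append(SourceFiles.current()))
t.start(); t.join()
print(other[0] is a)  # False: another thread got its own copy
```

The release problem Levente mentions remains: these copies linger until the thread dies, which is exactly the trade-off described above.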
Levente
On 2020-03-30, at 2:21 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. I had the following options:
1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
2) same as 1) but with multiple copies pooled
3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
Without having really considered this in years, if I remember right almost all of this problem is at root caused by the stupidity of us having a file access regime based on the ANSI stream stuff - i.e. set the position, do a read (or write, whatever) - and since they are separate operations one can get utterly screwed up by multiple processes having access. That part completely goes away with a more sensible approach where the position and the read/write are a single operation, as in readFromThisFileAccessorAtThisPositionThisManyBytes. I suspect one could find email from me about this from '96 at least.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim When all else fails, let a = 7. If that doesn't help, then read the manual.
On Mon, Mar 30, 2020 at 03:33:09PM -0700, tim Rowledge wrote:
On 2020-03-30, at 2:21 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. I had the following options:
- controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor
- same as 1) but with multiple copies pooled
- exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
Without having really considered this in years, if I remember right almost all of this problem is at root caused by the stupidity of us having a file access regime based on the ANSI stream stuff - i.e. set the position, do a read (or write, whatever) - and since they are separate operations one can get utterly screwed up by multiple processes having access. That part completely goes away with a more sensible approach where the position and the read/write are a single operation, as in readFromThisFileAccessorAtThisPositionThisManyBytes. I suspect one could find email from me about this from '96 at least.
I think I do recall that ancient conversation :-)
But I am a bit skeptical. After all, we are caching the file position in StandardFileStream>>position so if there are concurrency issues, it's not entirely obvious where the fault may lie.
How to find out? Test using a Windows VM. Andreas wrote the file primitives for Windows to operate at the direct file HANDLE level, rather than the higher level stdio streams that we use for the other VMs. If the original concurrency issues go away when running on Windows, then maybe it's true that the problem is caused by reliance on stdio streams in our non-Windows VMs. But if the problems also exist on Windows, then the problem is more likely right here in our own image.
If you consider the general concept of a stream over an underlying physical storage medium, it is quite natural to think of the possibility of multiple streams over the same underlying storage medium. That is what stdio streams do. I am not entirely confident that we are doing a good job of mapping that concept into StandardFileStream in Squeak.
Dave
Hi Levente,
On Mar 30, 2020, at 2:21 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
On Mon, 30 Mar 2020, Eliot Miranda wrote:
Well, that's not what I meant by a search. However, as Levente pointed out, textual searches should be surrounded with CurrentReadOnlySourceFiles cacheDuring:. I think this is an awful implementation and would implement it very differently, but that's the work-around we have in place now.
How would you implement it?
<history> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. I had the following options: 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor 2) same as 1) but with multiple copies pooled 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is and sprinkle CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance. It's obviously not a perfect solution, but I still think it was the best available at the time.
Later, ProcessLocalVariables were added to Trunk, which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them once they are no longer needed. Perhaps a high-priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems.
</history>
I think the key issue (& this from a discussion here with Bert) is accessing method source in the debugger while one is debugging file access. As the debugger asks for source, the file pointer is moved and hence screws up the access one is trying to debug.
So I would provide something like SourceFiles withSubstituteCopiesWhile: aBlock which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source.
The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, and that limitation is fine in practice. So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to the changes file could get corrupted too).
So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution.
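Eliot's proposed guard could look something like the following sketch. The selector is his proposal, not existing Trunk code, and the body is my guess at a minimal implementation:

```smalltalk
"Hypothetical sketch of the proposed debugger guard. SourceFiles is
 assumed to be the usual two-element Array of source/changes streams."

SourceFileArray >> withSubstituteCopiesWhile: aBlock
	"Evaluate aBlock with read-only copies of the source files installed,
	 so that reads within aBlock cannot move the file pointers of the
	 originals. Restore the originals afterwards."
	| originals |
	originals := SourceFiles.
	SourceFiles := originals collect: [:file | file ifNotNil: [:f | f readOnlyCopy]].
	^aBlock ensure:
		[SourceFiles do: [:file | file ifNotNil: [:f | f close]].
		 SourceFiles := originals]
```

The debugger would wrap its source access, e.g. `SourceFiles withSubstituteCopiesWhile: [method getSource]`, so only the debugger pays the copying cost, on the order of seconds rather than per-scan microseconds.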
Levente
Eliot
Hi Eliot,
On Mon, 30 Mar 2020, Eliot Miranda wrote:
Hi Levente,
On Mar 30, 2020, at 2:21 PM, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
On Mon, 30 Mar 2020, Eliot Miranda wrote:
Well, that's not what I meant by a search. However, as Levente pointed out, textual searches should be surrounded with CurrentReadOnlySourceFiles cacheDuring:. I think this is an awful implementation and would implement it very differently, but that's the work-around we have in place now.
How would you implement it?
<history> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. I had the following options: 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor 2) same as 1) but with multiple copies pooled 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process
I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is and sprinkle CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance. It's obviously not a perfect solution, but I still think it was the best available at the time.
Later, ProcessLocalVariables were added to Trunk, which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them once they are no longer needed. Perhaps a high-priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems.
</history>
I think the key issue (& this from a discussion here with Bert) is accessing method source in the debugger while one is debugging file access. As the debugger asks for source, the file pointer is moved and hence screws up the access one is trying to debug.
I don't think that's the only issue. Have a look at the senders of #readOnlyCopy. Many of them were added 10+ years ago, well before CurrentReadOnlySourceFiles was introduced. Most of those could use CurrentReadOnlySourceFiles too but are unrelated to the debugger.
So I would provide something like SourceFiles withSubstituteCopiesWhile: aBlock which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source.
The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, and that limitation is fine in practice. So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to the changes file could get corrupted too).
So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution.
I agree with what you wrote about method changes, but reading the sources concurrently is still a possibility, especially when multiple UI processes can exist at the same time (e.g. that's what opening a debugger does, right?).
Levente
Levente
Eliot
On 2020-03-31, at 3:31 PM, Levente Uzonyi leves@caesar.elte.hu wrote: Have a look at the senders of #readOnlyCopy. Many of them were added 10+ years ago,
IIRC that was added almost at the very beginning of Squeak. I had to do an entire fake-filing-system VM layer to allow RISC OS to cope with the very weird-seeming idea of allowing multiple openings of the same file. It certainly wasn't a thing inherited from Smalltalk-80v2 - I would have had to do something similar in '87 if it were.
I agree with what you wrote about method changes, but reading the sources concurrently is still a possibility, especially when multiple UI processes can exist at the same time (e.g. that's what opening a debugger does, right?).
We should have the sources file opened once for reading and the changes file opened once for read/write. We should handle everything else by providing redirected access from streams of whatever sort and a sensible set of primitives.
A read should be atomic and specified with the starting point, the number of bytes to read and where to store them; it should return the number of bytes read or an error code. A write should be atomic and specify the start point, the number to write and the place to read from; it should likewise return the number written or an error code.
That way having a bunch of connected streams will work safely; you read from one and it cannot be mangled by another action moving the file position in the midst of the read/write.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim You never finish a program, you just stop working on it.
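The atomic operations Tim describes map naturally onto pread(2)/pwrite(2)-style system calls, which carry the position with each call instead of sharing a file offset. A hypothetical sketch of the image-side protocol (these selectors and the named primitives do not exist in the current FilePlugin):

```smalltalk
"Hypothetical positional-I/O protocol; a sketch, not existing Squeak code."

StandardFileStream >> readInto: aByteArray startingAt: index atFilePosition: filePosition count: count
	"Atomically read up to count bytes from the file at filePosition into
	 aByteArray, without touching any shared file-position state.
	 Answer the number of bytes actually read.
	 Would bottom out in a pread(2)-style primitive."
	<primitive: 'primitiveReadAtPosition' module: 'FilePlugin'>
	^self primitiveFailed

StandardFileStream >> writeFrom: aByteArray startingAt: index atFilePosition: filePosition count: count
	"Atomically write count bytes to the file at filePosition.
	 Answer the number of bytes written.
	 Would bottom out in a pwrite(2)-style primitive."
	<primitive: 'primitiveWriteAtPosition' module: 'FilePlugin'>
	^self primitiveFailed
```

With these, each stream over the same underlying file keeps its own position as plain image-side state, and no interleaving of processes can corrupt a read or write in flight.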
Hi Levente,
On Tue, Mar 31, 2020 at 3:33 PM Levente Uzonyi leves@caesar.elte.hu wrote:
I don't think that's the only issue. Have a look at the senders of #readOnlyCopy. Many of them were added 10+ years ago, well before CurrentReadOnlySourceFiles was introduced. Most of those could use CurrentReadOnlySourceFiles too but are unrelated to the debugger.
Yes, but IIRC that issue was to separate the writable file from the read-only file. I remember dealing with this when working on Newspeak in 2007/2008. So SourceFiles can easily maintain a writable file and a read-only copy of the file for both sources and changes and do writes through the writable one.
I agree with what you wrote about method changes, but reading the sources concurrently is still a possibility, especially when multiple UI processes can exist at the same time (e.g. that's what opening a debugger does, right?).
My assertion is that the IDE is essentially single-threaded and this doesn't have to be supported. In any case, concurrent access will work if processes of the same priority level are cooperating. But I just answered the debugger issue. I'm assuming that the debugger guards all its source access by substituting a different file. So it, and only it, accesses the source files through copies, and it, and only it, pays the cost for substituting the copies. Normal queries can use a single read-only copy. That gives us the functionality of cacheDuring: without having to invoke it.
So let me reiterate.
SourceFiles is modified to have a single writable version of the changes file and a single read-only version of the sources and changes files. Source code is read through the readable copy and new source written through the writable copy. Whenever the debugger accesses source it does so through a method that first saves the files, substitutes copies in SourceFiles, evaluates its block (which will access source through the copies), and then ensures that the original files are restored. There can be error checking for writing to the changes file in the debugger while writes are in progress to the original writable changes file, although I'm not sure this is necessary; folks debugging source file access usually know what they're doing.
The result is that
- normal source reading does not require creating a read-only copy; it already exists.
- the debugger does not interfere with source access because it is careful to use copies and leave the originals undisturbed.
- CurrentReadOnlySourceFiles and cacheDuring: can be discarded.
Levente
Levente
Eliot
Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote:
Hi Levente,
On Tue, Mar 31, 2020 at 3:33 PM Levente Uzonyi leves@caesar.elte.hu wrote: Hi Eliot,
I don't think that's the only issue. Have a look at the senders of #readOnlyCopy. Many of them were added 10+ years ago, well before CurrentReadOnlySourceFiles was introduced. Most of those could use CurrentReadOnlySourceFiles too but are unrelated to the debugger.
Yes, but IIRC that issue was to separate the writable file from the read-only file. I remember dealing with this when working on Newspeak in 2007/2008. So SourceFiles can easily maintain a writable file and a read-only copy of the file for both sources and changes and do writes through the writable one.
My assertion is that the IDE is essentially single-threaded and this doesn't have to be supported. In any case, concurrent access will work if processes of the same priority level are cooperating. But I just answered the debugger issue. I'm assuming that the debugger guards all its source access by substituting a different file. So it, and only it, accesses the source files through copies, and it, and only it, pays the cost for substituting the copies. Normal queries can use a single read-only copy. That gives us the functionality of cacheDuring: without having to invoke it.
The IDE is single-threaded but source files may be read outside the context of the IDE.
So let me reiterate.
SourceFiles is modified to have a single writable version of the changes file and a single read-only version of the sources and changes files. Source code is read through the readable copy and new source written through the writable copy. Whenever the debugger accesses source it does so through a method that first saves the files, substitutes copies in SourceFiles, evaluates its block (which will access source through the copies), and then ensures that the original files are restored. There can be error checking for writing to the changes file in the debugger while writes are in progress to the original writable changes file, although I'm not sure this is necessary; folks debugging source file access usually know what they're doing.
The result is that
- normal source reading does not require creating a read-only copy; it already exists.
Do you mean that the existence of #readOnlyCopy is satisfactory? Creating a copy for every single file access is painfully slow. CurrentReadOnlySourceFiles only exists to remedy that by reusing the same read-only copy.
- the debugger does not interfere with source access because it is careful to use copies and leave the originals undisturbed
That's exactly what I tried to imply by stating that there was no problem with the debugger before the mass use of read-only copies was introduced.
Levente
CurrentReadOnlySourceFiles and cacheDuring: can be discarded
Levente
-- _,,,^..^,,,_ best, Eliot
Hi Levente,
On Tue, Mar 31, 2020 at 7:38 PM Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote:
Hi Levente,
On Tue, Mar 31, 2020 at 3:33 PM Levente Uzonyi leves@caesar.elte.hu wrote:
The IDE is single-threaded but source files may be read outside the context of the IDE.
Can you give me a for-instance. I simply don't believe you. And even if it's true I don't see that it has to be supported. Please don't be vague. This is important.
- normal source reading does not require creating a read-only copy; it already exists.
Do you mean that the existence of #readOnlyCopy is satisfactory?
Yes.
Creating a copy for every single file access is painfully slow.
Exactly.
CurrentReadOnlySourceFiles only exists to remedy that by reusing the same read-only copy.
I feel like you're not understanding my proposal. Apologies if I'm presuming. With my proposal the only time a new copy is created is when the debugger wants to access the source of a method. That happens on the order of seconds, not microseconds as happens when scanning for source.
- the debugger does not interfere with source access because it is careful to use copies and leave the originals undisturbed
That's exactly what I tried to imply by stating that there was no problem with the debugger before the mass use of read-only copies was introduced.
Can you not see that in any scheme there is the potential for chaos if the debugger is accessing the source as one steps through code of methods that themselves are in the process of accessing source? And so it is key that the debugger *not* perturb the system when it itself accesses source?
I'm confused. We seem to be talking past each other. I feel like you're blocking a reasonable proposal but I don't really understand what your objections are. I apologize. I'm not trying to be confrontational, but I do think my proposal is important and has merit and I feel frustrated by you because I can't quite understand why you're against it. If you can identify a serious flaw I'll happily abandon it. But I need to understand the flaw first.
Levente
CurrentReadOnlySourceFiles and cacheDuring: can be discarded
Levente
-- _,,,^..^,,,_ best, Eliot
Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote:
Hi Levente,
On Tue, Mar 31, 2020 at 7:38 PM Levente Uzonyi leves@caesar.elte.hu wrote: Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote: > Hi Levente, > > On Tue, Mar 31, 2020 at 3:33 PM Levente Uzonyi <leves@caesar.elte.hu> wrote: > Hi Eliot, > > On Mon, 30 Mar 2020, Eliot Miranda wrote: > > > Hi Levente, > > > >> On Mar 30, 2020, at 2:21 PM, Levente Uzonyi <leves@caesar.elte.hu> wrote: > >> > >> Hi Eliot, > >> > >>> On Mon, 30 Mar 2020, Eliot Miranda wrote: > >>> > >>> Well, that's not what I meant by a search. However, as Levente pointed out, textual searches should be surrounded with CurrentReadOlySouceFiles cacheDuring:. I think this is an awful implementation and would > implement it > >>> very differently but that's the work-around we have in place now, > >> > >> How would you implement it? > >> > >> <history> > >> When I introduced CurrentReadOnlySourceFiles, I wanted to solve the issue of concurrent access to the source files. > >> I had the following options: > >> 1) controlled access to a shared resource (a single read-only copy of the SourceFiles array) with e.g. a Monitor > >> 2) same as 1) but with multiple copies pooled > >> 3) exceptions to define the scope and lifetime of the resources (the read-only copies) within a process > >> > >> I chose the third option because it was possible to introduce it without exploring and rewriting existing users: you could leave all code as-is > >> and sprinke CurrentReadOnlySourceFiles cacheDuring: [ ... ] around code that needed better performance. > >> It's obviously not a perfect solution, but I still think it was the best available at the time. > >> > >> Later ProcessLocalVariables were added to Trunk. Which could be used to solve the concurrent access issue by using process-local copies of the source files. The only challenge is to release them after they are > not needed any more. Perhaps a high priority process could do that after a few minutes of inactivity. Or we could just let them linger and see if they cause any problems. 
> >> </history> > > > > I think the key issue (& this from a discussion here with Bert) is access time source in the debugger while one is debugging file access. As the debugger asks for source so the file pointer is moved and hence > screws up the access one is trying to debug. > > I don't think that's the only issue. Have a look at the senders of > #readOnlyCopy. Many of them were added 10+ years ago, well before > CurrentReadOnlySourceFiles was introduced. Most of those could use > CurrentReadOnlySourceFiles too but are unrelated to the debugger. > > > Yes, but IIRC that issue was to separate the writable file from the read-only file. I remember dealing with this when working on Newspeak in 2007/2008. So SourceFiles can easily maintain a writable file and a read-only copy > of the file for both sources and changes and do writes through the writable one. > > > > > > So I would provide something like > > SourceFiles withSubstituteCopiesWhile: aBlock > > which would install either copies of the files or read-only copies of the files for the duration of the block, and have the debugger use the method around its access to method source. > > > > The IDE is essentially single threaded as far as code modification goes, even if this isn’t enforced. There is no serialization on adding/removing methods and concurrent access can corrupt method dictionaries, > and that limitation is fine in practice. So having the same restriction on source file access is fine too (and in fact I think the restriction already applies; if one were to fork compiles then source update to > the changes file could get corrupted too). > > > > So I think not using read-only copies to read source, and having the debugger use copies for its access would be a good lightweight solution. > > I agree with what you wrote about method changes, but reading the sources > concurrently is still a possibility, especially when multiple UI processes > can exist at the same time (e.g. 
that's what opening a debugger does, > right?). > > > My assertion is that the IDE is essentially single-threaded and this doesn't have to be supported. In any case, concurrent access will work if processes of the same priority level are cooperating. But I just answered the > debugger issue. I'm assuming that the debugger guards all its source access by substituting a different file. So it, and only it, accesses the source files through copies, and it, and only it pays the cost for substituting > the copies. Normal queries can use a single read-only copy. That gives us the functionality of cacheDuring: without having to invoke it. The IDE is single-threaded but source files may be read outside the context of the IDE.
Can you give me an example? I simply don't believe you. And even if it's true I don't see that it has to be supported. Please don't be vague. This is important.
For example, Seaside has a web-based code browser. The webserver, no matter which one is used by Seaside, will read the code from a process different than the UI process.
> > > So let me reiterate. > > SourceFiles is modified to have a single writable version of the changes file and a single read-only version of the sources and changes files. Source code is read through the readable copy and new source written through the > writable copy. Whenever the debugger accesses source it does so through a method that first saves the files, substitutes copies in SourceFiles, evaluates its block (that will access source through the copies), and then > ensures that the original files are restored. There can be error checking for writing to the changes file in the debugger while writes are in progress to the original writable changes file, although I'm not sure this is > necessary; folks debugging source file access usually know what they're doing. > > The result is that > - normal source reading does not require creating a read-only copy; it already exists.
Do you mean that the existence of #readOnlyCopy is satisfactory?
Yes. Creating a copy for every single file access is painfully slow.
Exactly. CurrentReadOnlySourceFiles only exists to remedy that by reusing the same read-only copy.
I feel like you're not understanding my proposal. Apologies if I'm presuming. With my proposal the only time a new copy is created is when the debugger wants to access the source of a method. That happens on the order of seconds, not microseconds as happens when scanning for source.
You say that each tool should use the same shared file streams. If so, that implies that all sends of #readOnlyCopy have unnecessarily been added over the years except for those which are related to the debugger. And if I were to remove them along with CurrentReadOnlySourceFiles, everything would stay normal, right?
> - the debugger does not interfere with source access because it is careful to use copies and leave the originals undisturbed

That's exactly what I tried to imply by stating that there was no problem with the debugger before the mass use of read-only copies was introduced.
Can you not see that in any scheme there is the potential for chaos if the debugger is accessing the source as one steps through code of methods that themselves are in the process of accessing source? And so it is key that the debugger *not* perturb the system when it itself accesses source?
I'm confused. We seem to be talking past each other. I feel like you're blocking a reasonable proposal but I don't really understand what your objections are. I apologize. I'm not trying to be confrontational, but I do think my proposal is important and has merit and I feel frustrated by you because I can't quite understand why you're against it. If you can identify a serious flaw I'll happily abandon it. But I need to understand the flaw first.
I think that we have different ideas about what the problem is:
1) You say that the debugger is a key source of problems right now. I say that I'm not aware of issues with the debugger. I know what the potential problem could be, but I don't know whether it exists or not right now because I have not seen any issues with the debugger lately. If you have a reproducible case, please share it here.
2) You say that the source files can safely be treated as only-to-be-used-by-the-IDE. I say that's not the case. IMO, I should be able to fork a process scanning the sources and do something else while it's processing the code. I also think that external tools like Seaside should be able to read the source files without messing up the image.
Did I understand you correctly based on these two points above?
Levente
> - CurrentReadOnlySourceFiles and cacheDuring: can be discarded
-- _,,,^..^,,,_ best, Eliot
Hi Levente,
On Tue, Mar 31, 2020 at 9:25 PM Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote:
Hi Levente,
On Tue, Mar 31, 2020 at 7:38 PM Levente Uzonyi leves@caesar.elte.hu
wrote:
Hi Eliot,

On Tue, 31 Mar 2020, Eliot Miranda wrote:

> [...] Normal queries can use a single read-only copy. That gives us the functionality of cacheDuring: without having to invoke it.
The IDE is single-threaded but source files may be read outside the context of the IDE.
Can you give me an example? I simply don't believe you. And even if it's true I don't see that it has to be supported. Please don't be vague. This is important.
For example, Seaside has a web-based code browser. The webserver, no matter which one is used by Seaside, will read the code from a process different than the UI process.
Yes, but there's still no implication that source access should be thread-safe. The Seaside access to the IDE is still happening in the context of a cooperatively threaded Smalltalk, and there are places in Seaside where access to the IDE could be serialized without relying on support for thread-safe source access when there is no thread-safe access to adding/removing methods. So for the Seaside browser to function properly, synchronization needs to be added to the general interface between Seaside and the IDE, not just source access. For example, if we had a Seaside Squeak IDE server that allowed sharing between multiple programmers, I suggest that the right way to serialize access is to provide some kind of synchronized queue between Seaside and the IDE, not to try and make the IDE thread-safe. Updating things like class definitions, which potentially imply recompiling all methods in a class hierarchy, requires that no other modifications to the class hierarchy are occurring while a class and its subclasses are being redefined.
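The kind of serialization suggested above could be sketched in Squeak roughly like this. This is a hypothetical arrangement, not existing Seaside API; SharedQueue is the stock Squeak class:

```smalltalk
"Serialize all IDE requests coming from Seaside through one worker
process, instead of trying to make the IDE itself thread-safe."
| requests worker |
requests := SharedQueue new.
worker := [[requests next value] repeat]
	forkAt: Processor userBackgroundPriority.

"A web handler enqueues a block instead of touching the IDE directly;
a promise-style handshake would be needed to get a result back."
requests nextPut: [(Object >> #at:) getSource]
```

Because only the single worker process ever touches the browsing and compilation machinery, no locking inside the IDE itself would be required.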
> So let me reiterate.
> > SourceFiles is modified to have a single writable version of the
changes file and a single read-only version of the sources and changes files. Source code is read through the readable copy and new source written
through the > writable copy. Whenever the debugger accesses source it does so
through a method that first saves the files, substitutes copies in SourceFiles, evaluates its block (that will access source through the copies),
and then > ensures that the original files are restored. There can be
error checking for writing to the changes file in the debugger while writes are in progress to the original writable changes file, although I'm not
sure this is > necessary; folks debugging source file access usually know what
they're doing.
> > The result is that > - normal source reading does not require creating a read-only
copy; it already exists.
Do you mean that the existence of #readOnlyCopy is satisfactory?
Yes.
Creating a copy for every single file access is painfully slow.
Exactly.
CurrentReadOnlySourceFiles only exists to remedy that by reusing
the same
read-only copy.
I feel like you're not understanding my proposal. Apologies if I'm
presuming. With my proposal the only time a new copy is created is when the debugger wants to access the source of a method. That happens on the order of
seconds, not microseconds as happens when scanning for source.
You say that each tool should use the same shared file streams. If so, that implies that all sends of #readOnlyCopy have unnecessarily been added over the years except for those which are related to the debugger.
That's my opinion, yes.
And if I were to remove them along with CurrentReadOnlySourceFiles, everything would stay normal, right?
I think so. Provided the debugger accesses source carefully, everything should be OK. It's certainly worth an experiment, right?
> - the debugger does not interfere with source access because it
is careful to use copies and leave the originals undisturbed
That's exactly what I tried to imply by stating that there was no problem with the debugger before the mass use of read-only copies was introduced.
Can you not see that in any scheme there is the potential for chaos if the debugger is accessing the source as one steps through code of methods that themselves are in the process of accessing source? And so it is key that the debugger *not* perturb the system when it itself accesses source?
I'm confused. We seem to be talking past each other. I feel like
you're blocking a reasonable proposal but I don't really understand what your objections are. I apologize. I'm not trying to be confrontational, but I do
think my proposal is important and has merit and I feel frustrated by
you because I can't quite understand why you're against it. If you can identify a serious flaw I'll happily abandon it. But I need to understand the flaw
first.
I think that we have different ideas about what the problem is:
1) You say that the debugger is a key source of problems right now.
I say that I'm not aware of issues with the debugger. I know what the potential problem could be, but I don't know whether it exists or not right now because I have not seen any issues with the debugger lately. If you have a reproducible case, please share it here.
No I'm not saying that the debugger is a source of problems. I'm saying that if one wants to debug source access then w.r.t. source access the debugger must not interfere. Since the debugger necessarily accesses source as it is used, it must not perturb source access while it is being used to debug source access.
Let me be clear. Let's say we want to step through
(Object >> #at:) timeStamp
as we stepped through the debugger would access several methods in CompiledMethod and FileStream. If we only had one read-only copy for the source file then the potential exists for the source pointer of the read-only copy to be changed after it has been set to point at the chunk for Object>>#at:, and hence get the wrong answer, maybe answering the timeStamp for CompiledMethod>>preamble or some such. So every time the debugger accesses source it must be careful not to disturb the single read-only copy of the source file. It can easily do this by using a private copy of the source file.
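The substitution described here might look something like the following sketch, in the spirit of the withSubstituteCopiesWhile: method proposed earlier in the thread. The method body and the `files` instance variable name are assumptions, not Trunk code:

```smalltalk
"Sketch: let the debugger access source through private read-only
copies, leaving the shared source file streams and their file
positions undisturbed."
SourceFileArray >> withSubstituteCopiesWhile: aBlock
	| originals |
	originals := files.
	files := originals collect: [:file |
		file ifNotNil: [file readOnlyCopy]].
	^aBlock ensure: [
		files do: [:file | file ifNotNil: [file close]].
		files := originals]
```

The debugger would wrap every source fetch in this method, so stepping through code that itself reads source never moves the file pointer that the code being debugged relies on.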
2) You say that the source files can safely be treated as
only-to-be-used-by-the-IDE. I say that's not the case. IMO, I should be able to fork a process scanning the sources and do something else while it's processing the code. I also think that external tools like Seaside should be able to read the source files without messing up the image.
Have you ever done this in practice?
Did I understand you correctly based on these two points above?
First, no. Second, yes. Is my clarification adequate?
Levente
> - CurrentReadOnlySourceFiles and cacheDuring: can be discarded
-- _,,,^..^,,,_ best, Eliot
Hi Eliot,
On Wed, 1 Apr 2020, Eliot Miranda wrote:
Hi Levente,
On Tue, Mar 31, 2020 at 9:25 PM Levente Uzonyi leves@caesar.elte.hu wrote: Hi Eliot,
On Tue, 31 Mar 2020, Eliot Miranda wrote:

> [...]
>
> > The IDE is single-threaded but source files may be read outside the context of the IDE.
>
> Can you give me an example? I simply don't believe you. And even if it's true I don't see that it has to be supported. Please don't be vague. This is important.

For example, Seaside has a web-based code browser. The webserver, no matter which one is used by Seaside, will read the code from a process different than the UI process.
Yes, but there's still no implication that source access should be thread-safe. [...] So for the Seaside browser to function properly, synchronization needs to be added to the general interface between Seaside and the IDE, not just source access.
Even though it's extremely unlikely, browsing code (not modifying, just viewing) with the Seaside code browser has the potential to mess up the source files in the image if source access is not thread-safe.
> [...]
>
> You say that each tool should use the same shared file streams. If so, that implies that all sends of #readOnlyCopy have unnecessarily been added over the years except for those which are related to the debugger.
That's my opinion, yes. And if I were to remove them along with CurrentReadOnlySourceFiles, everything would stay normal, right?
I think so. Provided the debugger accesses source carefully, everything should be OK. It's certainly worth an experiment, right?
I gave it a try and removed all references to CurrentReadOnlySourceFiles with a global copy of SourceFiles, ReadOnlySourceFiles, in my image. Nothing has broken so far, but the system is extremely fragile:
- saving the image with a different name will leave the read-only copies intact
- starting up the image will use old file references
- even if the debugger worked properly, opening an image with debuggers left open on code related to source file handling will cause problems
Of course, those problems can be tackled, but the conclusion is that it's probably not a good idea to use a separate global for the read-only source files. The best seems to be to hold them in the SourceFiles object (a SourceFileArray).
I did a quick search in the mail archives, and found this post of yours on the Pharo list: http://forum.world.st/From-a-mooc-user-method-source-with-it-take-so-long-in...
You suggested introducing a new method:
SourceFiles substituteFreshReadOnlyCopiesDuring: [...file access...]
If the file access is restricted to the block, and the block doesn't access global state related to the source files, then it's possible to make the read-only file access thread-safe with no additional cost:
- hold the read-only copies in a new instance variable of SourceFileArray (e.g. readOnlyFiles)
- only pass the read-only copies to the block; don't expose them globally
- before the block is evaluated, save readOnlyFiles in a temporary, nil out the readOnlyFiles variable, and pass the temporary to the block
- after the block has been evaluated, check whether readOnlyFiles is nil. If it's not nil, close and throw away the files passed to the block. If it's nil, check whether they still point to the same files as the actual source files; if yes, save them in the readOnlyFiles variable, else close and throw them away.
That way there's a globally shared read-only copy of the source files that is guaranteed to be used by only one process at a time. If two or more processes need the read-only copies at the same time, new read-only copies will be created, ensuring thread-safety.
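A sketch of that scheme follows. The method body and the `readOnlyFiles` and `files` variable names are assumptions, not Trunk code:

```smalltalk
"Hand the pooled read-only copies to one process at a time; a second
concurrent caller finds readOnlyFiles nil and gets fresh copies."
SourceFileArray >> substituteFreshReadOnlyCopiesDuring: aBlock
	| copies |
	copies := readOnlyFiles
		ifNil: [files collect: [:file | file ifNotNil: [file readOnlyCopy]]]
		ifNotNil: [:pooled | readOnlyFiles := nil. pooled].
	^[aBlock value: copies] ensure: [
		readOnlyFiles
			ifNil: [readOnlyFiles := copies] "return the copies to the pool"
			ifNotNil: [copies do: [:file | file ifNotNil: [file close]]]]
```

(The check that the pooled copies still point at the current source files, e.g. after the image has been saved under a new name, is omitted here for brevity.)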
> [...]
>
> 1) You say that the debugger is a key source of problems right now. I say that I'm not aware of issues with the debugger. I know what the potential problem could be, but I don't know whether it exists or not right now because I have not seen any issues with the debugger lately. If you have a reproducible case, please share it here.
No I'm not saying that the debugger is a source of problems. I'm saying that if one wants to debug source access then w.r.t. source access the debugger must not interfere. Since the debugger necessarily accesses source as it is used, it must not perturb source access while it is being used to debug source access.
Let me be clear. Let's say we want to step through
(Object >> #at:) timeStamp
as we stepped through the debugger would access several methods in CompiledMethod and FileStream. If we only had one read-only copy for the source file then the potential exists for the source pointer of the read-only copy to be changed after it has been set to point at the chunk for Object>>#at:, and hence get the wrong answer, maybe answering the timeStamp for CompiledMethod>>preamble or some such. So every time the debugger accesses source it must be careful not to disturb the single read-only copy of the source file. It can easily do this by using a private copy of the source file.
2) You say that the source files can safely be treated as only-to-be-used-by-the-IDE. I say that's not the case. IMO, I should be able to fork a process scanning the sources and do something else while it's processing the code. I also think that external tools like Seaside should be able to read the source files without messing up the image.
Have you ever done this in practice?
Probably yes, but I can't remember the exact situation. I remembered that SmallLint or one of its descendants does the linting in the background. I checked swalint and found that it uses the UI process. So it seems that only the Pharo version uses background processes.
Levente
Did I understand you correctly based on these two points above?
First, no. Second, yes. Is my clarification adequate? Levente
-- _,,,^..^,,,_ best, Eliot
squeak-dev@lists.squeakfoundation.org