I have some large json files that get parsed; we're talking 15Mb or so. They *should* result in large arrays of arrays of dictionaries of... you get the picture.
Almost always, they do. Again and again, parse fileA and it works perfectly. Except just every now and then - maybe 3 times out of several hundred, on disparate dates, in different working images - the result is 58. Not 58 items in the root array, just 58, the SmallInteger.
I can't see a code path that can result in that, which certainly makes it more interesting than usual. To make it a bit more intriguing, the subsequent code raises a DNU when this happens and so I get a debugger to look at. I can see the 58 that has been returned and put into the temp. I can retry the "open file, read as json" and it works perfectly. Fortunately that means I can proceed from there, but the mystery remains. Whilst typing this my Pi has happily read & parsed that file 100 times without error.
I *think* this has only actually happened on my Big! Scary! Xeon! server and never on the Pi, Mac, or little x64 xubuntu server. So maybe it's some strange bug courtesy of those nice people that brought us the Pentium Divide Error etc.
So this is a shout into the void in hope of hearing an echo back that tells me somebody else has seen this and trapped an actual error. Or whatever. Oh, I should probably point out I'm using the JSON-ul.56 from http://www.squeaksource.com/PostgresV3 in a 5.3 image.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim M$ are grinning pigs in a sea of buggy code - The Inquirer
On Wed, Apr 19, 2023 at 01:56:24PM -0700, tim Rowledge wrote:
I have some large json files that get parsed; we're talking 15Mb or so. They *should* result in large arrays of arrays of dictionaries of... you get the picture.
[snip]
I am not doing anything with json files in Squeak, so I have no experience to offer. However, just thinking about what might be different with the Xeon server, I would be inclined to look for something related to reading from the file system, as opposed to something directly in Squeak itself. If there is any sort of concurrent access to the files, especially if there is any possibility of a file being accessed or modified during the parsing process, or if files are written by one process but not fully flushed to the file system, you might well see some odd symptoms.
Dave
On 2023-04-19, at 4:04 PM, David T. Lewis lewis@mail.msen.com wrote:
On Wed, Apr 19, 2023 at 01:56:24PM -0700, tim Rowledge wrote:
I have some large json files that get parsed; we're talking 15Mb or so. They *should* result in large arrays of arrays of dictionaries of... you get the picture.
[snip]
I am not doing anything with json files in Squeak, so I have no experience to offer. However, just thinking about what might be different with the Xeon server, I would be inclined to look for something related to reading from the file system, as opposed to something directly in Squeak itself. If there is any sort of concurrent access to the files, especially if there is any possibility of a file being accessed or modified during the parsing process, or if files are written by one process but not fully flushed to the file system, you might well see some odd symptoms.
All good thoughts. I think it's not totally impossible the file gets read simultaneously, but it does seem extremely improbable. It would involve two systems working on the same pile of data at once, and since I'm the one running things I feel fairly sure I haven't done that.
I have been doing the parsing by reading the entire file contents and parsing that giant String; I've also tried parsing the filestream. I can't force it to happen at all. Grrr. The fact that it raises no errors is confusing too.
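For the record, the two variants look more or less like this (a sketch, not the exact code; 'fileA.json' is a stand-in name, and Json readFrom: is the parser entry point as I remember it in this JSON package -- other versions may spell it differently):

```smalltalk
"Variant 1: slurp the whole file into one giant String, then parse that."
| contents result |
contents := FileStream fileNamed: 'fileA.json' do: [:stream | stream upToEnd].
result := Json readFrom: contents readStream.

"Variant 2: hand the file stream straight to the parser."
result := FileStream fileNamed: 'fileA.json' do: [:stream | Json readFrom: stream].
```

Both go wrong in the same once-in-hundreds way, which is part of what makes it so hard to pin down.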
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Useful random insult:- An 8086 in a StrongARM environment.
Hi Tim,
On Wed, Apr 19, 2023 at 1:57 PM tim Rowledge tim@rowledge.org wrote:
I have some large json files that get parsed; we're talking 15Mb or so. They *should* result in large arrays of arrays of dictionaries of... you get the picture.
Almost always, they do. Again and again, parse fileA and it works perfectly. Except just every now and then - maybe 3 times out of several hundred, on disparate dates, in different working images - the result is 58. Not 58 items in the root array, just 58, the SmallInteger.
I can't see a code path that can result in that, which certainly makes it more interesting than usual. To make it a bit more intriguing, the subsequent code raises a DNU when this happens and so I get a debugger to look at. I can see the 58 that has been returned and put into the temp. I can retry the "open file, read as json" and it works perfectly. Fortunately that means I can proceed from there, but the mystery remains. Whilst typing this my Pi has happily read & parsed that file 100 times without error.
I *think* this has only actually happened on my Big! Scary! Xeon! server and never on the Pi, Mac, or little x64 xubuntu server. So maybe it's some strange bug courtesy of those nice people that brought us the Pentium Divide Error etc.
Looks like a VM bug with MNU. As always, fixing this depends on having a reproducible case.
[snip]
On 2023-04-20, at 10:03 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Looks like a VM bug with MNU. As always, fixing this depends on having a reproducible case.
I suspect it's not a DNU per se, but there has to be some tiny chance of a weird edge case where, with all the Seaside coroutining, several other processes, and large amounts of data, the TOS gets onto the wrong stack chunk if it is a day ending in $y and the weather is rainy in Seattle, and somebody has tried whistling dixie upside down in treacle. Or something.
That's the problem with having spent time in the VM codemines; you know so many places where Grues live...
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim If you never try anything new, you'll miss out on many of life's great disappointments
Small update to record a very slightly different event, this time running on my local x64 xubuntu server.
Processing one of the large json files again, the system halted because the first two chars of the String read from the file were incorrect. The file actually starts with '{"scanRunOn' but in the debugger it was showing 'syscanRunOn'. As before, redoing the file read produced the correct result.
It's almost as if the file reading is getting usurped somewhere.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Two wrongs are only the beginning.
Hi Tim,
just try and get the thing to crash/error from a script/command-line args with no user interaction. Once you have that LMK and I can take a look.
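In the meantime, a dumb unattended check can at least rule the filesystem in or out: hash the file repeatedly and see whether the bytes on disk are ever different between reads. A sketch (the sample path and file are made up so the loop is runnable as-is; point it at one of the real files):

```shell
# Repeatedly read the file and compare a checksum against the first read,
# to separate "file changed on disk" from "image read it wrong".
FILE="${1:-/tmp/sample.json}"

# Create a stand-in file if none was given, just so the loop is runnable.
[ -f "$FILE" ] || printf '{"scanRunOn": "2023-04-19"}' > "$FILE"

REF=$(md5sum "$FILE" | cut -d' ' -f1)
i=0
while [ $i -lt 100 ]; do
  SUM=$(md5sum "$FILE" | cut -d' ' -f1)
  if [ "$SUM" != "$REF" ]; then
    echo "mismatch on iteration $i"
    exit 1
  fi
  i=$((i+1))
done
echo "100 reads, no mismatches"
```

If the hash is ever unstable, the problem is below the image; if it never is, the corruption is happening somewhere between the read primitive and the String in the image.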
On Tue, Apr 25, 2023 at 10:04 AM tim Rowledge tim@rowledge.org wrote:
Small update to record a very slightly different event, this time running on my local x64 xubuntu server.
[snip]
On 2023-04-25, at 10:12 AM, Eliot Miranda eliot.miranda@gmail.com wrote:
Hi Tim,
just try and get the thing to crash/error from a script/command-line args with no user interaction. Once you have that LMK and I can take a look.
I wish I could come up with something repeatable! Remember, this has hit maybe 5 times out of at least a couple of thousand runs; not great odds. I'll keep trying to come up with something but nobody should hold their breath; blue is not the best colour for a human face, barring some questionable makeup choices.
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim "My name is Inigo Montoya. You killed my parent process. Prepare to vi!"
On Tue, Apr 25, 2023 at 10:04:06AM -0700, tim Rowledge wrote:
Small update to record a very slightly different event, this time running on my local x64 xubuntu server.
Processing one of the large json files again, the system halted because the first two chars of the String read from the file were incorrect. The file actually starts with '{"scanRunOn' but in the debugger it was showing 'syscanRunOn'. As before, redoing the file read produced the correct result.
It's almost as if the file reading is getting usurped somewhere.
This smells like a concurrency issue related to file streams.
On a Unix platform, you might have two or more FileStream instances in the image that refer to the same underlying file. In between the image and the file system, you have stdio stream buffering in the VM that might be separately mapped to those FileStreams in the image. Underneath the stdio buffering in the VM, you have an actual file system with its own buffering. Somewhere underneath that stack of optimizations and buffers and abstractions, there is probably some sort of rotating media or solid state emulation of rotating media, probably with its own buffering.
That entire mess of buffers and abstractions is likely to behave differently on your big server versus your Raspberry Pi, which might account for the heisenbugginess of the problem. But at the end of the day, the likely source of the problem will be something that is happening in the Squeak image. Which is potentially a good thing, since that happens to be the one thing that is actually under your direct control.
Bottom line, I don't think this is anything related to the VM or the server; it's more likely to be something simply related to reading and writing the file streams associated with the JSON file data.
The 'sy' at the beginning of 'syscanRunOn' might be a clue. It must have come from somewhere, and it looks like the beginning of a string like "system" or some such.
Dave
On 2023-04-25, at 4:17 PM, David T. Lewis lewis@mail.msen.com wrote:
On Tue, Apr 25, 2023 at 10:04:06AM -0700, tim Rowledge wrote:
[snip]
This smells like a concurrency issue related to file streams.
[snip]
That's what I started thinking and then I realised that the files involved are static and never written to; at least, not by intent. I'm going to set the permissions to readonly and see if anything gets triggered!
Even the reading I'm doing is as near atomic as we get for file operations; open the file, read the entire contents, parse into json. Mind you, for a MultiByteFileStream we do seem to do a lot of work to handle potential non-ASCII UTF-8 chars. Maybe that opens the door to some problems? Except we're still looking at something that has happened maybe 5 times out of a thousand swings through the same file(s).
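One experiment worth trying: take the MultiByteFileStream converter out of the picture by reading raw bytes and doing the conversion in the image afterwards. A sketch -- I'm assuming stock 5.3 selectors here, and the file name is a stand-in:

```smalltalk
"Read the raw bytes, bypassing the stream's UTF-8 decoding, then convert
 the ByteArray to a String in the image."
| bytes contents |
bytes := FileStream fileNamed: 'fileA.json' do: [:stream | stream binary; upToEnd].
contents := bytes asString utf8ToSqueak.
"then parse 'contents' as before; if the flakiness disappears,
 the stream's converter becomes the prime suspect"
```

At 5-in-a-thousand odds it would take a long soak test to prove anything, but at least it splits the suspects in two.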
tim -- tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim Any Sufficiently Advanced Incompetence Is Indistinguishable From Malice
On 4/28/23 02:20, tim Rowledge wrote:
Even the reading I'm doing is as near atomic as we get for file operations; open the file, read the entire contents, parse into json. Mind you, for a MultiByteFileStream we do seem to do a lot of work to handle potential non-ASCII UTF-8 chars. Maybe that opens the door to some problems? Except we're still looking at something that has happened maybe 5 times out of a thousand swings through the same file(s).
... bad RAM??
Tony
This smells like a concurrency issue related to file streams.
[snip]
That's what I started thinking and then I realised that the files involved are static and never written to; at least, not by intent. I'm going to set the permissions to readonly and see if anything gets triggered!
You might double check you're opening them readonly in the image, as well; e.g., via #readOnlyFileNamed:do: instead of #fileNamed:do:.
Although -- I think a concurrency issue could exist without the files being written to. Are you forking this reading+parsing activity at all? If so, maybe two Processes decided to pick the same file..?
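For clarity, the one-line difference I mean (file name hypothetical):

```smalltalk
"Read-write open (the default) vs. an explicitly read-only open."
FileStream fileNamed: 'fileA.json' do: [:s | s upToEnd].
FileStream readOnlyFileNamed: 'fileA.json' do: [:s | s upToEnd].
```

The read-only variant can't leave a writable handle around, which removes one whole class of "who touched my file" explanations.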
Please let us know if you find the issue!
squeak-dev@lists.squeakfoundation.org