Hi,
I have this PDF file (attached) which always produces this error when I try to upload it in Seaside.
It seems to happen whenever the filename contains non-ASCII characters.
All in all, it looks like GRPharoUtf8Codec is wrong when decoding filenames with non-ASCII characters.
It happens with Seaside 3.1, Pharo 4, and the newest Grease-Pharo30-Core package.
When using the Seaside Functional Test Suite / Upload test, the same problem occurs.
What did I miss?
Thanks
Hilaire
A stack trace would be helpful so that we know where the error happens.
Also some other information:
- is the web site utf-8?
- what adapter do you use?
- what's the encoding on the adapter?
- what operating system and browser do you use?
- I assume the strange language in the file name is French.
Super helpful would be a tcpdump of the HTTP session, but we can do without.
Cheers
Philippe
It works with pure Zn in Pharo:
Will test later with Seaside
_______________________________________________
seaside mailing list
seaside@lists.squeakfoundation.org
http://lists.squeakfoundation.org/cgi-bin/mailman/listinfo/seaside
Hi,
On 21/06/2016 19:01, Philippe Marschall wrote:
> A stack trace would be helpful so that we know where error happens.
Sadly there is no trace, as the debugger does not pop up. Only this error is raised, from GRPharoUtf8Codec>>decode: with a call to the invalidUtf8 method:
GRPharoUtf8Codec>>invalidUtf8
    GRInvalidUtf8Error signal: 'Invalid UTF-8 input'
Strangely, no debugger pops up; only the error message is printed in the browser, although in other Seaside situations I do get one. So I inserted a halt to get the attached trace.
> Also some other information:
> - is the web site utf-8?
Yes
> - what adapter do you use?
Type: ZnZincServerAdaptor
Port: 8080
Encoding: utf-8
zinc on port 8080 [running]
> - what's the encoding on the adapter?
> - what operating system and browser do you use?
Linux for the server. I experienced the issue with Chrome under Windows and Linux, as well as with Firefox on Linux.
> - I assume the strange language in the file name is french.
Indeed :)
Thanks
> Super helpful would be a tcpdump of the http session but we can do without.
I have to learn to do that.
Hilaire,
As for why no debugger comes up: did you set the development error handler in the Seaside configuration?
app filter configuration at: #'exceptionHandler' put: WAWalkbackErrorHandler
You may have another error handler defined.
Cheers,
On Tue, Jun 21, 2016 at 7:31 PM, Hilaire hilaire@drgeo.eu wrote:
> Sadly there is no trace as the debugger does not popup. Only this error raised, from GRPharoUtf8Codec>>decode: with a call to the invalidUtf8 method:
> GRPharoUtf8Codec>>invalidUtf8 GRInvalidUtf8Error signal: 'Invalid UTF-8 input'
OK, then it's likely in the server adapter, before Seaside actually kicks in. Can you set a breakpoint in GRPharoUtf8Codec >> #invalidUtf8?
My suspect would be ZnZincServerAdaptor >> #convertMultipart:
If you can send us the string it's trying to convert, that would be helpful.
Cheers
Philippe
On 22/06/2016 09:40, Philippe Marschall wrote:
> If you can send us the string it's trying to convert that would be helpful.
The string argument of GRPharoUtf8Codec>>decode: is
aString -> 'Identités certifiées.pdf'
printed like this in the debugger. Since we know Pharo does not use UTF-8 internally, it is suspicious to see a UTF-8 string printed correctly in Pharo, right?
Does it look like Latin-1?
aString asByteArray do: [:each | Transcript show: each hex; space]
16r49 16r64 16r65 16r6E 16r74 16r69 16r74 16rE9 16r73 16r20 16r63 16r65 16r72 16r74 16r69 16r66 16r69 16rE9 16r65 16r73 16r2E 16r70 16r64 16r66
So indeed, GRPharoUtf8Codec>>decode: received a string that had already been decoded from UTF-8 (its bytes read as Latin-1), and then it obviously fails.
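The byte dump above is the tell-tale: in UTF-8 each é would be the two bytes 16rC3 16rA9, not the single byte 16rE9. Here is a minimal playground sketch of that comparison. It assumes the String>>utf8Encoded and ByteArray>>utf8Decoded extensions that ship with Zinc are loaded; the temporaries are illustrative names only.

```smalltalk
| name wireBytes |
"The file name as it shows up in the debugger: already a Pharo string."
name := 'Identités certifiées.pdf'.

"In the decoded string every é is one code point, 16rE9."
name asByteArray occurrencesOf: 16rE9.

"On the wire (UTF-8) every é takes two bytes, 16rC3 16rA9, so the
 encoded form is longer than the string by one byte per é."
wireBytes := name utf8Encoded.
wireBytes size - name size.

"Decoding the wire bytes exactly once gives the readable name back."
wireBytes utf8Decoded = name
```

If the string handed to decode: still held raw UTF-8 bytes, it would show 16rC3 16rA9 pairs and would not print readably in the debugger.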
Now, looking back up the stack as you suggested, the decoding already took place at:
ZnMimePart>>fileName
    "Pathnames are often silently encoded using UTF-8, this is a no-op for ASCII, but will fail on Latin-1 and others"
    ^ (self detectContentDispositionValue: 'filename')
        ifNotNil: [ :encodedFileName | encodedFileName asByteArray utf8Decoded ]
The timestamp of this method is 10/10/2014, from Sven.
The second place where decoding takes place is (Zinc-Seaside package):
ZnZincServerAdaptor>>convertMultipartFileField: part
    | file |
    (file := WAFile new)
        fileName: (self codec decode: part fileName);
        contentType: part contentType printString;
        contents: part contents asByteArray.
    ^ file
The timestamp is 11/14/2014, from Johan, where the decode was added.
These two methods use two different decoding methods (duplication?), one from the Grease package, the other from the Zn package.
My opinion is that the Zinc-Seaside package should not try to decode, or preferably should use the Zn decode method (utf8Decoded), but that would raise an error on an already decoded string.
Hilaire
My suggestion is to remove the decode: send; this change looks safe, as part fileName already decodes.
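A sketch of what the method could look like with the extra decode removed, assuming nothing else in convertMultipartFileField: needs to change (a published fix may differ in detail):

```smalltalk
ZnZincServerAdaptor>>convertMultipartFileField: part
	"Sketch only: part fileName already returns a decoded Pharo string
	 (ZnMimePart>>fileName applies utf8Decoded), so it is passed through
	 as-is instead of being run through the adaptor's codec again."
	| file |
	(file := WAFile new)
		fileName: part fileName;
		contentType: part contentType printString;
		contents: part contents asByteArray.
	^ file
```

The only difference from the current method is that fileName: receives part fileName directly.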
Hilaire
My suggestion is to remove the decode: in this method. I am not sure about all the implications.
Hilaire
On 22 Jun 2016, at 14:01, Hilaire hilaire@drgeo.eu wrote:
> My suggestion is to remove the decode: in this method.
I think that too.
> I am not sure about the whole implication.
Me neither ;-)
In any case, here is an older discussion about the same topic:
http://forum.world.st/File-upload-encoding-issue-tt4783446.html
Sven
Indeed, but looking at the timestamps of the two methods I mentioned (both after this October 2014 discussion), it looks like you guys concurrently fixed the problem in two different places, with the two fixes breaking each other :) Hopefully it is easy to fix.
But I am surprised this problem did not show up until now. Or is it fixed in newer Zn-* packages?
Hilaire
I don't remember every detail of the discussions, but by the look of it, your solution seems correct: Zn did already decode it, the Zn adaptor should not do it again.
If someone grants me write access, I can commit the change. Of course I would like to know Johan's opinion; I don't want to break anything.
Hilaire
Hi Hilaire,
I am curious which package version of Zinc-Seaside you are using.
I vaguely remember something about fixing an issue in that area but when I look at Zinc-Seaside-JohanBrichau.43 (which is the latest version) the send of #decode: is not there.
cheers Johan
Hi Johan,
When I browse the Zinc-Seaside-JohanBrichau.43 package directly from Monticello, I can see the #decode: message send.
Is it possible you fixed it locally without a commit?
The repo I have been using is http://mc.stfx.eu/ZincHTTPComponents
Hilaire
Are there multiple package versions out there with the same version number? Can you check if the UUID is the same as the one below?
Name: Zinc-Seaside-JohanBrichau.43
Author: JohanBrichau
Time: 26 December 2014, 4:00:06.580211 pm
UUID: 62fd0b62-9e07-498e-aac8-3432614b39fc
Ancestors: Zinc-Seaside-pmm.42
Here is a screenshot of what I see when I look at Zinc-Seaside-JohanBrichau.43 in http://mc.stfx.eu/ZincHTTPComponents
I am looking at the correct method, right?
Hilaire,
I threw away my package cache and noticed that I had a different version of the same package…
I need to figure out what happened. The commit comment refers to https://code.google.com/p/seaside/issues/detail?id=836
Looking into it and trying to refresh my memory :/
Johan
When downloading
http://mc.stfx.eu/ZincHTTPComponents/Zinc-Seaside-JohanBrichau.43.mcz
and viewing the source code in a text editor, the #decode: message is present in the convertMultipartFileField: method.
Hilaire
Hilaire,
Your observation that Sven and I both fixed it in different places is correct. I did revert the fix on December 26th, but I somehow created the package below with the same version number and it never got uploaded correctly (for reasons I cannot really explain).
Name: Zinc-Seaside-JohanBrichau.43
Author: JohanBrichau
Time: 26 December 2014, 4:00:06.580211 pm
UUID: 62fd0b62-9e07-498e-aac8-3432614b39fc
Ancestors: Zinc-Seaside-pmm.42
I already created a new package .44 and will email it to Sven but...
The “funny” part right now is that the upload file test does not break in my image when using the published code for Zinc-Seaside! So… I am a bit baffled and would first like to understand what is going on.
Johan
On 25 Jun 2016, at 10:00, Johan Brichau johan@inceptive.be wrote:
> The “funny” part right now is that the upload file test does not break in my image when using the published code for Zinc-Seaside! So… I am a bit baffled and would like to understand first what is going on.
I got it... it crashes in Safari, not with Chrome. Obviously, the ‘old fix’ was wrong and I undid it in December 2014, but the package was never correctly committed (it existed only in my package cache… which I apparently also never cleared in my Seaside work folder…)
I sent my package .44 to Sven for publication. Just remove the decode: message send in the adaptor, as you mentioned.
Thanks for reporting and sorry about the mess
Johan
Very strange and annoying. I copied Johan's .44 version into the official Zn repositories.
It is great to see free software in action.
Thanks
Hilaire
On 25/06/2016 09:16, Johan Brichau wrote:
> Are there multiple package versions out there with the same version number? Can you check if the UUID is the same as the one below?
The description of the package I have looks strange (see the Time):
Name: Zinc-Seaside-JohanBrichau.43
Author: JohanBrichau
Time: 22 Juin 2016, 8:25:10.061265 pm
UUID: d2edc44e-eebb-454f-b309-f6d49d639770
Ancestors: Zinc-Seaside-pmm.42
When I flush the Monticello cache and refresh, it looks like this (the Time is shifted to now):
Name: Zinc-Seaside-JohanBrichau.43
Author: JohanBrichau
Time: 25 Juin 2016, 8:25:10.061265 pm
UUID: d2edc44e-eebb-454f-b309-f6d49d639770
Ancestors: Zinc-Seaside-pmm.42
> Here is a screenshot when I look at Zinc-Seaside-JohanBrichau.43 in http://mc.stfx.eu/ZincHTTPComponents I am looking at the correct method, right?
Yes.
On Wed, Jun 22, 2016 at 1:33 PM, Hilaire hilaire@drgeo.eu wrote:
> The string argument of GRPharoUtf8Codec>>decode: is
> aString ->'Identités certifiées.pdf'
> printed as this in the Debugger. As we know Pharo does not use UTF8 internally it is suspect to see an utf8 string correctly printed in Pharo, right?
You are seeing a UTF-8 string that has already been decoded to Pharo/Unicode; therefore it displays correctly. Then Seaside/the adaptor tries to decode it a second time, which fails.
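To make the double decode concrete, here is a minimal sketch for a Pharo playground. It assumes Zinc's utf8Encoded/utf8Decoded extensions and Grease's GRCodec class>>forEncoding: are available in the image, and it relies on the behaviour reported in this thread (GRInvalidUtf8Error being signaled); the temporaries are made up for illustration.

```smalltalk
| wireBytes decodedOnce codec result |
"What the browser sends: the file name as UTF-8 bytes."
wireBytes := 'Identités certifiées.pdf' utf8Encoded.

"First decode (what ZnMimePart>>fileName does): bytes -> Pharo string."
decodedOnce := wireBytes utf8Decoded.

"Second decode (what the adaptor's codec did on top of that). The é is
 now the lone code point 16rE9, which is not valid UTF-8, so per the
 error reported in this thread the codec signals GRInvalidUtf8Error."
codec := GRCodec forEncoding: 'utf-8'.
result := [ codec decode: decodedOnce ]
	on: GRInvalidUtf8Error
	do: [ :error | error messageText ].
result
```

A pure-ASCII file name would pass through both decodes unchanged, which would explain why the problem only shows up with accented names.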
> Does it looks like a Latin1 ?:
> aString asByteArray do: [:each| Transcript show: each hex ; space]
> 16r49 16r64 16r65 16r6E 16r74 16r69 16r74 16rE9 16r73 16r20 16r63 16r65 16r72 16r74 16r69 16r66 16r69 16rE9 16r65 16r73 16r2E 16r70 16r64 16r66
> So indeed, GRPharoUtf8Codec>>decode: already received a decoded utf8 string to latin1, then obviously fail.
Correct.
> My opinion is the Zinc-Seaside package should not try to decode, or preferably use the ZN decode method (utf8Decoded), but it will bring an error on already decoded string.
The output of Zinc-Seaside must be strings already decoded from UTF-8 into Pharo's encoding. How that is achieved is up to the Zinc-Seaside package. Just to be sure, you are working with an up-to-date Zinc version?
Cheers
Philippe
I think we sorted out the problem. See the other discussion in the thread.
Thanks
Hilaire