Item12705: Verbatim blocks are corrupted on save by Wysiwyg editor if $Foswiki::cfg{Site}{Locale} = 'utf-8'; (hotfix available)
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
I just upgraded foswiki from 1.1.6 to 1.1.9. Then I noticed that whenever I update a page with "verbatim" quotes in it, it gets destroyed upon saving the page. For example:
#
# /etc/fstab
# Created by anaconda on Wed Jul 3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
After saving that document, it becomes:
#
#Â /etc/fstab
# Created by anaconda on Wed Jul  3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
The workaround so far has been to use the "pre" tag instead of "verbatim." However this is quite annoying and I'd like to see if this can be resolved.
--
SatoshiYagi - 23 Dec 2013
Is your system running with locale's enabled?
In your
LocalSite.cfg file, what are the following settings?
$Foswiki::cfg{UseLocale} =
$Foswiki::cfg{Site}{Locale} =
$Foswiki::cfg{Site}{CharSet} =
--
GeorgeClark - 23 Dec 2013
Thanks for your quick response. Here is the information. I guess i wasn't running with locale's enabled.
$Foswiki::cfg{UseLocale} = 0;
$Foswiki::cfg{Site}{Locale} = 'en_US.ISO-8859-1';
$Foswiki::cfg{Site}{CharSet} = 'utf-8';
--
SatoshiYagi - 24 Dec 2013
Okay, that's actually good. Locale's are a bit experimental. I'd be more apt to expect issues with it enabled?
Are you able to see the issue on either foswiki.org, or trunk.foswiki.org using the Sandbox web. I'm not able to recreate the issue here and I use verbatim blocks frequently. So something is a bit different on your system.
Also, does the corruption occur after the initial save... ie
- Edit new document
- Paste in data
- Save, and document is corrupted
Or can if you edit an existing page, For example the
InstallationGuide contains verbatim blocks. If you edit and save that page does it become corrupted.
v
--
GeorgeClark - 24 Dec 2013
I can't seem to reproduce the issue on foswiki.org's sandbox.
This issue doesn't happen when I create a new document with verbatim blocks in it.
It happens when I modify an existing page with verbatim blocks in it via the WYSIWYG editor and save the document. The raw editor doesn't corrupt the document, either.
What's funny is that the verbatim block corruption gets worse as I save over and over. Here is how the original sample block looks like after 4 saves:
#
# /etc/fstab
# Created by anaconda on Wed Jul  3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
--
SatoshiYagi - 24 Dec 2013
Looks like I was on 1.1.8 before, not 1.1.6. Let me try installing 1.1.8 on the same system and see if the problem persists there.
--
SatoshiYagi - 24 Dec 2013
Hmm, 1.1.8 is working just fine. Did anything change wrt the handling of verbatim blocks between 1.1.8 and 1.1.9?
--
SatoshiYagi - 24 Dec 2013
The
TinyMCE editor uses some special characters to flag things like non-breaking spaces, etc. But I don't see 130 or 131 being used. The "verbatim" block is a foswiki pseudo-html block. The
WysiwygPlugin converts it to a
<pre class="TMLVerbatim">
block.
If you take a simple page with a verbatim block, click edit to start
TinyMCE, and then click the "HTML" button in the tinymce toolbar, you will see how Foswiki converted the TML into HTML for
TInyMCE. It might be helpful to take an uncorrupted topic, edit it and view the html being edited using the HTML button. To see if the corruption happens on the way into the editor.
If you then cancel the HTML dialog and click the
WikiText button to it's right, it will perform the conversion back to TML.
--
GeorgeClark - 24 Dec 2013
We are crossing paths. Hm... There were not a lot of
TInyMCE changes between 1.1.8 and 1.1.9. Actually very few, and nothing related to verbatim blocks.
There was one change in how character sets are encoded. That's a bit suspicious.
distro:6cc59069487c
--
GeorgeClark - 24 Dec 2013
Can you check the source of the original page before it's corrupted. Are there any unprintables in the verbatim block that might be triggering this? If you can find a topic that will trigger the corruption, and it doesn't contain any private data, could you attach it to the task so we could try to edit the triggering topic locally?
Unfortunately the timing of this is rather poor. Most of the developers are away for the Christmas holidays.
--
GeorgeClark - 24 Dec 2013
Thanks, I can try to observe the HTML on
TinyMCE.
I just installed a clean-slate 1.1.9 and the issue isn't happening there either. This means that the corruption only happens on the existing pages I've migrated over from 1.1.8.
I'll re-create the problematic environment once again and observe HTML per your suggestion.
--
SatoshiYagi - 24 Dec 2013
OK, as far as I can tell
TinyMCE is displaying the HTML just fine. So the issue occurs upon saving, not during editing.
--
SatoshiYagi - 24 Dec 2013
Actually, I take back this point: "This means that the corruption only happens on the existing pages I've migrated over from 1.1.8."
This issue is happening on new pages on my installation which got documents migrated over. So perhaps something broke my installation while copying data/Main and pub/Main?
I suppose the thing to do at this point is to redo the migration and see what happens. I'll get back with the result.
--
SatoshiYagi - 24 Dec 2013
Found the problem. It was not my migrated pages. It was the setting below.
#$Foswiki::cfg{Site}{CharSet} = 'iso-8859-1';
$Foswiki::cfg{Site}{CharSet} = 'utf-8';
As soon as I turned on utf-8 on
CharSet, the problem starts to happen. Phew!
--
SatoshiYagi - 24 Dec 2013
If you could reproduce that that would be great...it is consistently reproducible on my end.
I use utf-8 for Japanese support, so it'd be great if it can be used.
--
SatoshiYagi - 24 Dec 2013
Ah... Use of utf-8 is a bit of a challenge. I believe there is a process to "convert" pages to utf-8. It's more complex than just enabling it. I have not worked with it. I'll set this to waiting on
CrawfordCurrie,
MichaelDaum and
JanKrueger. I think they have worked with utf-8.
--
GeorgeClark - 24 Dec 2013
Sounds good. This is a new issue in 1.1.9, as I had used utf-8 on 1.1.8 without issues (that's why I modified the setting when I migrated to 1.1.9).
Thanks for your help! Happy holidays.
--
SatoshiYagi - 24 Dec 2013
I can confirm the error on trunk. Note that the error already manifests switching from wysiwyg to wiki-text. There's a call to the
html2tml
rest handler which produces the illegal encoding.
--
MichaelDaum - 24 Dec 2013
I've added this to
KnownIssuesOfFoswiki01x01. We should probably get a fix into an updated
WysiwygPlugin ASAP. Most of our
utf-8
work is deferred for 1.2, but this is new breakage, so we need to get it fixed. Unfortunately I don't seem to be able to recreate it. I switched my local test system to
utf-8
with locale's enabled, but verbatim blocks seem to survive just fine. If I could recreate it I'd try reverting
distro:6cc59069487c to see if that is related. It hits the right places to be suspect.
--
GeorgeClark - 24 Dec 2013
You might want to copy the original verbatim block (fstab) I posted on this page. Not all verbatim blocks go berserk, but this one certainly does. Also this issue might be platform-dependent? I am running Fedora 20 with apache 2.4.6-6 on Perl 5.18 CGI.
--
SatoshiYagi - 24 Dec 2013
Yes,
distro:6cc59069487c introduced the error. Instead of reverting, here's a hotfix that cures the problem as far as I can see. Tested on utf-8 systems only.
--- lib/Foswiki/Plugins/WysiwygPlugin/Constants.pm (revision 17183)
+++ lib/Foswiki/Plugins/WysiwygPlugin/Constants.pm (working copy)
@@ -328,7 +328,6 @@
}
}
HTML::Entities::_decode_entities( $_[0], $representable_entities );
- $_[0] = Encode::encode( encoding(), $_[0] );
return $_[0];
}
I've not tested it on non-utf-8 systems but it looks as if the extra call to Encode::encode can be spared.
--
MichaelDaum - 25 Dec 2013
Thanks. I applied the one-line change on my installation and it seems to be working as well.
--
SatoshiYagi - 26 Dec 2013
I thought that that change looked a bit like it might mask a larger problem, and
GeorgeClark found it suspicious, too, so I took the opportunity to learn a little bit more about how Perl's Unicode handling works.
In short, that change will work properly when
$_[0]
contains a Perl byte string, but not when it contains a Perl Unicode string. In the latter case, the string will eventually be output by treating the individual Unicode code points as bytes (roughly equivalent to outputting as ISO-8859-1, plus a "Wide character in print" warning for all code points >
0xff
). That's almost never the right thing to do.
Given the way we currently deal with encodings, it's
fairly safe to assume that
$_[0]
will never be a Unicode string (and we'll have to do fairly extensive rewriting to port everything to Unicode, anyway, so there's no need to consider future-proofness). That said, if we want to be
really careful, we can turn the line in question into this instead:
$_[0] = Encode::encode( encoding(), $_[0] ) if utf8::is_utf8($_[0]);
--
JanKrueger - 03 Jan 2014
OK, I can see what's going on. It's a special case handling for verbatim blocks that I missed when I removed the others. Patch:
===================================================================
--- lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm (revision 17082)
+++ lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm (working copy)
@@ -1350,13 +1350,13 @@
sub _verbatim {
my ( $this, $tag, $options ) = @_;
- $options |= $WC::PROTECTED | $WC::KEEP_ENTITIES | $WC::BR2NL | $WC::KEEP_WS;
+ $options |= $WC::PROTECTED;
+ # We must switch off KEEP_ENTITIES to allow _flatten to
+ # decodeRepresentableEntities
+ $options &= ~$WC::KEEP_ENTITIES;
+
my ( $flags, $text ) = $this->_flatten($options);
- # decode once, and once only. This will decode only those
- # entities than have a representation in the site charset.
- WC::decodeRepresentableEntities($text);
-
# decodes to \240, which we want to make a space.
$text =~ s/\240/$WC::NBSP/g;
my $p = _htmlParams( $this->{attrs}, $options );
There are failing unit tests with this patch, that need to be analysed before it can be committed.
--
CrawfordCurrie - 07 Jan 2014
I'm totally lost within Node.pm. Debug prints of $text are always empty, but clearly changing the flags breaks the tests. This _verbatim code is called with $tag set to either 'verbatim' or 'sticky'. The only difference that I'm aware of is during Foswiki render in the normal page view. Verbatim blocks are not touched by render, where as sticky blocks are rendered normally. However during edit, they are both treated similarly.
I wonder if sticky blocks might encounter similar corruption, and it just has not been reported. (yes indeed.
Update: It appears that sticky tests encounter the same corruption.) The good news though is that the entire suite finished with Site CharSet set to utf8, and it was Translator,
ExtendedTranslator and the new Logger tests I just added that had difficulties.
In any event, by making your change conditional to $tag eq 'verbatim', then that fixes the sticky issues. But   is still broken in verbatim blocks.
If a user enters  , that should not be touched. I think that's true in either verbatim or sticky. User enters it, it sticks and survives roundtrips.
Is the issue that decodeRepresentableEntities should leave it alone?
There are other nbsp issues -
Item5103 for ex.
Totally lost now...
--
GeorgeClark - 07 Jan 2014
Anybody able to release a patch contrib?
--
MichaelDaum - 31 Jan 2014
Hi Michael,
I could easily build a patch contrib, or we could just build an updated
WysiwygPlugin. But your patch and Crawford's as well both have issues.
We really should be able to get a clean run of the
RoundTripTests after the patch. This also shows a flaw in our test cases, in that the tests pass if you don't use a utf8 site character-set.
There is some further discussion of the issue
here on IRC
--
GeorgeClark - 31 Jan 2014
A list of things to try out that break:
- Fixed: verbatim blocks are corrupted
- %ORANGEÊn't repro: sticky blocks are corrupted (is this the issue being reported somewhere in the irc logs? ... not sure)
- Broken :
are removed
Please add to the above list so that everybody can see where we are.
--
MichaelDaum - 03 Feb 2014
With
{Site}{CharSet} = 'utf8'
, the following unit tests fail:
Module Failure summary
TranslatorTests has 20 unexpected results (of 398):
After Crawford's patch
Module Failure summary
TranslatorTests has 4 unexpected results (of 398):
--
GeorgeClark - 04 Feb 2014
Here is the sticky issue:
TranslatorTests::testROUNDTRIP_entityWithNoNameInsideSticky
==entityWithNoNameInsideSticky== Expected TML:
<sticky>♀</sticky>
==entityWithNoNameInsideSticky== Actual TML:
<sticky>â</sticky>
==entityWithNoNameInsideSticky==
<sticky><<==== HERE actual 195 != expected 38
--
GeorgeClark - 04 Feb 2014
With Crawford's patch and in real life I still can't reproduce any problem with ♀ as indicated by
TranslatorTests::testROUNDTRIP_entityWithNoNameInsideSticky
. What I did:
- open the wysiwyg editor
- add
<sticky>♀</sticky>
- save
- open again
- edit some other text
- save
- open again
- switch to wiki text
- edit some other text
- save
Tested with SKIN = pattern, SKIN = natedit, pattern, SKIN = nat.
The sticky + html entity is preserved as expected. Not so a nacked These are silently removed.
--
MichaelDaum - 04 Feb 2014
It's hard to say since the
TinyMCE editor might also be changing the character back into an entity. The non-working Selenium tests would be closer to what you ar doing.
You might also try to just click the
WikiText button, which also drives the
HTML2TML conversion, rather than going through the save cycle.
I still think that the unit tests need to pass. It pretty clearly shows that an entity inside of a stickly block does not survive a round trip, independent of the path through the browser and javascript. It's painful to debug, but to find out what is happening in the editor, I've had to add debug print's to dump the posted data from tinymce before the conversion, then put that into a unit test as the input html, and test it as a
HTML2TML test instead of a ROUNDTRIP. This comes closer to capturing what the editor did to the page.
We really need something like selenium to pass the test cases through a save cycle of the JS editor.
--
GeorgeClark - 04 Feb 2014
Crawford, It looks like we're really going to need some help with your proposed patch and loss of whitespace. I've made two attempts to figure it out and failed both times. This is blocking 1.2.
--
GeorgeClark - 04 Aug 2014
Done.
--
CrawfordCurrie - 07 Nov 2014
Reopening. Wysiwyg crashes in multiple places on my system
--
GeorgeClark - 21 Nov 2014
Since I don't have access to your system, perhaps some clues as to
where it crashes.... ?
--
CrawfordCurrie - 24 Nov 2014
It's fixed. Fixes from both Crawford and myself.
$WC::TML_COLOURS()
was not callable as a subroutine, and missing subroutine triggered for sites not running utf-8.
--
GeorgeClark - 24 Nov 2014