You are here: Foswiki>Tasks Web>Item12705 (05 Jul 2015, GeorgeClark)Edit Attach

Item12705: Verbatim blocks are corrupted on save by Wysiwyg editor if $Foswiki::cfg{Site}{Locale} = 'utf-8'; (hotfix available)

pencil
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: I18N, WysiwygPlugin
Branches: master
Reported By: SatoshiYagi
Waiting For:
Last Change By: GeorgeClark
I just upgraded foswiki from 1.1.6 to 1.1.9. Then I noticed that whenever I update a page with "verbatim" quotes in it, it gets destroyed upon saving the page. For example:

#
# /etc/fstab
# Created by anaconda on Wed Jul  3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#

After saving that document, it becomes:

#
#Â /etc/fstab
# Created by anaconda on Wed Jul  3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#

The workaround so far has been to use the "pre" tag instead of "verbatim." However this is quite annoying and I'd like to see if this can be resolved.

-- SatoshiYagi - 23 Dec 2013

Is your system running with locale's enabled?

In your LocalSite.cfg file, what are the following settings?
$Foswiki::cfg{UseLocale} = 
$Foswiki::cfg{Site}{Locale} = 
$Foswiki::cfg{Site}{CharSet} = 

-- GeorgeClark - 23 Dec 2013

Thanks for your quick response. Here is the information. I guess i wasn't running with locale's enabled.

$Foswiki::cfg{UseLocale} = 0; $Foswiki::cfg{Site}{Locale} = 'en_US.ISO-8859-1'; $Foswiki::cfg{Site}{CharSet} = 'utf-8';

-- SatoshiYagi - 24 Dec 2013

Okay, that's actually good. Locale's are a bit experimental. I'd be more apt to expect issues with it enabled?

Are you able to see the issue on either foswiki.org, or trunk.foswiki.org using the Sandbox web. I'm not able to recreate the issue here and I use verbatim blocks frequently. So something is a bit different on your system.

Also, does the corruption occur after the initial save... ie
  • Edit new document
  • Paste in data
  • Save, and document is corrupted

Or can if you edit an existing page, For example the InstallationGuide contains verbatim blocks. If you edit and save that page does it become corrupted. v

-- GeorgeClark - 24 Dec 2013

I can't seem to reproduce the issue on foswiki.org's sandbox.

This issue doesn't happen when I create a new document with verbatim blocks in it. It happens when I modify an existing page with verbatim blocks in it via the WYSIWYG editor and save the document. The raw editor doesn't corrupt the document, either.

What's funny is that the verbatim block corruption gets worse as I save over and over. Here is how the original sample block looks like after 4 saves:
#
# /etc/fstab
# Created by anaconda on Wed Jul  3 16:39:30 2013
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#

-- SatoshiYagi - 24 Dec 2013

Looks like I was on 1.1.8 before, not 1.1.6. Let me try installing 1.1.8 on the same system and see if the problem persists there.

-- SatoshiYagi - 24 Dec 2013

Hmm, 1.1.8 is working just fine. Did anything change wrt the handling of verbatim blocks between 1.1.8 and 1.1.9?

-- SatoshiYagi - 24 Dec 2013

The TinyMCE editor uses some special characters to flag things like non-breaking spaces, etc. But I don't see 130 or 131 being used. The "verbatim" block is a foswiki pseudo-html block. The WysiwygPlugin converts it to a <pre class="TMLVerbatim"> block.

If you take a simple page with a verbatim block, click edit to start TinyMCE, and then click the "HTML" button in the tinymce toolbar, you will see how Foswiki converted the TML into HTML for TInyMCE. It might be helpful to take an uncorrupted topic, edit it and view the html being edited using the HTML button. To see if the corruption happens on the way into the editor.

If you then cancel the HTML dialog and click the WikiText button to it's right, it will perform the conversion back to TML.

-- GeorgeClark - 24 Dec 2013

We are crossing paths. Hm... There were not a lot of TInyMCE changes between 1.1.8 and 1.1.9. Actually very few, and nothing related to verbatim blocks.

There was one change in how character sets are encoded. That's a bit suspicious. distro:6cc59069487c

-- GeorgeClark - 24 Dec 2013

Can you check the source of the original page before it's corrupted. Are there any unprintables in the verbatim block that might be triggering this? If you can find a topic that will trigger the corruption, and it doesn't contain any private data, could you attach it to the task so we could try to edit the triggering topic locally?

Unfortunately the timing of this is rather poor. Most of the developers are away for the Christmas holidays.

-- GeorgeClark - 24 Dec 2013

Thanks, I can try to observe the HTML on TinyMCE.

I just installed a clean-slate 1.1.9 and the issue isn't happening there either. This means that the corruption only happens on the existing pages I've migrated over from 1.1.8.

I'll re-create the problematic environment once again and observe HTML per your suggestion.

-- SatoshiYagi - 24 Dec 2013

OK, as far as I can tell TinyMCE is displaying the HTML just fine. So the issue occurs upon saving, not during editing.

-- SatoshiYagi - 24 Dec 2013

Actually, I take back this point: "This means that the corruption only happens on the existing pages I've migrated over from 1.1.8."

This issue is happening on new pages on my installation which got documents migrated over. So perhaps something broke my installation while copying data/Main and pub/Main?

I suppose the thing to do at this point is to redo the migration and see what happens. I'll get back with the result.

-- SatoshiYagi - 24 Dec 2013

Found the problem. It was not my migrated pages. It was the setting below.

#$Foswiki::cfg{Site}{CharSet} = 'iso-8859-1';
$Foswiki::cfg{Site}{CharSet} = 'utf-8';

As soon as I turned on utf-8 on CharSet, the problem starts to happen. Phew!

-- SatoshiYagi - 24 Dec 2013

If you could reproduce that that would be great...it is consistently reproducible on my end. I use utf-8 for Japanese support, so it'd be great if it can be used.

-- SatoshiYagi - 24 Dec 2013

Ah... Use of utf-8 is a bit of a challenge. I believe there is a process to "convert" pages to utf-8. It's more complex than just enabling it. I have not worked with it. I'll set this to waiting on CrawfordCurrie, MichaelDaum and JanKrueger. I think they have worked with utf-8.

-- GeorgeClark - 24 Dec 2013

Sounds good. This is a new issue in 1.1.9, as I had used utf-8 on 1.1.8 without issues (that's why I modified the setting when I migrated to 1.1.9).

Thanks for your help! Happy holidays.

-- SatoshiYagi - 24 Dec 2013

I can confirm the error on trunk. Note that the error already manifests switching from wysiwyg to wiki-text. There's a call to the html2tml rest handler which produces the illegal encoding.

-- MichaelDaum - 24 Dec 2013

I've added this to KnownIssuesOfFoswiki01x01. We should probably get a fix into an updated WysiwygPlugin ASAP. Most of our utf-8 work is deferred for 1.2, but this is new breakage, so we need to get it fixed. Unfortunately I don't seem to be able to recreate it. I switched my local test system to utf-8 with locale's enabled, but verbatim blocks seem to survive just fine. If I could recreate it I'd try reverting distro:6cc59069487c to see if that is related. It hits the right places to be suspect.

-- GeorgeClark - 24 Dec 2013

You might want to copy the original verbatim block (fstab) I posted on this page. Not all verbatim blocks go berserk, but this one certainly does. Also this issue might be platform-dependent? I am running Fedora 20 with apache 2.4.6-6 on Perl 5.18 CGI.

-- SatoshiYagi - 24 Dec 2013

Yes, distro:6cc59069487c introduced the error. Instead of reverting, here's a hotfix that cures the problem as far as I can see. Tested on utf-8 systems only.

--- lib/Foswiki/Plugins/WysiwygPlugin/Constants.pm      (revision 17183)
+++ lib/Foswiki/Plugins/WysiwygPlugin/Constants.pm      (working copy)
@@ -328,7 +328,6 @@
         }
     }
     HTML::Entities::_decode_entities( $_[0], $representable_entities );
-    $_[0] = Encode::encode( encoding(), $_[0] );
     return $_[0];
 }

I've not tested it on non-utf-8 systems but it looks as if the extra call to Encode::encode can be spared.

-- MichaelDaum - 25 Dec 2013

Thanks. I applied the one-line change on my installation and it seems to be working as well.

-- SatoshiYagi - 26 Dec 2013

I thought that that change looked a bit like it might mask a larger problem, and GeorgeClark found it suspicious, too, so I took the opportunity to learn a little bit more about how Perl's Unicode handling works.

In short, that change will work properly when $_[0] contains a Perl byte string, but not when it contains a Perl Unicode string. In the latter case, the string will eventually be output by treating the individual Unicode code points as bytes (roughly equivalent to outputting as ISO-8859-1, plus a "Wide character in print" warning for all code points > 0xff). That's almost never the right thing to do.

Given the way we currently deal with encodings, it's fairly safe to assume that $_[0] will never be a Unicode string (and we'll have to do fairly extensive rewriting to port everything to Unicode, anyway, so there's no need to consider future-proofness). That said, if we want to be really careful, we can turn the line in question into this instead:

    $_[0] = Encode::encode( encoding(), $_[0] ) if utf8::is_utf8($_[0]);

-- JanKrueger - 03 Jan 2014

OK, I can see what's going on. It's a special case handling for verbatim blocks that I missed when I removed the others. Patch:
===================================================================
--- lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm   (revision 17082)
+++ lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm   (working copy)
@@ -1350,13 +1350,13 @@
 sub _verbatim {
     my ( $this, $tag, $options ) = @_;
 
-    $options |= $WC::PROTECTED | $WC::KEEP_ENTITIES | $WC::BR2NL | $WC::KEEP_WS;
+    $options |= $WC::PROTECTED;
+    # We must switch off KEEP_ENTITIES to allow _flatten to
+    # decodeRepresentableEntities
+    $options &= ~$WC::KEEP_ENTITIES;
+
     my ( $flags, $text ) = $this->_flatten($options);
 
-    # decode once, and once only. This will decode only those
-    # entities than have a representation in the site charset.
-    WC::decodeRepresentableEntities($text);
-
     # &nbsp; decodes to \240, which we want to make a space.
     $text =~ s/\240/$WC::NBSP/g;
     my $p = _htmlParams( $this->{attrs}, $options );
There are failing unit tests with this patch, that need to be analysed before it can be committed.

-- CrawfordCurrie - 07 Jan 2014

I'm totally lost within Node.pm. Debug prints of $text are always empty, but clearly changing the flags breaks the tests. This _verbatim code is called with $tag set to either 'verbatim' or 'sticky'. The only difference that I'm aware of is during Foswiki render in the normal page view. Verbatim blocks are not touched by render, where as sticky blocks are rendered normally. However during edit, they are both treated similarly.

I wonder if sticky blocks might encounter similar corruption, and it just has not been reported. (yes indeed. Update: It appears that sticky tests encounter the same corruption.) The good news though is that the entire suite finished with Site CharSet set to utf8, and it was Translator, ExtendedTranslator and the new Logger tests I just added that had difficulties.

In any event, by making your change conditional to $tag eq 'verbatim', then that fixes the sticky issues. But &nbsp is still broken in verbatim blocks.

If a user enters &nbsp, that should not be touched. I think that's true in either verbatim or sticky. User enters it, it sticks and survives roundtrips. Is the issue that decodeRepresentableEntities should leave it alone?

There are other nbsp issues - Item5103 for ex.

Totally lost now...

-- GeorgeClark - 07 Jan 2014

Anybody able to release a patch contrib?

-- MichaelDaum - 31 Jan 2014

Hi Michael,

I could easily build a patch contrib, or we could just build an updated WysiwygPlugin. But your patch and Crawford's as well both have issues.

We really should be able to get a clean run of the RoundTripTests after the patch. This also shows a flaw in our test cases, in that the tests pass if you don't use a utf8 site character-set.

There is some further discussion of the issue here on IRC

-- GeorgeClark - 31 Jan 2014

A list of things to try out that break:

  1. Fixed: verbatim blocks are corrupted
  2. %ORANGEÊn't repro: sticky blocks are corrupted (is this the issue being reported somewhere in the irc logs? ... not sure)
  3. Broken : &nbsp; are removed

Please add to the above list so that everybody can see where we are.

-- MichaelDaum - 03 Feb 2014

With {Site}{CharSet} = 'utf8', the following unit tests fail:

Module Failure summary

TranslatorTests has 20 unexpected results (of 398):

After Crawford's patch

Module Failure summary

TranslatorTests has 4 unexpected results (of 398):

-- GeorgeClark - 04 Feb 2014

Here is the sticky issue:

TranslatorTests::testROUNDTRIP_entityWithNoNameInsideSticky

==entityWithNoNameInsideSticky== Expected TML:
<sticky>&#9792;</sticky>
==entityWithNoNameInsideSticky== Actual TML:
<sticky>â</sticky>
==entityWithNoNameInsideSticky==
<sticky><<==== HERE actual 195 != expected 38

-- GeorgeClark - 04 Feb 2014

With Crawford's patch and in real life I still can't reproduce any problem with ♀ as indicated by TranslatorTests::testROUNDTRIP_entityWithNoNameInsideSticky. What I did:

  1. open the wysiwyg editor
  2. add
    <sticky>&#9792;</sticky>
  3. save
  4. open again
  5. edit some other text
  6. save
  7. open again
  8. switch to wiki text
  9. edit some other text
  10. save

Tested with SKIN = pattern, SKIN = natedit, pattern, SKIN = nat.

The sticky + html entity is preserved as expected. Not so a nacked &nbsp; These are silently removed.

-- MichaelDaum - 04 Feb 2014

It's hard to say since the TinyMCE editor might also be changing the character back into an entity. The non-working Selenium tests would be closer to what you ar doing.

You might also try to just click the WikiText button, which also drives the HTML2TML conversion, rather than going through the save cycle.

I still think that the unit tests need to pass. It pretty clearly shows that an entity inside of a stickly block does not survive a round trip, independent of the path through the browser and javascript. It's painful to debug, but to find out what is happening in the editor, I've had to add debug print's to dump the posted data from tinymce before the conversion, then put that into a unit test as the input html, and test it as a HTML2TML test instead of a ROUNDTRIP. This comes closer to capturing what the editor did to the page.

We really need something like selenium to pass the test cases through a save cycle of the JS editor.

-- GeorgeClark - 04 Feb 2014

Crawford, It looks like we're really going to need some help with your proposed patch and loss of whitespace. I've made two attempts to figure it out and failed both times. This is blocking 1.2.

-- GeorgeClark - 04 Aug 2014

Done.

-- CrawfordCurrie - 07 Nov 2014

Reopening. Wysiwyg crashes in multiple places on my system

-- GeorgeClark - 21 Nov 2014

Since I don't have access to your system, perhaps some clues as to where it crashes.... ?

-- CrawfordCurrie - 24 Nov 2014

It's fixed. Fixes from both Crawford and myself. $WC::TML_COLOURS() was not callable as a subroutine, and missing subroutine triggered for sites not running utf-8.

-- GeorgeClark - 24 Nov 2014
 

ItemTemplate edit

Summary Verbatim blocks are corrupted on save by Wysiwyg editor if $Foswiki::cfg{Site}{Locale} = 'utf-8'; (hotfix available)
ReportedBy SatoshiYagi
Codebase 1.1.9, trunk
SVN Range
AppliesTo Engine
Component I18N, WysiwygPlugin
Priority Urgent
CurrentState Closed
WaitingFor
Checkins distro:bba8feccb711 distro:649335e78692 distro:37c2a189e8a7 distro:7eea1db13a51
TargetRelease major
ReleasedIn 2.0.0
CheckinsOnBranches master
trunkCheckins
masterCheckins distro:bba8feccb711 distro:649335e78692 distro:37c2a189e8a7 distro:7eea1db13a51
ItemBranchCheckins
Release01x01Checkins
Topic revision: r33 - 05 Jul 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy