
Item13446: HTML entities in Config.spec files are incorrectly converted to hex.

Priority: Normal
Current State: Closed
Released In: 1.2.0
Target Release: minor
Applies To: Engine
Component: BuildContrib, DateTimePlugin, PseudoInstall, configure
Branches: master
Reported By: JozefMojzis
Waiting For:
Last Change By: JozefMojzis
There are multiple problems with the DateTimePlugin.

The plugin contains language-specific month/day names in its config.spec like

$Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{pt} = {
   months_long => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
   months_short => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
   weekdays_long => 'Domingo, Segunda-feira, Ter&ccedil;a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S&aacute;bado',
   weekdays_short => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab',
};
Note the &ccedil; and similar entities.

It works with the above, but when I changed the values in /bin/configure, it stopped working and showed garbled characters. So the question is: how should we deal with such config.spec files from extensions?

Second, just wondering: how can a user add new language translations without manually editing config.spec? Configure doesn't allow creating a new node, e.g. there is no way to create $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{cs} without editing config.spec by hand. Moved to Item13453

So we can supply each language with HTML entities, but then the user must not change them to UTF-8, or the plugin needs to be patched.

Also, how will we deal with the situation where a plugin's description text, e.g. the System.Pluginname.txt topic, is not pure ASCII, for example when it is written in Latin-1 or UTF-8? Will we determine at installation time which charset the plugin description uses and convert it on the fly?

-- JozefMojzis - 06 Jun 2015

(Hijacked discussion of extension compatibility issues moved to Item13452. This task describes a legitimate urgent issue with configure. It also describes a 2nd issue with config.spec for DateTimePlugin which will also be moved to a separate task.)

Back to the original problem. Configure does indeed corrupt html entities in the DateTimePlugin settings.
| *Key* | *Old* | *New* |
| {Plugins}{DateTimePlugin}{Dates}{pt} | {'months_long' => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro', 'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez','weekdays_long' => 'Domingo, Segunda-feira, Ter... | {'months_long' => "Janeiro, Fevereiro, Mar\x{e7}o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro", 'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez','weekdays_long' => "Domingo, Segunda-feira, Ter\x... |

-- GeorgeClark - 09 Jun 2015

To recreate:
  • Install DateTimePlugin using package installer, not pseudo install.
  • Visit configure and the Extensions DateTimePlugin tab. Don't click anything.
    • The "nl" strings have both a reset and an undo button.
  • Note that the Save button is already active with 1 change even though nothing was updated
  • Save the change; the entities are converted to hex

 diff LocalSite.cfg LocalSite.cfg.6
231c231,236
< $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{nl} = {'months_long' => 'Januari, Februari, Maart, April, Mei, Juni, Juli, Augustus, September, Oktober, November, December','months_short' => 'Jan, Feb, Maa, Apr, Mei, Jun, Jul, Aug, Sep, Okt, Nov, Dec','weekdays_long' => 'Zondag, Maandag, Dinsdag, Woensdag, Donderdag, Vrijdag, Zaterdag','weekdays_short' => 'Zon, Maa, Din, Woe, Don, Vri, Zat'};
---
> $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{nl} = {
>       months_long => 'Januari, Februari, Maart, April, Mei, Juni, Juli, Augustus, September, Oktober, November, December',
>       months_short => 'Jan, Feb, Maa, Apr, Mei, Jun, Jul, Aug, Sep, Okt, Nov, Dec',
>       weekdays_long => 'Zondag, Maandag, Dinsdag, Woensdag, Donderdag, Vrijdag, Zaterdag',
>       weekdays_short => 'Zon, Maa, Din, Woe, Don, Vri, Zat',
> };
233,238c238,242
<   'months_long' => 'Janeiro, Fevereiro, Mar�o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
<   'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
<   'weekdays_long' => 'Domingo, Segunda-feira, Ter�a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S�bado',
<   'weekdays_short' => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab'
< }
< ;
---
>       months_long => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
>       months_short => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
>       weekdays_long => 'Domingo, Segunda-feira, Ter&ccedil;a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S&aacute;bado',
>       weekdays_short => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab',
> };

-- GeorgeClark - 09 Jun 2015

Browser expands entities in text embedded in a textarea. Must pre-escape entities.
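
A minimal sketch of that pre-escaping, assuming a hypothetical helper named escape_for_textarea (not taken from the Foswiki code base). It shows why the ampersand has to be escaped before the value is placed in a textarea: the browser then displays the literal entity text instead of the expanded character.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Foswiki API): escape a value so that it can be
# embedded as the content of a <textarea> without the browser expanding
# any HTML entities it contains.
sub escape_for_textarea {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so the escapes below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $spec_value = 'Mar&ccedil;o';    # value as written in config.spec
print '<textarea>', escape_for_textarea($spec_value), "</textarea>\n";
# prints: <textarea>Mar&amp;ccedil;o</textarea>
```

With this escaping the browser shows Mar&ccedil;o in the field and submits it back unchanged, so an untouched field no longer counts as modified.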

-- Main.CrawfordCurrie - 09 Jun 2015 - 16:49

After talking with JozefMojzis, there is more to this than that. I missed this point from his original report. If you enter characters into the ui, they also get converted to hex. The example he used is setting {TrashWebName} to 'Kôš'. From Configure it gets saved as "K\x{f4}\x{161}"; Jomo reports that this does work, so that's good news.

2015-06-09T16:00:46-04:00 notice admin 127.0.0.1 {TrashWebName} 'Trash' "K\x{f4}\x{161}"

If you do the same thing from the CLI, it gets set without encoding, i.e.:

 tools/configure -save -set {TrashWebName}='Kôš'
New configuration saved in /var/www/foswiki/distro/core/lib/LocalSite.cfg
| *Key* | *Old* | *New* |
| {TrashWebName} | 'Trash' | 'Kôš' |

The log is incorrect as well:

2015-06-09T17:03:32-04:00 notice     {TrashWebName} 'Trash' 'Kôš'

If the answer is always use entities, then we probably need some checks to prevent corruption of the config. I can see it being a reasonable request to use localized web names.
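
The difference between a decoded and an undecoded value can be reproduced in isolation. The following is only a sketch under assumptions: ascii_literal is a hypothetical serializer, not the one configure uses, and the byte string stands in for a command-line argument arriving from a UTF-8 terminal. It shows that the same escaping routine yields the \x{f4}\x{161} form only when the input was decoded to characters first.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical serializer (not Foswiki's): render a string as a
# double-quoted Perl literal that contains only printable ASCII.
sub ascii_literal {
    my ($text) = @_;
    ( my $out = $text ) =~ s/([\\"\$\@])/\\$1/g;              # protect \ " $ @
    $out =~ s/([^\x20-\x7e])/sprintf '\\x{%x}', ord $1/ge;    # escape everything else
    return qq{"$out"};
}

# Stand-in for a command-line argument: the UTF-8 bytes of "Kôš".
my $arg_bytes = "K\xc3\xb4\xc5\xa1";

# Decoding turns the five bytes into three characters.
my $arg_chars = decode( 'UTF-8', $arg_bytes );

print ascii_literal($arg_chars), "\n";    # "K\x{f4}\x{161}"
print ascii_literal($arg_bytes), "\n";    # "K\x{c3}\x{b4}\x{c5}\x{a1}"
```

The first line matches what the web interface logged for {TrashWebName}; the second is what the same routine produces when the input is still raw bytes.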

-- GeorgeClark - 09 Jun 2015

From the IRC:
CDot: yeah, but that then requires utf8-encoding LSC, and at the moment it's ASCII, which I kinda like   
I simply cannot understand this point of view.

Only one thing is needed:
  • when configure saves the LSC with $Foswiki::UNICODE = true,
  • it should add use utf8; to the top of the LSC.
That's all. Perl is smart enough to know what to do when it executes require LocalSite.cfg;

There is no reason to treat the LSC as ASCII only.
  • the LSC is generated at bootstrap
  • so it is specific to the current installation
  • for ASCII-only users there is no difference. Proof: save the following two scripts and try it yourself
#file utest.cfg
use utf8;
$X{some} = "some";
$X{ukey} = "ščť";
$X{ščť} = "ščťval";

#file utest
use strict;
use warnings;
use feature "say";

binmode STDOUT, ":utf8";

our %X;
require "./utest.cfg";    # explicit ./ because "." is not in @INC on perl 5.26 and later

say join "\t", qw(key value k_utf8 v_utf8);
for my $k (qw(some ukey), $X{ukey}) {
   say join "\t", $k, $X{$k}, utf8::is_utf8($k)+0, utf8::is_utf8($X{$k})+0;
}

When you run the above you will get:
key value k_utf8 v_utf8
some some 0 0
ukey ščť 0 1
ščť ščťval 1 1

As you can see, the utf8 flag is not set for the ASCII values (nor for the ASCII keys), even though the file contains use utf8;. So:
  • installations that use an ASCII-only LSC see no change (simply nothing)
  • installations that use UTF-8 can have UTF-8 strings in the LSC (with no effect on ASCII-only installations)
  • installations that try to use ISO-8859-1 with the UNICODE core are wrong anyway and will get bad encodings.

Why add yet another unreasonable exception to the "utf8" rule, especially in the CORE CODE? (The LSC is Perl code.)

So Crawford, could you please add some scenarios here:
  • when is it necessary to enforce an ASCII-only LSC?
  • when, and in what scenario, could something be affected by a UTF-8 LSC?

-- JozefMojzis - 10 Jun 2015

I did a quick scan for settings where you can fairly commonly expect UTF-8 encoded characters in the LSC:
{SandboxWebName}
{TrashWebName}
{UsersWebName}
{AuthRealm}
{Email}{SmimeCertL}
{Email}{SmimeCertO}
{Email}{SmimeCertOU}
Especially for the CSR, this needs to be compatible with something like:
openssl req -new -utf8 -newkey rsa:2048 -nodes -out the.csr -keyout the.private.key -subj "/C=SK/ST=Južné Slovensko/L=Fialôčka v tôni/O=Čača spoločnosť/OU=Centrála/CN=caca.sk/emailAddress=admin@caca.sk"
note the -utf8

Of course, nothing is broken in the current state: when the characters are stored in the current \x{nnn} form, it works fine.

(You should just expect that users will edit the LSC file manually and will probably not use \x{nnn} sequences for international characters, regardless of the recommendation to use tools/configure from the command line.)

The proposed use utf8; at the top of the LSC solves these problems (unless someone adds a relevant scenario here where it could break things).

-- JozefMojzis - 10 Jun 2015

I said I kinda liked it; not that I cared very much! The only reason for keeping it ascii is to avoid having to decode when reading/writing it. If you are going to add use utf8 then there are quite a few places where you have to pay attention to that - Foswiki::Configure::Wizards::Save, Foswiki::Configure::Load, pseudo-install, possibly more that I haven't thought of. Feel free to go ahead and change it.

-- CrawfordCurrie - 10 Jun 2015

The current behaviour is "strange": the web interface and the command-line configure are inconsistent. It (probably) needs to be fixed.

If you want to keep the LSC ASCII-only, OK, leave it as it is. It is your decision (and the bugs will reappear later, because tools/configure doesn't decode its arguments). Or maybe they will not reappear, because nobody will set up, for example, a UTF-8 CSR, or change some web names to localised ones using tools/configure, and we can live with it happily for many years.

You know, I'm not a core developer and probably never will be. I also have no right, and no wish, to add extra tasks for anyone.

I only reported a bug and argued for a style of solution, because I foresee problems: I live with UTF-8 in every one of my perl/bash scripts on a daily basis, because of my language. Of course, your point of view may be different, and I don't have to understand everything either.

Finally, I will test and report the things (not necessarily bugs) that I find. I could (happily) help with testing if someone tells me what I should test and what I should report. I could also point to a solution method, from experience, if wanted. Or not, that is OK too.

So: "No action required" for this one.

-- JozefMojzis - 10 Jun 2015

I'm changing this back to confirmed. I think we do need to address this someday, maybe not for 1.2.0. Also changing it to an enhancement.

-- GeorgeClark - 11 Jun 2015

I tried implementing this, and I think for now we need to stick with ASCII. From what I can tell, it's more than just "use utf8" in the file. We set values into the configure hash using "eval", for example:
  eval("\$Foswiki::cfg$k=\$v");

Until perl 5.16, the eval of utf8 data is unpredictable. I suspect we would need the UNICODE eval feature to make this work. from the perldoc on eval:
In the absence of the unicode_eval feature, the string will sometimes be treated as characters and sometimes as bytes, depending on the internal encoding, and source filters activated within the eval ...
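
To make the quoted eval concrete, here is a small stand-alone sketch. The hash %main::cfg, the key and the value are hypothetical stand-ins, and only the one statement form shown above is reproduced, not any other eval in configure. It prints the source text that eval actually compiles for that statement.

```perl
use strict;
use warnings;

our %cfg;    # hypothetical stand-in for %Foswiki::cfg; lives in package main

my $k = '{Plugins}{DemoPlugin}{Label}';    # hypothetical key path
my $v = "K\x{f4}\x{161}";                  # character string "Kôš"

# Same shape as the statement quoted above: $k is interpolated into the
# source text, while \$v leaves a literal "$v" that is resolved at run time.
my $code = "\$main::cfg$k=\$v";
eval $code;
die $@ if $@;

print "$code\n";    # $main::cfg{Plugins}{DemoPlugin}{Label}=$v
print length( $cfg{Plugins}{DemoPlugin}{Label} ), "\n";    # 3
```

In this particular form only the key path becomes part of the evaluated source; the value is read from the variable when the code runs. A key containing non-ASCII characters would still end up in the source string.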

-- GeorgeClark - 11 Jun 2015

Hm, what are we talking about?
  1. Whether %Foswiki::cfg can contain wide chars,
  2. or the encoding of the LSC?
These are two completely different things.

LSC != %Foswiki::cfg
  • %Foswiki::cfg - is a hash in memory
  • LSC - a serialized format (as Perl source code) in a file...

E.g. when CDot wants to keep the LSC ASCII-only, does that mean:
  • nobody can use wide characters for TrashWebName and so on?
  • or that the LSC needs to be encoded as pure ASCII?

Maybe I'm being unclear once again, so look at the following entry in the LSC:

$Foswiki::cfg{TrashWebName} = "K\x{f4}\x{161}";
Here:
  • the LSC file itself is a pure ASCII file (every character above is an ASCII character, e.g. backslash, opening brace etc.)
  • but the in-memory $Foswiki::cfg{TrashWebName} contains the Unicode string value Kôš (U+004B U+00F4 U+0161, or if you prefer \x{4b}\x{f4}\x{161})
In this case:
use utf8;
$Foswiki::cfg{TrashWebName} = "Kôš";
  • the in-memory $Foswiki::cfg{TrashWebName} contains unicode string-value: Kôš. (as in previous example)
  • the LSC contains utf8 encoded string ( \x{4b}\x{c3}\x{b4}\x{c5}\x{a1} )

In this case:
#use utf8; #no pragma
$Foswiki::cfg{TrashWebName} = "Kôš";
  • the in-memory $Foswiki::cfg{TrashWebName} contains some garbage ( \x{4b}\x{c3}\x{b4}\x{c5}\x{a1} )
  • the LSC contains utf8 encoded string ( \x{4b}\x{c3}\x{b4}\x{c5}\x{a1} )

So?

Also:
  • The unicode_eval feature ignores any use utf8; because the use utf8; pragma only has meaning for byte-oriented strings.
  • do 'LocalSite.cfg' is roughly the same as eval qx(cat LocalSite.cfg), and because files on disk always contain bytes, it will work OK.

In short, unicode_eval has nothing to do with the LSC's encoding, and use utf8; only allows a UTF-8 encoded LSC. Nothing more, nothing less.
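
The two UTF-8 cases can be checked with a short script. This is a sketch under assumptions: %main::cfg is a hypothetical stand-in for %Foswiki::cfg, and the temporary files stand in for LocalSite.cfg. It writes the same UTF-8 encoded assignment twice, once with and once without the pragma, loads each file with do, and compares the length of the resulting value.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

our %cfg;    # hypothetical stand-in for %Foswiki::cfg; lives in package main

# Write a tiny UTF-8 encoded config file, load it with do(), and return
# the value it assigned. $with_pragma controls the "use utf8;" header.
sub load_value {
    my ($with_pragma) = @_;
    my ( $fh, $path ) = tempfile( SUFFIX => '.cfg', UNLINK => 1 );
    binmode $fh, ':encoding(UTF-8)';
    print {$fh} "use utf8;\n" if $with_pragma;
    print {$fh} "\$main::cfg{TrashWebName} = \"K\x{f4}\x{161}\";\n1;\n";
    close $fh or die "close failed: $!";

    %cfg = ();
    defined( do $path ) or die "cannot load $path: " . ( $@ || $! );
    return $cfg{TrashWebName};
}

my $decoded = load_value(1);
my $raw     = load_value(0);

printf "with use utf8:    length %d\n", length $decoded;    # 3 (characters)
printf "without use utf8: length %d\n", length $raw;        # 5 (raw bytes)
```

Both files are byte-for-byte identical apart from the pragma line; only the interpretation of the string literal changes.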

-- JozefMojzis - 11 Jun 2015
 

ItemTemplate edit

Summary HTML entities in Config.spec files are incorrectly converted to hex.
ReportedBy JozefMojzis
Codebase trunk
SVN Range
AppliesTo Engine
Component BuildContrib, DateTimePlugin, PseudoInstall, configure
Priority Normal
CurrentState Closed
WaitingFor
Checkins distro:5e45ceee0a41
TargetRelease minor
ReleasedIn 1.2.0
CheckinsOnBranches master
trunkCheckins
masterCheckins distro:5e45ceee0a41
ItemBranchCheckins
Release01x01Checkins
Topic revision: r15 - 11 Jun 2015, JozefMojzis