Item13446: HTML entities in Config.spec files are incorrectly converted to hex.
Priority: Normal
Current State: Closed
Released In: 1.2.0
Target Release: minor
There are multiple problems with the DateTimePlugin.
The plugin contains language-specific month and day names in its Config.spec, for example:
$Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{pt} = {
months_long => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
months_short => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
weekdays_long => 'Domingo, Segunda-feira, Ter&ccedil;a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S&aacute;bado',
weekdays_short => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab',
};
Note the &ccedil; and similar entities.
This works as shipped, but when I changed the values in /bin/configure, it stopped working and showed garbled characters. So the question is: how should we deal with such Config.spec files from extensions?
Second, I'm wondering how a user can add new language translations without manually editing the Config.spec. Configure doesn't allow creating a new node; for example, there is no way to create $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{cs} without manually editing Config.spec. (Moved to Item13453.)
So we can supply each language with HTML entities, but then the user must not convert them to UTF-8, or else has to patch the plugin.
Also, how will we deal with the situation where the plugin description text, e.g. the System.Pluginname.txt topic, is not pure ASCII (for example, written in Latin-1 or UTF-8)? Will we determine the charset of the plugin description at installation time and convert it on the fly?
--
JozefMojzis - 06 Jun 2015
(Hijacked discussion of extension compatibility issues moved to Item13452. This task describes a legitimate, urgent issue with configure. It also describes a second issue with the Config.spec for DateTimePlugin, which will also be moved to a separate task.)
Back to the original problem: configure does indeed corrupt HTML entities in the DateTimePlugin settings.
| *Key* | *Old* | *New* |
| {Plugins}{DateTimePlugin}{Dates}{pt} | {'months_long' => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro', 'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez','weekdays_long' => 'Domingo, Segunda-feira, Ter... | {'months_long' => "Janeiro, Fevereiro, Mar\x{e7}o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro", 'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez','weekdays_long' => "Domingo, Segunda-feira, Ter\x... |
--
GeorgeClark - 09 Jun 2015
To recreate:
- Install DateTimePlugin using package installer, not pseudo install.
- Visit configure and open the DateTimePlugin tab under Extensions. Don't click anything.
- The "nl" strings have both a reset and an undo button.
- Note that the Save button is already active with 1 change, even though nothing was updated.
- Save the change; the entities are converted to hex.
diff LocalSite.cfg LocalSite.cfg.6
231c231,236
< $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{nl} = {'months_long' => 'Januari, Februari, Maart, April, Mei, Juni, Juli, Augustus, September, Oktober, November, December','months_short' => 'Jan, Feb, Maa, Apr, Mei, Jun, Jul, Aug, Sep, Okt, Nov, Dec','weekdays_long' => 'Zondag, Maandag, Dinsdag, Woensdag, Donderdag, Vrijdag, Zaterdag','weekdays_short' => 'Zon, Maa, Din, Woe, Don, Vri, Zat'};
---
> $Foswiki::cfg{Plugins}{DateTimePlugin}{Dates}{nl} = {
> months_long => 'Januari, Februari, Maart, April, Mei, Juni, Juli, Augustus, September, Oktober, November, December',
> months_short => 'Jan, Feb, Maa, Apr, Mei, Jun, Jul, Aug, Sep, Okt, Nov, Dec',
> weekdays_long => 'Zondag, Maandag, Dinsdag, Woensdag, Donderdag, Vrijdag, Zaterdag',
> weekdays_short => 'Zon, Maa, Din, Woe, Don, Vri, Zat',
> };
233,238c238,242
< 'months_long' => 'Janeiro, Fevereiro, Mar�o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
< 'months_short' => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
< 'weekdays_long' => 'Domingo, Segunda-feira, Ter�a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S�bado',
< 'weekdays_short' => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab'
< }
< ;
---
> months_long => 'Janeiro, Fevereiro, Mar&ccedil;o, Abril, Maio, Junho, Julho, Agosto, Setembro, Outubro, Novembro, Dezembro',
> months_short => 'Jan, Fev, Mar, Abr, Mai, Jun, Jul, Ago, Set, Out, Nov, Dez',
> weekdays_long => 'Domingo, Segunda-feira, Ter&ccedil;a-feira, Quarta-feira, Quinta-feira, Sexta-feira, S&aacute;bado',
> weekdays_short => 'Dom, Seg, Ter, Qua, Qui, Sex, Sab',
> };
--
GeorgeClark - 09 Jun 2015
The browser expands entities in text embedded in a textarea, so we must pre-escape the entities.
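A minimal sketch of the pre-escaping, assuming the value is written into a `<textarea>` by Perl code; `escape_for_textarea` is a hypothetical helper, not the actual configure code. Escaping `&` first turns `&ccedil;` into `&amp;ccedil;`, which the browser displays (and submits back) as the literal text `&ccedil;` rather than `ç`.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a config value before placing it inside
# <textarea>...</textarea>, so the browser shows entities literally.
sub escape_for_textarea {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or the escapes below would be re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $value = 'Mar&ccedil;o';
my $html  = '<textarea>' . escape_for_textarea($value) . '</textarea>';
print "$html\n";    # prints <textarea>Mar&amp;ccedil;o</textarea>
```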
--
Main.CrawfordCurrie - 09 Jun 2015 - 16:49
After talking with JozefMojzis, there is more to this than that; I missed this point in his original report. If you enter non-ASCII characters in the UI, they also get converted to hex. The example he used is setting {TrashWebName} to 'Kôš'. From configure it gets saved as "K\x{f4}\x{161}"; Jomo reports that this does work, so that's good news.
2015-06-09T16:00:46-04:00 notice | admin | 127.0.0.1 | {TrashWebName} | 'Trash' | "K\x{f4}\x{161}" |
If you do the same thing from the CLI, the value is saved without encoding, i.e.:
tools/configure -save -set {TrashWebName}='Kôš'
New configuration saved in /var/www/foswiki/distro/core/lib/LocalSite.cfg
| *Key* | *Old* | *New* |
| {TrashWebName} | 'Trash' | 'Kôš' |
The log is incorrect as well:
2015-06-09T17:03:32-04:00 notice | | | {TrashWebName} | 'Trash' | 'Kôš' |
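One plausible explanation for the difference, consistent with the later remark in this thread that tools/configure does not decode its arguments: the shell hands Perl UTF-8 bytes, so the CLI stores five bytes while the web UI stores three characters. A minimal sketch using the core Encode module; the explicit decode is what a fix might add, not what tools/configure is known to do.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate the shell argument {TrashWebName}='Kôš': the terminal passes UTF-8
# bytes, and Perl leaves @ARGV undecoded unless told otherwise (e.g. perl -CA).
my $arg = encode('UTF-8', "K\x{f4}\x{161}");
printf "undecoded argument: %d units\n", length($arg);      # 5 bytes: 4b c3 b4 c5 a1

# Decoding the argument gives the same 3-character string the web UI produced.
my $value = decode('UTF-8', $arg);
printf "decoded argument:   %d units\n", length($value);    # 3 characters
print $value eq "K\x{f4}\x{161}" ? "matches the UI value\n" : "differs\n";
```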
If the answer is "always use entities", then we probably need some checks to prevent corruption of the config. I can see localized web names being a reasonable request.
--
GeorgeClark - 09 Jun 2015
From the IRC:
CDot: yeah, but that then requires utf8-encoding LSC, and at the moment it's ASCII, which I kinda like
I simply can't understand this point of view.
The only thing needed is this: when configure saves the LSC with $Foswiki::UNICODE set to true, it should add to the top of the LSC:
use utf8;
That's all. Perl is smart enough to know what to do on
require LocalSite.cfg;
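A sketch of the proposal, assuming configure wrote the file through a UTF-8 output layer and put `use utf8;` first. The temporary file and the `%main::cfg` hash (standing in for `%Foswiki::cfg`) are illustrative; this is not the existing Foswiki::Configure::Wizards::Save code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our %cfg;    # stands in for %Foswiki::cfg in this sketch

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/LocalSite.cfg";    # illustrative path, not the real LSC

# Write through a UTF-8 layer, with the pragma first so that Perl decodes
# the string literals when the file is loaded.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
print {$out} "use utf8;\n";
print {$out} "\$main::cfg{TrashWebName} = 'K\x{f4}\x{161}';\n";
print {$out} "1;\n";
close $out or die "Cannot close $file: $!";

# Load it back; do FILE evaluates the file's contents as Perl code.
my $ok = do $file;
die "Cannot load $file: ", ($@ || $!) unless $ok;

print $cfg{TrashWebName} eq "K\x{f4}\x{161}"
    ? "3 characters preserved\n"
    : "garbled\n";
```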
There is no reason to treat the LSC as ASCII-only:
- the LSC is generated at bootstrap,
- so it is specific to the current installation;
- for ASCII-only users there is no difference. Proof: save the following scripts and try them yourself:
#file utest.cfg
use utf8;
$X{some} = "some";
$X{ukey} = "ščť";
$X{ščť} = "ščťval";
#file utest
use strict;
use warnings;
use feature "say";
binmode STDOUT, ":utf8";
our %X;
require "./utest.cfg";    # "./" is required since Perl 5.26 removed "." from @INC
say join "\t", qw(key value k_utf8 v_utf8);
for my $k (qw(some ukey), $X{ukey}) {
say join "\t", $k, $X{$k}, utf8::is_utf8($k)+0, utf8::is_utf8($X{$k})+0;
}
When you run the above you will get:
| *key* | *value* | *k_utf8* | *v_utf8* |
| some | some | 0 | 0 |
| ukey | ščť | 0 | 1 |
| ščť | ščťval | 1 | 1 |
As you can see, the utf8 flag is not set for the ASCII values (nor for the ASCII keys), even though the file contains use utf8;. So:
- installations that use an ASCII-only LSC change nothing (simply nothing);
- installations that use UTF-8 can have UTF-8 strings in the LSC (with no effect on ASCII-only installations);
- installations trying to use ISO-8859-1 with the Unicode core are wrong anyway and will end up with bad encodings.
Why add yet more unreasonable exceptions to the "utf8" rule, especially in the core code? (The LSC is Perl code.)
So Crawford, could you please describe a scenario where:
- an ASCII-only LSC must be enforced?
- something would be affected by a UTF-8 LSC?
--
JozefMojzis - 10 Jun 2015
I did a quick scan of where UTF-8 encoded characters can be expected relatively commonly in the LSC:
{SandboxWebName}
{TrashWebName}
{UsersWebName}
{AuthRealm}
{Email}{SmimeCertL}
{Email}{SmimeCertO}
{Email}{SmimeCertOU}
The S/MIME certificate fields in particular need to be compatible with a CSR generated like this:
openssl req -new -utf8 -newkey rsa:2048 -nodes -out the.csr -keyout the.private.key -subj "/C=SK/ST=Južné Slovensko/L=Fialôčka v tôni/O=Čača spoločnosť/OU=Centrála/CN=caca.sk/emailAddress=admin@caca.sk"
Note the -utf8 option.
Of course, nothing is broken in the current state: when the characters are stored in the current \x{nnn} form, everything works. But you should expect that users will edit the LSC file manually and will probably not use \x{nnn} sequences for international characters (despite the recommendation to use tools/configure from the command line).
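For reference, a double-quoted Perl string with `\x{...}` escapes is pure ASCII in the file yet yields the intended characters in memory, which is why the current form works. A minimal sketch with a hypothetical `ascii_quote` helper (not Foswiki's actual serializer) that produces that form and evaluates it back:

```perl
use strict;
use warnings;

# Hypothetical serializer: produce a double-quoted Perl literal that is pure
# ASCII, writing every character outside printable ASCII as \x{hex}.
sub ascii_quote {
    my ($s) = @_;
    $s =~ s/([\\"\$\@])/\\$1/g;                             # escape Perl-special characters
    $s =~ s/([^\x20-\x7e])/sprintf('\\x{%x}', ord $1)/ge;   # everything else as \x{..}
    return qq("$s");
}

my $trash   = "K\x{f4}\x{161}";    # K, o-circumflex, s-caron
my $literal = ascii_quote($trash);
print "$literal\n";                # prints "K\x{f4}\x{161}"

my $back = eval $literal;          # roughly what loading the LSC does with it
die $@ if $@;
print $back eq $trash ? "round-trip ok\n" : "mismatch\n";
```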
The proposed
use utf8;
at the top of the LSC solves these problems (unless someone describes a relevant scenario in which it would break things).
--
JozefMojzis - 10 Jun 2015
I said I kinda liked it, not that I cared very much! The only reason for keeping it ASCII is to avoid having to decode when reading/writing it. If you are going to add use utf8, then there are quite a few places where you have to pay attention to that: Foswiki::Configure::Wizards::Save, Foswiki::Configure::Load, pseudo-install, and possibly more that I haven't thought of. Feel free to go ahead and change it.
--
CrawfordCurrie - 10 Jun 2015
The current behaviour is "strange": the web interface and the command-line configure are inconsistent. It (probably) needs to be fixed.
If you want to keep the LSC ASCII-only, OK, leave it as it is. It is your decision (and the bugs will reappear later), because tools/configure doesn't decode its arguments. (Or maybe they will not reappear, because nobody will set up, for example, a UTF-8 CSR or change web names to localised ones using tools/configure, and we can happily live with it for many years.)
You know, I'm not a core developer and probably never will be. Also, I have neither the right nor the wish to add extra tasks for anyone.
I only reported a bug and argued for a style of solution, because I foresee problems: I live with UTF-8 in every one of my Perl/bash scripts on a daily basis, because of my language. Of course, your point of view may be different, and I don't have to understand everything either.
Finally, I will test and report the things (not necessarily bugs) that I find. I would happily help with testing if someone tells me what to test and what to report. I could also point to a solution approach, from experience, if wanted. Or not; that is OK too.
So: "No action required" for this one.
--
JozefMojzis - 10 Jun 2015
I'm changing this back to confirmed. I think we do need to address this someday, maybe not for 1.2.0. Also changing it to an enhancement.
--
GeorgeClark - 11 Jun 2015
I tried implementing this, and I think for now we need to stick with ASCII. From what I can tell, it takes more than just "use utf8" in the file. We set values into the configure hash using eval, for example:
eval("\$Foswiki::cfg$k=\$v");
Before Perl 5.16, string eval of UTF-8 data is unpredictable. I suspect we would need the unicode_eval feature to make this work. From the perldoc on eval:
In the absence of the unicode_eval feature, the string will sometimes be treated as characters and sometimes as bytes, depending on the internal encoding, and source filters activated within the eval ...
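A minimal sketch of the quoted eval pattern with the feature enabled, assuming Perl 5.16 or later. `set_cfg` is a hypothetical wrapper and `%main::cfg` stands in for `%Foswiki::cfg`. Because of the escaped `\$v`, the value itself is never parsed as code; only the key text is, so for this particular line the encoding question mainly concerns non-ASCII keys.

```perl
use strict;
use warnings;
use feature 'unicode_eval';    # Perl 5.16+: eval STRING parses its code as characters

our %cfg;    # stands in for %Foswiki::cfg in this sketch

# Same shape as the configure line quoted above. "\$v" puts the text $v into
# the code, so the value is read from the variable; only the key string is
# actually parsed as Perl source.
sub set_cfg {
    my ($k, $v) = @_;
    eval("\$main::cfg$k=\$v");
    die $@ if $@;
    return;
}

set_cfg('{TrashWebName}', "K\x{f4}\x{161}");        # non-ASCII value
set_cfg("{Dates}{'K\x{f4}\x{161}'}", 'example');    # non-ASCII characters in the key

print "value ok\n" if $cfg{TrashWebName} eq "K\x{f4}\x{161}";
print "key ok\n"   if exists $cfg{Dates}{"K\x{f4}\x{161}"};
```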
--
GeorgeClark - 11 Jun 2015
Hm, what are we talking about?
- whether %Foswiki::cfg can contain wide characters, or
- the encoding of the LSC?
These are two completely different things: the LSC is not %Foswiki::cfg.
- %Foswiki::cfg is a hash in memory;
- the LSC is its serialized form (Perl source code) in a file.
So when CDot wants to keep the LSC ASCII-only, does that mean:
- nobody can use wide characters for {TrashWebName} and so on?
- or only that the LSC must be encoded as pure ASCII?
In case I'm still not clear, look at the following entry in the LSC:
$Foswiki::cfg{TrashWebName} = "K\x{f4}\x{161}";
Here:
- the LSC file itself is pure ASCII (every character above, e.g. the backslash and the opening brace, is an ASCII character);
- but the in-memory $Foswiki::cfg{TrashWebName} contains the Unicode string Kôš (U+004B U+00F4 U+0161), or if you prefer, \x{4b}\x{f4}\x{161}.
In this case:
use utf8;
$Foswiki::cfg{TrashWebName} = "Kôš";
- the in-memory $Foswiki::cfg{TrashWebName} contains the Unicode string Kôš (as in the previous example);
- the LSC contains the UTF-8 encoded string (\x{4b}\x{c3}\x{b4}\x{c5}\x{a1}).
In this case:
#use utf8; #no pragma
$Foswiki::cfg{TrashWebName} = "Kôš";
- the in-memory $Foswiki::cfg{TrashWebName} contains garbage: the five raw bytes \x{4b}\x{c3}\x{b4}\x{c5}\x{a1} instead of three characters;
- the LSC contains the UTF-8 encoded string (\x{4b}\x{c3}\x{b4}\x{c5}\x{a1}).
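The three cases can be checked directly. A minimal sketch using the core Encode module: `decode` stands in for what `use utf8;` makes the parser do, and the raw byte string stands in for a UTF-8 literal read without the pragma.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Print each unit of a string as two hex digits.
sub hexdump { return join ' ', map { sprintf('%02x', ord($_)) } split(//, $_[0]) }

my $chars = "K\x{f4}\x{161}";           # case 1: ASCII-only file using \x{...} escapes
my $bytes = encode('UTF-8', $chars);    # what a UTF-8 encoded LSC stores on disk
print hexdump($bytes), "\n";            # prints 4b c3 b4 c5 a1

# Case 2: "use utf8;" has the same effect as decoding those bytes.
my $with_pragma = decode('UTF-8', $bytes);

# Case 3: without the pragma the literal keeps the raw bytes.
my $without_pragma = $bytes;

printf "with use utf8:    %d chars, equals case 1: %s\n",
    length($with_pragma), ($with_pragma eq $chars ? 'yes' : 'no');
printf "without use utf8: %d chars, equals case 1: %s\n",
    length($without_pragma), ($without_pragma eq $chars ? 'yes' : 'no');
```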
So?
Also:
- The unicode_eval feature ignores any use utf8;, because the use utf8; pragma only has meaning for byte-oriented strings.
- do 'LocalSite.cfg' is roughly the same as eval `cat LocalSite.cfg`, and because files on disk always contain bytes, it works fine.
In short, unicode_eval has nothing to do with the LSC's encoding, and use utf8; only allows the LSC to be UTF-8 encoded. Nothing more, nothing less.
--
JozefMojzis - 11 Jun 2015