
Item13876: Var BB (bullet line) must be deprecated and changed to BB1 for UTF-8.

Priority: Normal
Current State: Confirmed
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: I18N
Reported By: StanleyTweedle
Waiting For:
Last Change By: GeorgeClark
Because %BB is also a URL-encoded UTF-8 byte, any URL on the page that contains %BB% will be broken after rendering.

-- StanleyTweedle - 29 Nov 2015
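
The collision described above can be reproduced outside Foswiki. The following is a minimal Python sketch, not Foswiki's actual rendering code: the `MACROS` table, its expansion value, the `expand_macros` helper and the example.com URL are all assumptions made for illustration.

```python
import re
from urllib.parse import quote

# Hypothetical macro table; the expansion value is illustrative only.
MACROS = {"BB": "<br />&bull;"}

# Match %NAME% only for names that are defined in the table.
MACRO_RE = re.compile("%(" + "|".join(map(re.escape, MACROS)) + ")%")

def expand_macros(text: str) -> str:
    """Replace every known %NAME% with its definition."""
    return MACRO_RE.sub(lambda m: MACROS[m.group(1)], text)

# Cyrillic "л" is U+043B; its UTF-8 bytes are D0 BB.
encoded = quote("лл")                  # two letters, so BB is followed by %D0
url = "http://example.com/" + encoded  # contains the sequence %BB%
broken = expand_macros(url)            # the %BB% in the middle is expanded
```

Any percent-encoded string in which a `BB` byte is followed by another encoded byte contains `%BB%`, so the macro fires in the middle of the URL.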

In Item13874 we've proposed eliminating the entity encoding of [[http:...]] style links. I think that would also resolve this issue, at least if I understand it correctly. There seems to be no problem with directly generating links that contain Unicode characters. Modern browsers correctly encode the non-ASCII characters. It turns out that TinyMCE inserts them unencoded and they seem to work fine.

Could you give a bit more information about the actual scenario? Is it:
  • Upload an attachment with a Unicode filename
  • Insert a link to the attachment, which causes the non-ASCII characters to be encoded
  • View the page, and %BB%xx is rendered in the link, breaking it?

If this isn't it, how exactly is the URL created? Could you provide a more detailed example?

-- GeorgeClark - 30 Nov 2015

OK, take this symbol: »

If in some cases it ends up URL-encoded, we have a problem.

-- StanleyTweedle - 30 Nov 2015

The issue with your solution is that the macro name BB is only one instance of what could be many collisions. We would need to deprecate macros matching /^[A-F][A-Fa-f0-9]$/

What I still don't get is how that URL-encoded text ends up in the topic to be rendered. Is that something that a user entered manually? Something we generated? Something a user pasted into the topic?

-- GeorgeClark - 30 Nov 2015
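
To get a feel for how many names that pattern covers, here is a small Python sketch that enumerates every two-character alphanumeric name and keeps those matching the regex quoted above. The variable names are illustrative, and the candidate alphabet (ASCII letters and digits) is an assumption.

```python
import re
import string
from itertools import product

# The pattern from the comment above: an uppercase hex letter followed
# by any hex digit, i.e. a name that also reads as a percent-encoded byte.
COLLIDES = re.compile(r"[A-F][A-Fa-f0-9]")

alphabet = string.ascii_letters + string.digits
candidates = ("".join(pair) for pair in product(alphabet, repeat=2))

# Keep only the names that look like one percent-encoded byte.
colliding = [name for name in candidates if COLLIDES.fullmatch(name)]

print(len(colliding))
print("BB" in colliding, COLLIDES.fullmatch("BB1") is None)
```

Six possible first characters times 22 possible second characters gives 132 colliding names, which is why renaming BB alone would not close the gap. A three-character name such as BB1 falls outside the pattern.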

In our release meeting we've been discussing this a bit more. Another option might be to implement a <nomacro> ... </nomacro> zone and block all macro expansion inside it. So if a user is pasting in links from other sources that are already encoded, they would need to add the

-- GeorgeClark - 30 Nov 2015
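
A rough Python sketch of how such a zone could behave. This is purely hypothetical: no such tag exists in the thread beyond the proposal, and the `MACROS` table, its expansion value, the function name and the choice to strip the tags from the output are all assumptions.

```python
import re

# Hypothetical macro table; the expansion value is illustrative only.
MACROS = {"BB": "<br />&bull;"}
MACRO_RE = re.compile("%(" + "|".join(map(re.escape, MACROS)) + ")%")

# One capturing group, so re.split() returns the zone contents at odd indices.
ZONE_RE = re.compile(r"<nomacro>(.*?)</nomacro>", re.DOTALL)

def expand_outside_nomacro(text: str) -> str:
    """Expand known macros everywhere except inside <nomacro> zones."""
    out = []
    for index, part in enumerate(ZONE_RE.split(text)):
        if index % 2 == 1:
            out.append(part)  # inside a zone: copy through untouched
        else:
            out.append(MACRO_RE.sub(lambda m: MACROS[m.group(1)], part))
    return "".join(out)

sample = "a %BB% b <nomacro>http://example.com/%D0%BB%D0%BB</nomacro>"
result = expand_outside_nomacro(sample)
```

In this sketch the %BB% outside the zone is expanded, while the pre-encoded URL inside the zone survives unchanged.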

We would need to deprecate macros matching /^[A-F][A-Fa-f0-9]$/

For UTF-8 it must be something like [:upper:] [:lower:] [:digit:] ))) but I think the regex makes no difference in this case.

I do not know how and when a URL on a page gets encoded to the %NN%NN% style, but it happens (I last saw it in RackPlannerPlugin with UTF-8; I will open a separate task for that case later).

-- StanleyTweedle - 30 Nov 2015

Okay, thanks. Really there should be no need for extensions to insert encoded UTF-8 now that Foswiki has a Unicode core. So the bug is against RackPlannerPlugin. It needs updating for Foswiki 2.0.

By the way, you can locally redefine BB by overriding it in your SitePreferences. Those "convenience" macros are not generated in code. Regardless, you point out an issue that will exist for cut/paste of URLs, as well as, for now, attachment uploads with UTF-8 names. We'll work on some way to resolve those cases.

-- GeorgeClark - 30 Nov 2015

Already done.

If I understand correctly, the locale is not set via LC_ALL by default. And the regexes have not been changed to the Unicode-aware [:upper:] [:lower:] [:digit:] everywhere?

-- StanleyTweedle - 01 Dec 2015

Most of the regexes have indeed been changed to [:upper:] [:lower:] [:digit:]. However, some parts of Foswiki are still explicitly defined as ASCII A-Za-z0-9, and that is the case with the regexes that match %MACROS%. Template names also have an ASCII restriction, though we are investigating changing that for 2.1.

-- GeorgeClark - 02 Dec 2015
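
The difference between an explicit ASCII range and a Unicode-aware character class can be illustrated with a short Python sketch. The name pattern below is an assumption for illustration, not Foswiki's actual macro regex, and since Python's `re` module has no POSIX classes, `str.isupper()` and `str.isalnum()` stand in as a rough analogue.

```python
import re

# ASCII-only name pattern (assumed shape, for illustration only).
ASCII_NAME = re.compile(r"[A-Z][A-Za-z0-9_]*")

def is_ascii_name(name: str) -> bool:
    """True if the whole name fits the explicit A-Z / a-z / 0-9 ranges."""
    return ASCII_NAME.fullmatch(name) is not None

def is_unicode_name(name: str) -> bool:
    """Rough Unicode-aware analogue: uppercase first character, then
    letters, digits or underscores from any script."""
    return (
        bool(name)
        and name[0].isupper()
        and all(c.isalnum() or c == "_" for c in name[1:])
    )

print(is_ascii_name("BB"), is_unicode_name("BB"))
print(is_ascii_name("ПРИВЕТ"), is_unicode_name("ПРИВЕТ"))
```

A name such as BB passes both checks, while a Cyrillic name passes only the Unicode-aware one. That is the practical meaning of macro names being restricted to ASCII.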

ItemTemplate

Summary Var BB (bullet line) must be deprecated and changed to BB1 for UTF-8.
ReportedBy StanleyTweedle
Codebase 2.0.3, trunk
SVN Range
AppliesTo Engine
Component I18N
Priority Normal
CurrentState Confirmed
TargetRelease n/a
ReleasedIn n/a
Topic revision: r10 - 12 Apr 2016, GeorgeClark