
Item13997: Incorrect assumption about encodings in Foswiki::Store.

Priority: Normal
Current State: Closed
Released In: 2.1.1
Target Release: patch
Applies To: Engine
Component: FoswikiStore
Branches: master Release02x01 Item13897 Item14033 Item14380 Item14537
Reported By: VadimBelman
Waiting For:
Last Change By: GeorgeClark
The present Foswiki::Store implementation makes one very incorrect assumption: that the encoding of file content is also the encoding of file names. In other words, $Foswiki::cfg{Store}{Encoding} is applied to both the file name and the file content. While this is presumably tolerated on most OSes, on OS X it produces a pretty strange result when the encoding is iso8859-1: file and directory names are converted into %FF-style URL encoding. When a file name is long enough (86 or more characters if all of them are non-ASCII), it becomes three times longer after conversion and causes a 'File name too long' error upon file/dir creation.

Here is a demo. The following script:

#!/usr/bin/env perl

use v5.14;
use utf8;
use strict;
use warnings;
use Encode;
use File::Path;

my $s =
  Encode::decode( 'iso-8859-1', join( '', map { chr($_) } ( 160 .. 244 ) ) );
my $n = Encode::encode( 'iso-8859-1', $s, Encode::FB_CROAK );
my $tempdir = "$ENV{HOME}/tmp/foswiki.del.me/$n";
File::Path::mkpath( $tempdir, 0, 0777 );

exit;

generates the following dir name:

$ ls ~/tmp/foswiki.del.me
%A0%A1%A2%A3%A4%A5%A6%A7%A8%A9%AA%AB%AC%AD%AE%AF%B0%B1%B2%B3%B4%B5%B6%B7%B8%B9%BA%BB%BC%BD%BE%BF%C0%C1%C2%C3%C4%C5%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF%D0%D1%D2%D3%D4%D5%D6%D7%D8%D9%DA%DB%DC%DD%DE%DF%E0%E1%E2%E3%E4%E5%E6%E7%E8%E9%EA%EB%EC%ED%EE%EF%F0%F1%F2%F3%F4

Obviously, increasing the upper range boundary to 245 results in 'File name too long', because it produces an 86-character dir name, which is 258 characters once each one is escaped as %XX.
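The arithmetic behind that boundary can be checked with a short sketch. It assumes that every byte in the range 0x80..0xFF is stored as a three-character %XX escape, and that a single path component is limited to 255 bytes; the limit is an assumption consistent with the behaviour reported here, not something stated in this report. The helper name escaped_length is hypothetical.

```perl
use strict;
use warnings;

# Assumption: one path component may be at most 255 bytes long.
my $NAME_MAX = 255;

# Length of a byte string after every byte in 0x80..0xFF is escaped as %XX
# (each such byte grows from one character to three).
sub escaped_length {
    my ($bytes) = @_;
    my $high = () = $bytes =~ /[\x80-\xFF]/g;    # count of high bytes
    return length($bytes) + 2 * $high;
}

for my $top ( 244, 245 ) {
    my $name = join '', map { chr($_) } 160 .. $top;
    my $len = escaped_length($name);
    printf "160..%d: %d chars -> %d escaped (%s)\n",
      $top, length($name), $len, $len <= $NAME_MAX ? 'fits' : 'too long';
}
```

With the range ending at 244 the escaped name is exactly at the assumed limit; one more character pushes it over.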

Proposed solution

In addition to the Encoding configuration key, a FilenameEncoding key should be introduced. It would default to Encoding unless set manually to a different value. Foswiki::Store::encode would take one more optional parameter naming the key to use, and might look like:

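sub encode {
    # $_[0]: string to encode
    # $_[1]: if true, croak on characters the encoding cannot represent
    # $_[2]: Store config key naming the encoding (defaults to 'Encoding')
    return $_[0] unless defined $_[0];
    my $s = $_[0];
    my $encKey = $_[2] || 'Encoding';
    if ( $_[1] ) {
        return Encode::encode( $Foswiki::cfg{Store}{$encKey} || 'utf-8',
            $s, Encode::FB_CROAK );
    }
    else {
        # Unrepresentable characters are replaced with HTML entities
        return Encode::encode( $Foswiki::cfg{Store}{$encKey} || 'utf-8',
            $s, sub { HTML::Entities::encode_entities( chr(shift) ) } );
    }
}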
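Note that the function above looks up $Foswiki::cfg{Store}{$encKey} directly, so an unset FilenameEncoding would fall through to 'utf-8' rather than to Encoding, unless the default is filled in elsewhere (for example when the configuration is loaded). A minimal sketch of resolving the fallback at lookup time follows. The %store_cfg hash is a stand-in for $Foswiki::cfg{Store}, and encoding_for and encode_with_key are hypothetical names, not existing Foswiki functions.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $Foswiki::cfg{Store}; only Encoding is configured here.
my %store_cfg = ( Encoding => 'iso-8859-1' );

# Hypothetical helper: resolve the encoding name for a config key,
# falling back to Encoding and finally to utf-8.
sub encoding_for {
    my ($key) = @_;
    return $store_cfg{ $key || 'Encoding' }
      || $store_cfg{Encoding}
      || 'utf-8';
}

# Same shape as the strict branch of the proposed encode();
# the HTML-entity branch is omitted to keep the sketch short.
sub encode_with_key {
    my ( $s, $key ) = @_;
    return $s unless defined $s;
    return Encode::encode( encoding_for($key), $s, Encode::FB_CROAK );
}
```

With this shape, encode_with_key( $name, 'FilenameEncoding' ) behaves exactly like the content encoding until FilenameEncoding is set explicitly.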

Foswiki::Store::decode would need a similar adaptation, of course. Foswiki::Store::PlainFile::_mkPathTo would have to call it in the following way:


# Make all directories above the path
sub _mkPathTo {
    my $file = _encode( shift, 1, 'FilenameEncoding' );

    ASSERT( File::Spec->file_name_is_absolute($file), $file ) if DEBUG;

...

}

The same change would be required for the many other calls to _encode throughout the PlainFile.pm module.
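The matching change to decode might be sketched as below, assuming decode currently takes only the byte string and always croaks on malformed input. The %store hash stands in for $Foswiki::cfg{Store} with illustrative values, and decode_with_key is a hypothetical name.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $Foswiki::cfg{Store}; the values are illustrative only.
my %store = ( Encoding => 'iso-8859-1', FilenameEncoding => 'utf-8' );

# Sketch of decode() taking the config key as an optional second argument.
sub decode_with_key {
    my ( $bytes, $enc_key ) = @_;
    return $bytes unless defined $bytes;
    $enc_key ||= 'Encoding';
    return Encode::decode( $store{$enc_key} || 'utf-8',
        $bytes, Encode::FB_CROAK );
}

# A file name read from disk as UTF-8 bytes, and file content stored as
# Latin-1 bytes, both decode to the same character string.
my $name    = decode_with_key( "caf\xC3\xA9", 'FilenameEncoding' );
my $content = decode_with_key("caf\xE9");
```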

-- VadimBelman - 29 Feb 2016

A brief IRC brainstorming session produced a solution: block iso8859 on OS X.

-- VadimBelman - 29 Feb 2016

I just do not understand why someone on OS X would even try to use iso-8859. OS X is fully Unicode by default (and moreover the filesystem enforces it), so using iso-8859-1 on OS X is the same mistake as trying to use, for example, 7-bit ASCII on Linux with accented characters. (Or I don't understand something...)

-- JozefMojzis - 29 Feb 2016

The only scenario I can think of is migrating from another system. But then again, if you are brave enough to undertake the move, make encoding conversion part of it.

-- VadimBelman - 29 Feb 2016

Added a test to the Store Encoding checker. It reports an error unless the encoding is utf-8 / utf8.
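The kind of check described might look roughly like the sketch below. This is not the actual checker code: store_encoding_error is a hypothetical function, and it assumes that the check applies only on OS X (where Perl's $^O is 'darwin'), as the IRC discussion suggests, and that an unset encoding is treated as utf-8.

```perl
use strict;
use warnings;

# Hypothetical check: returns an error message, or nothing if the
# configured store encoding is acceptable on the given OS.
sub store_encoding_error {
    my ( $os, $encoding ) = @_;
    return unless $os eq 'darwin';      # assumption: only OS X is restricted
    return unless defined $encoding;    # assumption: unset means utf-8
    return if $encoding =~ /^utf-?8$/i; # accepts both utf-8 and utf8
    return "Store encoding '$encoding' is not supported on OS X; use utf-8";
}

my $error = store_encoding_error( $^O, 'iso-8859-1' );
print "$error\n" if defined $error;
```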

-- GeorgeClark - 29 Feb 2016
 
Topic revision: r11 - 31 Jan 2018, GeorgeClark