Feature Proposal: The best perl version for implementing UTF8

Motivation

This proposal is in support of UseUTF8 and UnicodeSupport.

It appears that Perl 5.14 may be the preferred version for UTF8 and Unicode support. We need to discuss this and determine whether and when to move trunk to a newer version of perl better suited to UTF8. It would probably work better as a brainstorming topic.

Description and Documentation

  • Which version of perl will be needed for successful UTF8 and Unicode support?
  • What is the timing of a move to this version?
  • Can the minimum perl version Foswiki supports be lower than the version recommended for working UTF8 (with non-UTF8 caveats, as we have now), or will UTF8 require code that cannot run on earlier versions?
As Michael very correctly pointed out on IRC: we all agree that anyone working on UTF8 support should not waste time trying to make UTF8 work on older perls.

Impact

Potentially major compatibility issues with older or more conservative distributions, especially RHEL 5.

Implementation

-- Contributors:

Moved from off-topic discussion in RequirePerl588.

Discussion

I presume the main issue will be if active users run distributions that still use / require

To make it simpler, here is the Perl release history:

  • 5.6.2: Nov 2003
  • 5.8.8: Jan 2006
  • 5.10.0: Dec 2007
  • 5.10.1: Aug 2009
  • 5.12.0: Apr 2010
  • 5.14.0: May 2011
  • 5.14.1: Jun 2011
  • 5.16.0: May 2012

-- SvenDowideit - 11 Nov 2011

It'll be interesting to see how well 5.8.8 does. I've been fighting double-encoding issues in 5.10 that go away with 5.12 as it is, but I guess that's more likely a module version problem.

-- PaulHarvey - 11 Nov 2011

http://blog.timbunce.org/2011/07/21/upgrading-from-perl-5-8/ is useful

Mind you, I do wonder if we can go to 5.10 or even 5.12.

-- SvenDowideit - 21 May 2012

IMO, use utf8; is not a reason to switch, because use utf8; has only one purpose: telling perl that the source code contains utf8 characters. Since nowhere in FW is there anything like $str =~ /áéí/ or $váríáble = 10, use utf8 is pointless. It is not needed for processing utf8 data files, and for conversions such as utf8::encode($string) it is better to use Encode anyway. See: http://perldoc.perl.org/utf8.html
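To illustrate the distinction Jozef draws, here is a minimal Perl sketch, assuming the file itself is saved as UTF-8. use utf8 changes only how literals in this source file are read. Bytes that come from outside still have to go through Encode either way.

```perl
#!/usr/bin/perl
# Sketch, assuming this file is saved as UTF-8: "use utf8" only changes how
# perl reads literals in THIS source file. Data coming from outside (topic
# files, DB rows) is octets either way and must be decoded explicitly.
use strict;
use warnings;
use Encode qw(decode encode);

# No pragma in effect yet: the literal is read as its 6 UTF-8 octets.
my $octets_without_pragma = length("áéí");

my $chars_with_pragma;
{
    use utf8;    # lexically scoped: affects only this block
    $chars_with_pragma = length("áéí");    # 3 characters
}

# External data: the same letters as raw UTF-8 bytes, no pragma involved.
my $octets    = "\xc3\xa1\xc3\xa9\xc3\xad";
my $decoded   = decode('UTF-8', $octets);     # 3 characters
my $roundtrip = encode('UTF-8', $decoded);    # back to the same 6 octets

printf "without pragma: %d, with pragma: %d, decoded: %d, round trip ok: %s\n",
    $octets_without_pragma, $chars_with_pragma, length($decoded),
    ($roundtrip eq $octets ? 'yes' : 'no');
```

Only the first two numbers depend on the pragma. The decoded length is 3 regardless, which is the point about data files.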

But if FW really wants to go Unicode, it will soon need 5.14 (or at least 5.12, with great care). For now FW internally uses horrible octets :(. According to http://www.perl.com/pub/2011/06/new-features-of-perl-514-unicode-strings.html the only correct Unicode implementation is in 5.14.

All of you argue that some OSes ship 5.8 by default. Yes, this is true for Solaris and AIX, and surely you know much better than me how many FW installations use those OSes. But FreeBSD defaults to 5.12, OS X (Lion) ships 5.12, Linux of course has 5.12 and 5.14, and on Windows you can install 5.14. And for Solaris and AIX it is possible to build 5.12 using perlbrew.

So, hurry to 5.12.

-- JozefMojzis - 21 May 2012

See also Item11869 for another reason to get to a more recent perl. Anything older than 5.12.4 has trouble with older dates.

-- GeorgeClark - 22 May 2012

For anyone interested: http://foswiki.org/Sandbox/PerlVersionsByDistros shows a graph of the perl versions shipped by major distros.

-- JozefMojzis - 22 May 2012

Jozef, that is a depressing graph. Thank you for making it though, as it makes it clear that RHEL-based stuff is the problem. (Oracle is making sure that Solaris is really irrelevant.)

-- SvenDowideit - 23 May 2012

What seems clear is that we are safe raising the bar to 5.10.

Raising it to 5.12, as required to be good unicodians, drops out debian, oracle, suse, centos, turbolinux and yellow-dog. That's 30% of all listed, albeit some are quite exotic. Who cares about oracle, turbolinux and yellow-dog? The latter is pretty much lost in space. Kubuntu is redundant, as it equals mainstream ubuntu in the relevant bits.

Taking those out of the equation, we are left with debian (squeeze), suse (sles 11) and centos not shipping perl 5.12 or above. That's 15.8% in the cleaned-up statistics.

With regard to debian, these numbers are for debian/squeeze (aka debian/stable), which is hyper-conservative. debian/wheezy is a much better choice for building a server, and there you get 5.14.2, which is pretty okay.

sles 12 is due to be released pretty soon, this year, and will adopt a somewhat more modern perl, as found in recent openSUSE releases. Both debian/stable and sles are well known for being very slow to adopt upstream changes, with sles users being quite unhappy about this situation.

So things are not that bad for perl 5.12 and above.

-- MichaelDaum - 23 May 2012

I guess my perspective is coloured by having to help people who are on RHEL5 and centos5. If we only worry about the latest released distros, then yes, life gets easier every day.

Mind you, I don't really care that much about unicode :P I'd rather have a modern perl for the other bug fixes, especially as foswiki unicode isn't done (yes, I know: chicken and egg).

-- SvenDowideit - 23 May 2012

Sven, I'll kill you if you don't care about utf8 ;) And if we're talking about Modern Perl, the fw team has a great reluctance towards "Chapter 7" :) (pdf here).

-- JozefMojzis - 23 May 2012

Sven, you aren't serious, are you?

While I am unsure whether any perl project actually can and does support unicode properly, Foswiki must support it as well as is sensible given the effort required. So most probably we will reach an okay level of support, with more cases being fixed over time.

It starts by requiring a certain version of perl that allows us to do so. 5.8.8 isn't enough for that, even 5.10 isn't. We will have a different distro landscape at the end of this year allowing us to raise the bar to 5.12, says my crystal ball.

And for that we should be prepared in this development cycle, not wait yet longer.

That's our trajectory and there's not much of a way around it.

-- MichaelDaum - 23 May 2012

Interesting perspective.

perl -v on both my production servers says:

5.8.8!

It is easy to upgrade a Foswiki. It is painful to get the OS updated in large companies. The field statistics about distributions relate mainly to desktop/laptop installations. Servers are much further behind, and most companies favor stability over having the latest and greatest.

We hardly support UTF8 today. I would be OK with raising the bar when running Foswiki in UTF8 but we should make sure we can still run an ISO-8859 Foswiki on e.g. RHEL 5 which is exactly the distro I am on at work. And also on my home server. So I am actively running and testing on 5.8.8.

We also need to be realistic. We cannot support 5.8.8 for many years. I believe I would need to replace my hardware in a year or so. Then the server reaches the age where HW age is a risk. And when I buy new hardware I upgrade OS as well. I would guess this is what admins normally do. So we may be able to set a 2013 date for raising the bar from 5.8 to 5.10 at least - also for ISO-8859 mode.

-- KennethLavrsen - 23 May 2012

I'm afraid to say that our servers also run RHEL 5, although we do compile our own Perl, so are currently on 5.12.3.

Is it possible to make the unicode stuff pluggable, so if we are on > 5.12 we load good unicode support, otherwise load whatever we currently have?

-- AndrewJones - 23 May 2012

If we want full unicode support in 2.0, it will take at least a year to get a stable release. That may solve it, i.e. one bar for trunk while keeping the 5.8.8 bar for the current release branch.

I am afraid it will be very difficult to write code that can do both, unless all the unicode stuff can be isolated in its own .pm files.

-- KennethLavrsen - 23 May 2012

Perl 5.8.8 is over 6 years old. (Actually it's 10 years old if we count from 5.8.0; the rest is just maintenance.) Foswiki 1.1.x is only a few months old. Any organization set on waiting well over 6 years to upgrade perl can probably wait a few years to upgrade to Foswiki 1.2. I just don't see why this should cripple plans for 1.2 and, hopefully, a UTF8 core. Any organization that concerned about stability should not be installing Foswiki 1.2 in the first place.

I suggest that we branch to 1.2, and "use" the "then prevalent" perl, be it 5.12 or more likely 5.14. Organizations on 6 year old perl can continue to run foswiki 1.1. Maybe we can offer to provide "urgent" fix support for 1.1.5, so that they are not completely orphaned.

When Foswiki 1.1.x is 6 years old, hopefully the most reluctant / conservative organizations will have a modern perl for use.

And based on all this discussion, I think we need to change the summary to "Move to a modern perl" and flip back to under investigation. Or, move to 5.8.8 for 1.1.6 and modern perl (tbd) for 1.2.

-- GeorgeClark - 23 May 2012

George is right.

Let me emphasize that there are organisations today that desperately need proper unicode in Foswiki implementing an international collaboration space.

This is a real need, and we had better not make them wait until the last person on the globe has updated their perl.

-- MichaelDaum - 23 May 2012

Jozef, Sven is just expressing a reality: the official minimum perl for Foswiki isn't the thing preventing unicode progress, it's stalling because we still lack contributors (apart from Crawford, of course!) or supporters who have an active interest in seeing it happen.

So I think raising the minimum perl requirement for the sake of unicode support is premature, until we actually have a unicode core. Certainly in my own experience with MongoDB, I struggled for days (weeks?) to solve a unicode corruption bug which only went away by moving from 5.10 to 5.12. It could have been a CPAN module, who knows; and maybe even 5.12 isn't enough, according to tchrist's best practices.

As for Moose, I've been using it at $work quite a bit this year - and I'm sure there must be others in the FW team who think it's nifty - but we have such a large, old codebase; we'd need massively good reasons to add Moose dependencies (now THAT would raise the minimum perl requirement), not to mention refactoring core to use it from the ground up... we have much lower hanging fruit which would make our users happy, IMHO (dataforms, store2, search, ui, unicode, etc.)

-- PaulHarvey - 23 May 2012

I think maybe to soften my statement a bit, we really don't have a fully committed time-line yet for 1.2, or 2.0. Earlier this year we were talking about preparing for the 1.2 branch once 1.1.5 was out. But work has recently slowed a bit due mainly to real life interfering.

Kenneth's comment about being ready for a perl bump in 2013 might still be in line with a 1.2 or 2.0 release, depending upon whether or not development picks up. If UTF8 is going to become a serious effort, then using the best perl for UTF8 is probably a strong consideration.

  • Release 1.1.6: probably safe to require 5.8.8. I suspect we are mostly there already.
  • Release 1.2: when do we branch from trunk? Is that weeks or months away?
  • Release 2.0: ???

The ReleasePlan doesn't have a target date for 1.2, and shows 2.0 as "Target 2012". We are approaching the halfway point to that target.

I suspect it is realistic to expect that we won't have 1.2 ready for at least another 6 months, in which case we are approaching 2013. And conservative organizations would seldom install a "dot-0" release, so corporate adoption of 1.2 probably won't happen until 1.2.1 or 1.2.2, which puts us well into 2013.

If we delay UTF8 until 2.0 (which is where it sits on the release plan), we are probably looking at 2014 at the earliest.

So I'd prefer that we get 5.12 or 5.14 sooner than later.

-- GeorgeClark - 24 May 2012

1.2.0 (i.e. trunk) now requires 5.8.8 at minimum, i.e. this feature request has been implemented, bar docco.

I would not change the minimum perl version of 1.1 unless there is a specific bug that will be fixed only by doing so - and so far, I haven't seen one.

I'll reiterate what Paul said. Foswiki does not currently have anyone committed to working on fixing unicode properly. If this changes, then that person/group can (easily) mount arguments to increase the minimum version of perl to make their work possible.

Without active contributors committed to delivering unicode, other (more complex) arguments to increase the minimum perl version are needed. I can't see them here (sadly).

George - we are not delaying UTF8/unicode; it's not happening because no-one is committed to delivering it and actively working on it. Crawford (I think) made it relatively clear that while he's interested in supporting work on it (and has begun in a github branch), he's not committed to delivering it himself (and especially not by himself).

In short - Michael, Jozef and everyone who feels unicode is essential - it's clearly not essential, otherwise you'd be committed to delivering it, and working on it now.

(please, if you have arguments to move further than 5.8.8, please make a new feature request - this one has already been implemented)

-- SvenDowideit - 24 May 2012

Another thought; I'd dearly love to see a poll/survey of Foswiki users regarding the perl version that they use.

We might convince ourselves that 5.12 is a sane minimum, but it would be nice to have some data about our userbase.

-- PaulHarvey - 24 May 2012

Sven, fair point. Needs a new request. As for timing, I just figure there is a catch-22: nobody can really do anything productive until the perl level supports it. That's the barrier to entry. ... "that person/group can (easily) mount arguments to increase the minimum version of perl" would, in my opinion, be a rather intimidating barrier for a new developer. I'd rather have a core ready to welcome their contributions.

Anyway, I don't see much happening in the meantime.

-- GeorgeClark - 24 May 2012

For now this feature proposal is over. Foswiki is now requiring at least perl 5.8.8. But is this an eyebrow raiser? Erm, nope.

From the viewpoint of a perl developer who knows Foswiki and its history, it is clear that implementing proper unicode support (or even going halfway in that direction) is a major effort.

From the viewpoint of everybody else out there in the world, utf8 is the default way of thinking. So for them, reading this thread must be a somewhat strange experience.

Unicode is a fact. It is a major requirement for any system today, even more if you think of Foswiki as a data wiki that integrates with other systems in the field: databases, search engines, web services, you name it.

Not striving for unicode is putting the fingers in the ears singing lalala.

It is an effort that no single person alone can shoulder. Enough pointing fingers at each other saying "are you committed to deliver". That's not leading us anywhere.

With regard to "there is some github branch somewhere that already implements some feature": this is a considerable misconception of git. While people on github fork easily, they merge rarely ... which defeats the whole purpose of open source collaboration. If these Foswiki branches (unicode, psgi, ...) aren't merged, they are doomed to rot into non-existence.

The only thing this feature request was good for is that we don't have to test Foswiki on earlier perls anymore. Not that we'd be able to do so.

-- MichaelDaum - 24 May 2012

Michael, surely you are not blaming the documented minimum perl version as the reason why nobody cared about Crawford's unicode branch?

I'm also very disappointed that you would blame a tool (git) rather than stupid developer behaviour, when it comes to poor discipline wrt lack of merging.

What I'm taking from your comment is that you favour a less structured, more traumatic but perhaps more energizing approach which would see much svn trunk breakage. And because svn trunk commits need proposals behind them, that's why you associate official minimum perl versions with blocking unicode progress.

Branching and merging code is not rocket science. Modern (agile-tastic?) development processes (and developers) should be able to cope with these concepts, and wield them effectively. Crawford's git branch accomplished and educated a lot without getting in anybody's way; the same can't be said of svn trunk - it needs a strong leader, and serious commitment from us all. Until that is achieved, we'll have to suffer seeing the progress happen on git branches.

-- PaulHarvey - 24 May 2012

Just to clarify my view: keep the CURRENT release branch 1.1.x and raise the bar on the next release, the one that will support Unicode/UTF8. Whether this is called 1.2 or 2.0 is beside my point.

It will take at least one full calendar year to get a stable Unicode release, so the earliest would be mid 2013.

I recommend not keeping 3 parallel development branches.

Either we put Unicode into 1.2 and keep the 1.1 release branch for security releases and urgent issues for another year.

The second approach is to plan 2.0 as the Unicode release, with a 1.2 branch for a killer feature that may be ready sooner.

I have not seen any desire from developers to push out a major feature this year. If I am right about that, we should drop plans for a 1.2, go for 2.0, and just keep 1.1 running until 2.0 is stable.

That would mean the perl requirement should be 5.8.8 for the release branch,

and at least 5.12 for trunk.

-- KennethLavrsen - 24 May 2012

Kenneth - there is a strong desire on my part to have 1.2.0 out this year. That's why we decided to make a 1.2, and not wait the year to do 2.0.

What I don't understand is why some people think they can't work on unicode now: a minimum does not mean someone working on unicode can't use 5.14 features and force the issue.

Until someone does work on unicode, we don't need to raise the minimum.

-- SvenDowideit - 24 May 2012

@Paul: I am not blaming git as a tool. It is the lack of merges that concerns me. Merging isn't hard for technical reasons but for social ones: you need to coordinate with other people, not only with their code. That's why merges are so rare, even for the gems. And the longer a merge is put off, the more the code diverges technically too.

That's where a centralized software management has its major advantages. Of course you can do centralized with git at its core. So again: it's not the tool.

Yes, I want svn/trunk to break from time to time, and yes I want others to feel the pain. Only then will more people care.

As long as Foswiki hasn't quit svn, it is svn where the actions should happen, not somewhere else. Keeping all those github changes on the radar isn't feasible, so for the rest of us, any development on a privately driven git branch simply doesn't exist. Note that there are even more foswiki branches on github and I don't see any related checkins on svn/trunk.

Companies building custom products on top of foswiki have fulfilled their duty releasing the source code to github or whatever corner of the universe they chose. However it would be eons better to really contribute back by merging it to the main trunk.

@Kenneth: sounds good/realistic.

-- MichaelDaum - 24 May 2012

IMO, Foswiki should freeze 1.1.5 as the last stable non-unicode release. Sure, we need to support it with security bugfixes, but not extend it with new features on the bad core (bad meaning non-utf8). So no Store2, Mongo or anything else in the 1.1.5 release branch.

From today, we should start working on 1.2, which will have full utf8 support. We need to do this at some point anyway, so why not today?

Kenneth wrote: "We hardly support UTF8 today. I would be OK with raising the bar when running Foswiki in UTF8 but we should make sure we can still run an ISO-8859 Foswiki on e.g. RHEL 5 which is exactly the distro I am on at work. And also on my home server. So I am actively running and testing on 5.8.8."

A UTF8-correct Foswiki should work as follows:
  • internally it uses utf8 characters, not octets
  • SiteCharset tells Foswiki in which codepage all topics are stored on the filesystem
  • if topics are stored in utf8 (i.e. SiteCharset is UTF8), no conversions are applied
  • if SiteCharset is not utf8, then on every read from an external source (file, db handle, etc.) Foswiki should convert from SiteCharset into internal UTF8, and on every write from internal UTF8 back to SiteCharset. This is easily done with e.g. =use open ':encoding(iso-8859-7)';=.
  • The same should be applied to database handles.
  • This way, if someone (KennethLavrsen) wants to run an ISO-8859-1 site, he can, because nothing changes in the external files. The only thing that changes is the internal representation (bytes/octets vs. characters). It has nothing to do with file encodings.
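Here is a minimal sketch of the read/convert/write rule in the list above, assuming an ISO-8859-1 site. It uses per-handle :encoding layers rather than the global use open pragma. $site_charset stands in for the SiteCharset setting, and the topic file is a throw-away example.

```perl
#!/usr/bin/perl
# Sketch of the boundary rule: decode SiteCharset octets on read, work on
# characters internally, encode back to SiteCharset on write.
# $site_charset stands in for Foswiki's SiteCharset (assumed ISO-8859-1 here);
# the topic file is a throw-away example.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $site_charset = 'iso-8859-1';
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'WebHome.txt');

# An existing ISO-8859-1 topic on disk: one byte per character.
open(my $raw_out, '>:raw', $file) or die "write $file: $!";
print {$raw_out} "caf\xe9\n";
close($raw_out) or die "close $file: $!";

# Read: the layer decodes SiteCharset octets into perl characters.
open(my $in, "<:encoding($site_charset)", $file) or die "read $file: $!";
my $text = do { local $/; <$in> };
close($in);

# Internal processing works on characters, e.g. appending U+00FC.
$text =~ s/\n\z/ \x{fc}ber\n/;

# Write: the layer encodes characters back into SiteCharset octets.
open(my $out, ">:encoding($site_charset)", $file) or die "write $file: $!";
print {$out} $text;
close($out) or die "close $file: $!";

# On disk the topic is still plain ISO-8859-1: bytes == characters.
open(my $check, '<:raw', $file) or die "read $file: $!";
my $bytes = do { local $/; <$check> };
close($check);
printf "%d characters in memory, %d bytes on disk\n", length $text, length $bytes;
```

The final check is the point Jozef makes for ISO sites. The byte count equals the character count, so the file stays ISO-8859-1 on disk and only the in-memory representation changes.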

So guys, I don't understand why you are hesitating.

*Nothing changes for the data files of existing installations when Foswiki runs on a UTF8 core!!!* Only Foswiki internally will work more correctly.

-- JozefMojzis - 24 May 2012

To start, we need to:
  • add =use Foswiki::Broilerplate;= to the top of all sources
  • change all open statements to open(my $fh, "<:encoding($SiteCharset)", "filename"), and likewise for write handles
  • change regexes to the correct utf8 form
  • make a correct and easily modifiable Foswiki::Broilerplate (with use strict, warnings, utf8, Carp's croak, etc.)
  • and start debugging errors
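Here is one possible shape for the boilerplate module from the list above. It assumes the module simply re-exports pragmas, the way Modern::Perl-style modules do. Hypothetical::Boilerplate is an illustrative stand-in for the proposed Foswiki::Broilerplate, and it is defined inline so that the sketch runs as a single file.

```perl
#!/usr/bin/perl
# Sketch of a boilerplate module in the spirit of the proposed
# Foswiki::Broilerplate. Hypothetical::Boilerplate is an illustrative name;
# in a real tree it would live in its own .pm file and be loaded with "use".
{
    package Hypothetical::Boilerplate;
    use strict;
    use warnings;
    use utf8 ();    # load utf8.pm without enabling it inside this package
    use Carp ();

    sub import {
        # Pragma imports change the hints of the scope currently being
        # compiled, i.e. the file that says "use Hypothetical::Boilerplate".
        strict->import;
        warnings->import;
        utf8->import;

        # Export Carp's croak into the caller's namespace.
        my $caller = caller;
        no strict 'refs';
        *{"${caller}::croak"} = \&Carp::croak;
        return;
    }
}

# Same effect as "use Hypothetical::Boilerplate;" for an inline package.
BEGIN { Hypothetical::Boilerplate->import }

# Check the effects in this (main) scope.
our $utf8_hint;
BEGIN { $utf8_hint = ($^H & $utf8::hint_bits) ? 1 : 0 }

my $strict_on = !defined eval q{ $undeclared_global = 1; 1 };
my $croaked   = !defined eval { croak "topic not found" };

print "strict: ", ($strict_on ? 'on' : 'off'),
      ", utf8: ", ($utf8_hint ? 'on' : 'off'),
      ", croak imported: ", ($croaked ? 'yes' : 'no'), "\n";
```

With this design each source file needs only the one use line. Adding or dropping a pragma later means editing a single module rather than every file.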

Guys, I understand that anyone reading Stack Overflow replies from "tchrist" can get scared about how hard utf8 is. But it is not. There are a few basic rules; we need to change a few (not so many) things and everything will work (especially for US/ISO-1).

Currently, most of the problems FW has are caused by double encoding, because of the internal octets. All those errors disappear once FW uses utf8 internally.

-- JozefMojzis - 24 May 2012

The problem with writing Perl code that uses syntax and subs belonging to 5.12 and not 5.8 is that even if those parts of the code are not used for an ISO charset, the 5.8.8 perl interpreter barfs before it ever executes anything. We have already seen some cases of 5.10 syntax that made 5.8 fail. At the moment developers must write 5.8.8-compatible code. But if you start writing features that 5.8.8 does not support, then you have some code that is 5.8.8-compatible and some that is not. And unless you can isolate the code that 5.8.8 does not support in a few files that we can exclude from loading, it will be a nightmare to develop.
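Kenneth's point can be shown in a small sketch. A 5.8.8 parser rejects 5.10+ syntax anywhere in a file it compiles, so newer code has to live where the old parser never sees it, and it should be loaded only after checking $]. To keep the example self-contained, the newer code sits in a string eval. In a real tree it would more likely be a separate .pm loaded with require, and the %backend table is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: code that needs a newer perl is compiled only when $] says the
# running interpreter can parse it. The %backend table is hypothetical.
use 5.008008;    # documented minimum; older perls die here with a clear message
use strict;
use warnings;

my %backend;

if ($] >= 5.012) {
    # In a real tree this would be a require of a separate .pm file. It is
    # written inline as a string so that a 5.8.8 parser never sees the
    # 5.10+ syntax ('//') below.
    $backend{fold} = eval q{
        use 5.012;    # also enables strict and the 5.12 feature bundle
        sub { my ($s) = @_; return lc($s // '') };
    } or die "could not compile modern backend: $@";
}
else {
    # 5.8-compatible fallback with the same behaviour.
    $backend{fold} = sub { my ($s) = @_; return defined $s ? lc($s) : '' };
}

# Writing "$s // ''" directly in this file instead would make 5.8.8 refuse
# to compile the whole file, even though that branch would never run there.
print $backend{fold}->('WebHome'), "\n";
```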

Sven, I may be a little out of touch, but do we have a 1.2 branch created now, and is it the Mongo stuff you plan? If you are thinking of new storage or similar, then I do not believe it is realistic to get that stable in 2012 either. We have never been able to introduce major features and changes of this nature in less than a calendar year; in fact, even 1 year is challenging. We are at the end of May 2012. Soon comes summer, when all of you north of the Equator will slow down development. We take 6 months to get a patch release out.

So if we stay with a 1.2, that branch should be created ASAP.

So even a 1.2 is 2013 for a stable release. And then I see no problem with raising the Perl bar.

-- KennethLavrsen - 24 May 2012

I'd expect converting back and forth between non-utf8 and internal utf8 to become a major performance problem, e.g. for %SEARCH. Therefore I'd strongly recommend using CharsetConverterContrib once and having it all utf8ed. Whoops, Crawford has not released it yet, but it has been in svn/trunk for quite some time now.

-- MichaelDaum - 24 May 2012

Michael, the lack of merges is because it is not ready to merge and nobody helped fix the code so that it could be merged.

As you note this is not a git problem but as far as I can tell, a lack of interest.

To suggest that the perl 5.8.8 minimum blocks this work just seems so bizarre.

Presently FoswikiSuite barfs as it has a huge memory leak I spent several hours trying to nail down and got nowhere.

You may recall I spent many weekends making dozens of commits on our UnitTestContrib framework to shake out these memory leaks, and I merged all that work across MongoDB, unicode, trunk and Release01x01 branches. git made that possible.

I have removed my hostile comments from here. Apologies to MichaelDaum; I shouldn't make 2am tirades after a bad day. This reaction was from my perception that Michael has not considered the github unicode branch to be a valid place for us to be staging the unicode work.

I still maintain that there is little sense in destabilizing our main svn trunk until the problems discovered in the unicode branch have been addressed. When that's done and unicode branch is merged back into svn trunk, we can abandon the git branch.

If this can only be "fixed" by stomping on svn trunk, then so be it. I will do what I can, but somebody needs to commit (hah!) to it.

-- PaulHarvey - 24 May 2012

My view is that we should freeze trunk soon; get a 1.2 branch sooner rather than later.

-- PaulHarvey - 24 May 2012

I have a number of features that I'm working on for 1.2, so I am not ready for branching 1.2. But if the release manager thinks they can make a 1.2 now, without wanting more features, I won't complain; I have heaps of stuff in 1.2 already. (Heck, I'd love us to have a 6-month release schedule rather than waiting for stuff to be done.)

Jozef - I don't think that those who have done work on the unicode branch agree with you; they've reported that it's more complicated than that.

If it is done, and can be shown to be sufficient, then as I keep saying, it would be trivial to have a real reason to increase the minimum version. But until that code lands in trunk and is considered good, it's not ready.

-- SvenDowideit - 24 May 2012

I just re-read my hostile comments about the attitude that I perceived Michael to have towards the git unicode branch. I've removed that part of my comment which is still here in UseUTF8PerlRequirements?rev=1, my apologies.

Jozef - the points you have highlighted were already made on UnicodeSupport, and as you know Crawford already addressed many of them.

The biggest problem with the unicode branch is the catastrophic memory leaks running FoswikiSuite. OOM killer prevents UnitTestContrib from running to completion.

There is little sense raising the minimum perl until we have something to show for it, in my opinion.

-- PaulHarvey - 25 May 2012 - 04:23

There's no "freeze trunk". We only freeze release branches, don't we?

Quick reminder:

  1. Trunk is bleeding-edge development, in the sense that breakage is expected from time to time.
  2. Release branches are meant to transition from one stable state to the next, ideally.
  3. Any additional (git) branch should be used for experimental, private purposes only.
  4. Mainstream development, which the rest of the project should participate in, happens on trunk, all the more so when the feature will definitely end up in trunk anyway.

To bring back the project on course we need to:

  • merge all changes in trunk we want in 1.1.6 over to it
  • merge the unicode branch to trunk. that's a 1.2 feature.
  • decide when to merge the psgi branch to trunk (see point 4 above). this might not be a 1.2 feature.

If the psgi merge is stable enough, we might end up skipping a 1.2 and heading for 2.0, simply because of the amount of change in foswiki.

-- MichaelDaum - 25 May 2012

Michael wrote: "I'd expect converting from non-utf8 to utf8 used internally back and forth to become a major performance problem, i.e. for %SEARCH. Therefore I'd strongly recommend to use CharsetConverterContrib once and have it all utf8ed."

Michael, you're right. Here are the exact numbers from my notebook:
  • idecode = reading an ISO file as octets and explicitly decoding after the read
  • iopen = opening an ISO file with "<:encoding(iso-8859-2)"
  • octets = the current approach
  • uopen = opening a utf8 file with "<:utf8" and reading it
  • uoascii = as above (<:utf8), but reading an ascii file
Benchmark: timing 10000 iterations of idecode, iopen, octets, uoascii, uopen...
   idecode: 16 wallclock secs (15.27 usr +  0.65 sys = 15.92 CPU) @ 628.14/s (n=10000)
     iopen: 25 wallclock secs (23.94 usr +  0.94 sys = 24.88 CPU) @ 401.93/s (n=10000)
    octets:  2 wallclock secs ( 0.60 usr +  0.56 sys =  1.16 CPU) @ 8620.69/s (n=10000)
   uoascii:  2 wallclock secs ( 1.91 usr +  0.56 sys =  2.47 CPU) @ 4048.58/s (n=10000)
     uopen:  8 wallclock secs ( 7.03 usr +  0.85 sys =  7.88 CPU) @ 1269.04/s (n=10000)
          Rate   iopen idecode   uopen uoascii  octets
iopen    402/s      --    -36%    -68%    -90%    -95%
idecode  628/s     56%      --    -51%    -84%    -93%
uopen   1269/s    216%    102%      --    -69%    -85%
uoascii 4049/s    907%    545%    219%      --    -53%
octets  8621/s   2045%   1272%    579%    113%      --

As you can see, any conversion has a *huge* performance impact.
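For anyone wanting to repeat the comparison, here is a rough re-creation of the five read strategies using Perl's core Benchmark module. This is not the script Jozef attached. The sample text, file sizes and iteration count are made up, so absolute numbers will differ from the table above.

```perl
#!/usr/bin/perl
# Rough re-creation of the five read strategies benchmarked above. NOT the
# attached script: the sample text, sizes and iteration count are made up.
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Encode qw(decode encode);
use File::Spec;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Synthetic sample: Czech letters from the ISO-8859-2 range, plus an ASCII text.
my $chars = join '',
    map { "\x{17d}lu\x{165}ou\x{10d}k\x{fd} k\x{16f}\x{148} $_\n" } 1 .. 2000;
my $ascii = join '', map { "Yellow horse $_\n" } 1 .. 2000;

# Write the test files as raw octets.
my %content = (
    iso   => encode('iso-8859-2', $chars),
    utf8  => encode('UTF-8', $chars),
    ascii => $ascii,
);
my %path;
for my $name (sort keys %content) {
    $path{$name} = File::Spec->catfile($dir, "$name.txt");
    open(my $fh, '>:raw', $path{$name}) or die "write $path{$name}: $!";
    print {$fh} $content{$name};
    close($fh) or die "close $path{$name}: $!";
}

# Slurp a whole file through the given PerlIO layer.
sub slurp {
    my ($path, $layer) = @_;
    open(my $fh, "<$layer", $path) or die "read $path: $!";
    local $/;
    my $data = <$fh>;
    close($fh);
    return $data;
}

my %reader = (
    octets  => sub { slurp($path{iso}, ':raw') },                        # bytes only
    idecode => sub { decode('iso-8859-2', slurp($path{iso}, ':raw')) },  # explicit decode
    iopen   => sub { slurp($path{iso}, ':encoding(iso-8859-2)') },       # decode in layer
    uopen   => sub { slurp($path{utf8}, ':utf8') },                      # UTF-8 file
    uoascii => sub { slurp($path{ascii}, ':utf8') },                     # ASCII file
);

# Sanity check before timing: all decoding readers yield the same text.
for my $name (qw(idecode iopen uopen)) {
    die "$name returned different text\n" unless $reader{$name}->() eq $chars;
}

# 500 iterations keeps the run short; raise it for steadier numbers.
cmpthese(500, \%reader);
```

The :utf8 layer does not validate its input. :encoding(UTF-8) checks for malformed data, which is safer for external files but typically slower.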

But there is no choice. According to the perl Unicode FAQ, encoding is a must; we cannot rely on octets anymore. However, implicit conversion with "<:encoding($Foswiki::cfg{Site}{CharSet})" is the slowest.

So here are two solutions:
  1. when FW is utf8-ready, users get a conversion tool to upgrade their installations to utf8 files (the uopen line in the test above)
  2. if we want to allow ISO charsets in the files, we need to use explicit decode/encode, not conversion at the open level
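Option 1 amounts to a one-off pass over the data files. Here is a minimal sketch for a single file, assuming the legacy charset is known. This is not CharsetConverterContrib, and the convert_file_to_utf8 helper and the file name are hypothetical.

```perl
#!/usr/bin/perl
# Sketch of option 1: a one-off conversion of a topic file from the legacy
# site charset to UTF-8. NOT CharsetConverterContrib; the helper name and
# the file used below are hypothetical.
use strict;
use warnings;
use Encode qw(decode encode);
use File::Spec;
use File::Temp qw(tempdir);

# Re-encode one file in place; returns the number of characters converted.
sub convert_file_to_utf8 {
    my ($path, $from_charset) = @_;

    open(my $in, '<:raw', $path) or die "read $path: $!";
    my $octets = do { local $/; <$in> };
    close($in);

    # FB_CROAK: die on bytes that are invalid in $from_charset instead of
    # silently substituting replacement characters.
    my $chars = decode($from_charset, $octets, Encode::FB_CROAK());

    # Write to a temporary file first so a failure never leaves half a topic.
    my $tmp = "$path.tmp";
    open(my $out, '>:raw', $tmp) or die "write $tmp: $!";
    print {$out} encode('UTF-8', $chars);
    close($out) or die "close $tmp: $!";
    rename($tmp, $path) or die "rename $tmp to $path: $!";

    return length $chars;
}

# Demo on a throw-away ISO-8859-1 "topic".
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'WebHome.txt');
open(my $fh, '>:raw', $file) or die "write $file: $!";
print {$fh} "Ma\xf1ana\n";    # 7 bytes in ISO-8859-1
close($fh) or die "close $file: $!";

my $converted = convert_file_to_utf8($file, 'iso-8859-1');

open(my $check, '<:raw', $file) or die "read $file: $!";
my $after = do { local $/; <$check> };
close($check);
printf "%d characters, now %d bytes of UTF-8\n", $converted, length $after;
```

After such a pass every read is a plain UTF-8 decode, which corresponds to the uopen line in the benchmark.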

Interestingly, English speakers are penalized less ;) The slowdown when opening a file with <:utf8 and reading a plain ascii file is much smaller (partially because the ascii test file has only 102893 bytes, while the utf8 one has 174893 bytes).

-- JozefMojzis - 25 May 2012

I attached a file in case someone wants to run the test.

-- JozefMojzis - 25 May 2012

Just saw in another task: Redhat just released a new Enterprise Linux 5.8 on Feb 21, 2012. It ships with Perl 5.8.8 (and kernel 2.6.18, from 2006).

-- GeorgeClark - 05 Jun 2012

Yes, according to http://distrowatch.com/table.php?distribution=redhat the 5 series has 5.8.8, and 5.10 comes with RHEL6 (currently at development release 6.3).

BTW, Redhat will support RHEL5 for many years, and every 5.x release will ship perl 5.8.8.

5.10 comes only with RHEL6 - and 5.10 is "not enough" anyway...

/That's why I love FreeBSD: it simply works ;)/

-- JozefMojzis - 06 Jun 2012

I merged unicode branch with latest trunk, see Tasks.Item5437

-- PaulHarvey - 16 Jun 2012

Seems to be working OK with the merged utf8 branch (full unicode) so closing this.

-- CrawfordCurrie - 19 May 2015

 
Topic revision: r11 - 19 May 2015, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement.