
Item11454: RcsLite performance is bad on topics with long histories.

Priority: Normal
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: FoswikiStore, Performance
Branches: master
Reported By: PaulHarvey
Waiting For:
Last Change By: GeorgeClark

Background

As of Item11091 (specifically, distro:5a79947c9bfd), Foswiki 1.1.4 is less trusting of the TOPICINFO line in .txt files when the .txt is newer than the .txt,v.

In this situation, Foswiki 1.1.4 transparently adjusts the TOPICINFO line 'on the fly' as follows:
  • info.date is obtained from the filesystem last-modified datestamp of the .txt
  • info.version is obtained by digging into the ,v file and adding 1
  • info.author becomes 'UnknownUser'

The reason for this re-writing is that the .txt file must have been 'mauled' by an external process, and generally these do not correctly populate TOPICINFO with accurate information. This causes problems, see Item11091 for details.

Problems

  • Foswiki performance now suffers significantly, especially if there are many 'mauled' .txt files. For each one, Foswiki must spawn an RCS rlog command, if using RcsWrap. Using RcsLite can mitigate the problem somewhat. See Item11476 for caveats.
  • Item11473 (merged with this task) is a complaint about the new, surprising info.author being set to UnknownUser. This new behaviour may be unacceptable to some installations.

Work-arounds

  • Set {Store}{Implementation} in configure to RcsLite. See Item11476 for caveats.
  • If you are happy to emulate the Foswiki 1.1.3 behaviour (i.e. accept the TOPICINFO line of the mauled .txt files), use the touch command to force the relevant .txt,v file to have a later datestamp.
    • To update the last-modified datestamp of all txt,v files in your installation, use something like:
      find /path/to/foswiki/data -type f -name '*.txt,v' -exec touch {} +
    • To update the last-modified datestamp of only those txt,v files which aren't in sync with their .txt cache, use something like the following (as written, the echo only prints the touch commands; remove it to actually run them):
      perl -MFile::Find=find -wle'find(sub{/^(.*\.txt),v\z/&&-f&&system("echo touch -f $1 $_")},@ARGV)' /path/to/foswiki/data
      (courtesy OlivierRaginel)
  • If your .txt files are mauled by an external script which you are able to change, you may wish to call the touch command as an extra step at the end of your script. Even better: ensure that the script leaves .txt files with an accurate TOPICINFO line (increment the version number, update its date epoch), then do an RCS checkin to update the .txt,v file properly.
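
The selective touch work-around above can also be sketched as a small shell function. This is only a sketch: touch_stale_histories and the DRY_RUN variable are hypothetical names, not part of Foswiki, and the strict "newer than" test is an assumption that may not match Foswiki's own check exactly.

```shell
# touch_stale_histories is a hypothetical helper name, not part of Foswiki.
# For every topic under the given directory whose .txt is newer than its
# .txt,v, touch the .txt,v so that it is no longer older than the .txt.
# With DRY_RUN=1 it only prints the files it would touch.
touch_stale_histories() {
    find "$1" -type f -name '*.txt' | while IFS= read -r txt; do
        rcs="$txt,v"
        # Topics without a ,v file have no history yet and are skipped.
        if [ -f "$rcs" ] && [ "$txt" -nt "$rcs" ]; then
            if [ "${DRY_RUN:-0}" = 1 ]; then
                printf 'would touch %s\n' "$rcs"
            else
                touch "$rcs"
            fi
        fi
    done
}
```

Running it first with DRY_RUN=1 against /path/to/foswiki/data shows how many topics are affected before anything is changed. File names containing newlines are not handled.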

(Extraneous comments removed and may be found at revision 10).

-- PaulHarvey - 26 Jan 2012

And switching the storage implementation to RcsLite gives even more speed improvements in the average case.

-- MichaelDaum - 25 Jan 2012

True, but it's not that simple. RcsLite is faster when Foswiki must process many ,v files in a given request, BUT it can have disastrously poor worst-case performance when ,v files get large, e.g. on large attachment files. There are two reasons: firstly, RcsLite loads entire ,v files into memory; secondly, the external RCS binaries are written in C, so their raw throughput is much greater than that of any PurePerl solution.

We should really work on a hybrid VC store to get the best of both worlds (especially the rlog case to get current version number, which should only require reading the first few lines of a ,v file).

-- PaulHarvey - 25 Jan 2012
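
As a rough illustration of the rlog case mentioned above, the head revision can be read from the first few lines of a ,v file without parsing the rest. This is a sketch only: rcs_head_rev and next_topic_version are hypothetical names, not part of Foswiki or RCS, and it assumes the usual layout written by the RCS tools (a "head" field at the top of the file) and trunk revisions of the form 1.N.

```shell
# rcs_head_rev is a hypothetical helper name, not part of Foswiki or RCS.
# Print the head revision recorded in an RCS ,v file by looking only at
# its first few lines, where the "head" field of the admin section lives.
rcs_head_rev() {
    head -n 5 "$1" | awk '$1 == "head" { sub(/;.*/, "", $2); print $2; exit }'
}

# next_topic_version is also hypothetical. It assumes revisions of the
# form 1.N and prints N + 1, i.e. the version a mauled .txt would be given.
next_topic_version() {
    rev=$(rcs_head_rev "$1")
    [ -n "$rev" ] || return 1   # no head revision found
    n=${rev#*.}                 # strip the leading "1."
    echo $((n + 1))
}
```

For a file whose first line is "head 1.7;", rcs_head_rev prints 1.7 and next_topic_version prints 8, however large the rest of the history is.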

I don't see a reason why RcsLite must load all revisions at once, not even when this thing was hybrid. That's a bug.

As most serious foswikis are running in a persistent perl environment (and even more will once foswiki has converted to PSGI), there should be no forking of an external rcs helper tool at all, however large the histories are.

Instead, RcsLite needs fixing.

Only if it turns out to be impossible to fix RcsLite so that it stops being as inefficient in the worst case as it seems to be right now should we think about complicating things even further by making the code hybrid, with whatever unknown performance behaviour that entails in itself.

For now I can't confirm any performance problems using RcsLite. Quite the contrary.

A normal foswiki has got - let me guess - approximately 5 revs per topic and 1.5 revs for attachments on average. These normal foswikis will only profit from switching to RcsLite right now. That's a low hanging fruit and a GoodThing™ to do as people don't have to wait for us hackers to come along with even better code.

And therefore RcsLite should be the default.

-- MichaelDaum - 25 Jan 2012

Except I've talked to an IRC user or two who had tried RcsLite and reverted back again, because they had one single important file with a massive history that would cause an fcgid timeout.

WebStatistics is a good example of where wrap is faster than lite.

I agree though, we can fix RcsLite.

-- PaulHarvey - 25 Jan 2012

Created Item11476 for RcsLite concerns. This task needs to focus on problems & solutions involving performance when .txt is mauled.

-- PaulHarvey - 26 Jan 2012

Made Item11476 as urgent as this one.

-- MichaelDaum - 26 Jan 2012

I have re-written and re-titled this task so we can merge & close Item11473.

-- PaulHarvey - 26 Jan 2012

The current behaviour is correct. If an external process damages the .txt, then it is the UnknownUser who performed that edit.

There are adequate solutions to this problem (touching ,v files, making the external process check in, etc.), so I feel this should neither be a 1.1.5 release blocker, nor even a report, except insofar as the performance of RcsLite is poor. So I changed the title from "1.1.4 is slower and shows info.author as 'UnknownUser' when .txt is mauled by an external process" to what it is now, and re-assigned to 1.2.

-- CrawfordCurrie - 09 Mar 2012

RcsLite performance was being addressed in Item11476. Will you close that as duplicate? Or this one?

To say that this doesn't even deserve a report ignores the fact that this has been a support problem. Many users have been impacted by this.

The new behaviour may be correct, but it is new, and we need to educate people about it better. At the very least we need to ship a dedicated System FAQ item.

-- PaulHarvey - 10 Mar 2012

There is a comment / SMELL in RcsLite:
# SMELL: This code uses the log field for the checkin comment. This field is alongside the actual text
# of the revision, and is not recorded in the history. This is a PITA because it means the comment field
# can't be retrieved without reading up to the text change for the version requested - even though foswiki
# doesn't actually use that part of the info record for anything much. We could rework the store API to
# separate the log info, but it would be a lot of work. Using this constant you can ignore the log info in
# getInfo calls. The tests will fail, but the core will run a lot faster.
use constant CAN_IGNORE_COMMENT => 0;    # 1

Should we document this, or maybe even run this way by default? If we can stop reading the rcs file before getting into the body of the diff, that would seem to be a huge boost.

-- GeorgeClark - 17 Jun 2014
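
To illustrate why skipping the log field could help: in the RCS file format, the revision number, date and author sit in the delta headers near the top of the file, while the log message is stored further down next to each revision's text. The sketch below reads only the headers. rcs_head_info is a hypothetical name, not part of Foswiki or RCS, and the field positions assume the layout that the RCS tools normally write (date, author and state on one line).

```shell
# rcs_head_info is a hypothetical helper name, not part of Foswiki or RCS.
# Print "revision date author" for the head revision, stopping before the
# "desc" section so that log messages and revision texts are never read.
rcs_head_info() {
    awk '
        # Admin section: remember the head revision as a string.
        $1 == "head" { head = $2; sub(/;.*/, "", head); next }
        # Safety stop: log messages and texts only appear after "desc".
        $1 == "desc" { exit }
        # A line holding just the head revision starts its delta header.
        NF == 1 && ($1 "") == (head "") { in_head = 1; next }
        in_head && $1 == "date" {
            date = $2;   sub(/;.*/, "", date)
            author = $4; sub(/;.*/, "", author)
            print head, date, author
            exit
        }
    ' "$1"
}
```

This does not retrieve the checkin comment, which is exactly the part that requires reading on into the revision text.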

 

ItemTemplate

Summary RcsLite performance is bad on topics with long histories.
ReportedBy PaulHarvey
Codebase 1.1.4, 1.1.4 RC2, 1.1.4 RC1, 1.1.4 beta2, 1.1.4 beta1, trunk
SVN Range
AppliesTo Engine
Component FoswikiStore, Performance
Priority Normal
CurrentState Closed
WaitingFor
Checkins distro:6cfbfb76a039
TargetRelease major
ReleasedIn 2.0.0
CheckinsOnBranches master
trunkCheckins
masterCheckins distro:6cfbfb76a039
ItemBranchCheckins
Release01x01Checkins
Topic revision: r17 - 05 Jul 2015, GeorgeClark