Item10149: versions query w/SEARCH, RCS store = hang

Priority: Enhancement
Current State: Confirmed
Released In: 2.2.0
Target Release: minor
Applies To: Engine
Component: SEARCH
Branches: trunk
Reported By: PaulHarvey
Waiting For:
Last Change By: CrawfordCurrie
I've not had much luck with the versions query in SEARCH.

It works fine for individual topics with QUERY, but SEARCH is just too slow unless you only have a few dozen topics.

I think we need a strategy before we ship this. At a minimum, we could disable versions queries in SEARCH on RCS stores; alternatively, we could introduce a feature that makes SEARCH abort long-running queries gracefully (can we just return a shorter-than-normal result set, and log/emit an error?).

The query I ran on trunk.foswiki.org was something like:
%SEARCH{
  "versions[author='PaulHarvey']"
  web="Development"
  limit="50"
}%

It ran for several minutes before Koen killed the process for me.

-- PaulHarvey - 11 Dec 2010

Yeah, well, TBH I'm not that surprised. A versions query has to load a shitload of information just to query, and that's not efficient. As you say, the RCS store just isn't built for this kind of query.

Should this be a fix just for versions queries, or is there a more general problem: should a user be able to limit the amount of time spent on a query? A general mechanism would also cover other kinds of bad query.
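
One possible shape for such a general limit, combined with the graceful abort proposed in the original report (return a shorter result set and log an error), is a deadline checked between topics. The sketch below is hypothetical Python, not Foswiki's (Perl) SEARCH code; `search_with_deadline`, `matches`, and `timeout_s` are illustrative names. It can only stop between topics, so a single topic whose versions take minutes to load would still overrun the deadline.

```python
import logging
import time
from typing import Callable, Iterable, List, Optional, Tuple

log = logging.getLogger("search")

def search_with_deadline(
    topics: Iterable[str],
    matches: Callable[[str], bool],
    limit: int,
    timeout_s: Optional[float],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Hypothetical SEARCH loop with a wall-clock limit.

    Returns (results, truncated). timeout_s=None disables the limit.
    The deadline is only checked between topics, so one very slow
    topic can still overrun it.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    results: List[str] = []
    for topic in topics:
        if deadline is not None and clock() >= deadline:
            # Graceful abort: keep the partial result set and report it.
            log.warning("SEARCH stopped after %.1fs with %d result(s)",
                        timeout_s, len(results))
            return results, True
        if matches(topic):
            results.append(topic)
            if len(results) >= limit:
                break
    return results, False
```

Passing `timeout_s=None` would let a caller such as a cron job run the query to completion.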

-- CrawfordCurrie - 11 Apr 2011

I don't have any good ideas on how to control a hypothetical "search timeout" feature. Configure setting? Macro param? URL param? What should be the default?

On my own wiki, I have some reports that simply take tens of seconds, and a dot graph that takes minutes. I need to be able to run those from a cron job that saves the output back into a "cache" topic, which is refreshed every hour.

Let's say we have a default SEARCH timeout of 10s; I need a way for my cron-job scenario to override it so that the search runs to completion from the CLI.

Additionally, I don't know about general Foswiki practice, but the biggest trouble on my wiki is nested searches. If an outer SEARCH produces 200 inner SEARCHes, how do we apply a "timeout" when each individual SEARCH takes only ~2s?
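
One way to reason about the nested case is to give the outer SEARCH a single budget and hand that same budget, rather than a fresh timeout, to every inner SEARCH. The hypothetical sketch below (illustrative names, simulated clock, not Foswiki code) shows that with a 10s budget and ~2s per inner search, only five of the 200 inner searches would run; `None` stands for the unlimited budget the cron-job scenario needs.

```python
import time
from typing import Callable, Optional

class SearchBudget:
    """Hypothetical wall-clock budget shared by an outer SEARCH and all
    SEARCHes nested inside it. timeout_s=None means no limit (cron/CLI)."""

    def __init__(self, timeout_s: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.deadline = None if timeout_s is None else clock() + timeout_s

    def expired(self) -> bool:
        return self.deadline is not None and self.clock() >= self.deadline

def run_nested(outer_hits: int, budget: SearchBudget,
               inner_search: Callable[[SearchBudget], None]) -> int:
    """Run one inner search per outer hit, all drawing on the same budget.
    Returns how many inner searches ran before the budget expired."""
    ran = 0
    for _ in range(outer_hits):
        if budget.expired():
            break
        inner_search(budget)  # same budget object, not a fresh timeout
        ran += 1
    return ran

# Simulated clock: each inner search "takes" 2 seconds.
now = [0.0]

def fake_clock() -> float:
    return now[0]

def slow_inner(budget: SearchBudget) -> None:
    # A real inner search would also check budget.expired() between topics.
    now[0] += 2.0

print(run_nested(200, SearchBudget(10.0, fake_clock), slow_inner))  # 5
print(run_nested(200, SearchBudget(None, fake_clock), slow_inner))  # 200
```

The open question from above remains: the outer SEARCH's result is then built from only a fraction of its inner searches.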

Hmm, so at first glance it seems that a general solution is a can of worms.

If I get time to invest in this, I think I'd rather try to ship a SearchAlgorithm that continues to work with the RCS store but caches just the %META part of every topic version somewhere under working/ (reproducing the data/ directory layout, except that Topic.txt,v would be a directory).
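
A minimal sketch of that cache layout, assuming a cache root somewhere under working/ and one file per revision holding only that revision's %META lines. The root, the `.meta` file names, and the helper functions are hypothetical; populating the cache from the ,v file and invalidating it on save are left out.

```python
from pathlib import Path
from typing import List

def meta_cache_path(cache_root: Path, web: str, topic: str, rev: int) -> Path:
    """Mirror data/<Web>/<Topic>.txt,v, but make the ,v a directory
    holding one <rev>.meta file per revision (hypothetical layout)."""
    return cache_root / web / f"{topic}.txt,v" / f"{rev}.meta"

def extract_meta(topic_text: str) -> List[str]:
    """Keep only the %META:...% lines of one topic revision."""
    return [line for line in topic_text.splitlines() if line.startswith("%META:")]

def cache_revision(cache_root: Path, web: str, topic: str, rev: int,
                   topic_text: str) -> Path:
    """Write the %META lines of one revision into the cache; return the path."""
    path = meta_cache_path(cache_root, web, topic, rev)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(line + "\n" for line in extract_meta(topic_text)),
                    encoding="utf-8")
    return path

def read_cached_meta(cache_root: Path, web: str, topic: str, rev: int) -> List[str]:
    """Read back the %META lines for one revision without touching RCS."""
    path = meta_cache_path(cache_root, web, topic, rev)
    return path.read_text(encoding="utf-8").splitlines()
```

A versions query could then read one small file per revision instead of reconstructing each revision from the ,v.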

-- PaulHarvey - 11 Apr 2011

I have added a caveat emptor to the QuerySearch topic to warn of the performance risks. Having done this, I think it is valid to regrade this report from 'Urgent' to 'Enhancement'.

-- CrawfordCurrie - 21 Jan 2013

I think caching the RCS log information plus the %META would be good as a basic feature of the RCS-based store. Anything that needs to dip into the RCS log records (for example, the comment field in the attachment display) is horribly slow. Being able to read the topic metadata history without a full RCS pass would be very helpful. Maybe store it alongside file and file,v as file,meta.
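
A sketch of the file,meta idea, assuming a JSON sidecar that records, per revision, the RCS log fields (author, date, comment) plus the %META lines. The format and function names are hypothetical. With such a file, a condition like versions[author='PaulHarvey'] can be answered for one topic by reading a single small file, without running any RCS command.

```python
import json
from pathlib import Path
from typing import List

def sidecar_path(topic_file: Path) -> Path:
    """Topic.txt -> Topic.txt,meta, next to Topic.txt and Topic.txt,v."""
    return topic_file.with_name(topic_file.name + ",meta")

def _load(topic_file: Path) -> List[dict]:
    path = sidecar_path(topic_file)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []

def append_revision(topic_file: Path, rev: int, author: str, date: int,
                    comment: str, meta: List[str]) -> None:
    """Record one new revision; a store would call this on every save."""
    records = _load(topic_file)
    records.append({"rev": rev, "author": author, "date": date,
                    "comment": comment, "meta": meta})
    sidecar_path(topic_file).write_text(json.dumps(records), encoding="utf-8")

def revisions_by_author(topic_file: Path, author: str) -> List[int]:
    """Equivalent of versions[author='...'] for one topic, without RCS."""
    return [r["rev"] for r in _load(topic_file) if r["author"] == author]
```

Rewriting the whole sidecar on each save keeps the sketch short; a real store would probably append records instead.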

-- GeorgeClark - 21 Jan 2013

Crawford, anything more to add to the caveat emptor regarding the PlainFileStore? Is the very poor performance of the versions search alleviated by the PFS?

-- GeorgeClark - 12 Jan 2015

Anecdotally yes, but I haven't benchmarked it. The RCS performance is down to having to reconstruct every previous revision from the ,v. Since the PFS stores the old revisions as plain text, it should be much faster.
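
To put rough numbers on that: RCS keeps the latest trunk revision in full and each earlier revision as a reverse delta against its successor, so checking out revision k of n means applying n-k deltas. If a versions query checks out every revision separately (an assumption about how the store is driven, not a measurement), the total work grows quadratically with the number of revisions, whereas the PFS reads each old revision directly. The counting below is illustrative only.

```python
def rcs_deltas_separate_checkouts(n_revs: int) -> int:
    """Deltas applied when each revision 1..n is checked out on its own,
    every checkout starting again from the full head revision n."""
    return sum(n_revs - k for k in range(1, n_revs + 1))

def rcs_deltas_single_walk(n_revs: int) -> int:
    """Deltas applied if all revisions were produced in one walk back
    from the head: each step applies exactly one reverse delta."""
    return max(n_revs - 1, 0)

for n in (10, 100, 1000):
    print(n, rcs_deltas_separate_checkouts(n), rcs_deltas_single_walk(n))
# 10 45 9
# 100 4950 99
# 1000 499500 999
```

The separate-checkout total is n(n-1)/2, which is why a topic with a long history is so much more expensive than its revision count suggests.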

-- CrawfordCurrie - 17 Jan 2015

Missed the release meeting where this was discussed, so here's my take on it.

RCS is fundamentally slow for versions queries, and I personally think it's a waste of time trying to do anything about it. As I said above, I haven't benchmarked the PFS for these queries; doing so would be the first step before any other work.

One way to accelerate these queries would be to use a store cache, like the DBIStoreContrib (which caches the store in an SQL database to accelerate queries). Personally I think that's the best approach. The DBIStoreContrib doesn't currently cache old revisions, but it could be made to do so. Either way, I think this should be taken out of the mainstream release plan, and if anyone really wants it, they can extend DBIStoreContrib (or fund that work).
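
As an illustration of what caching old revisions in SQL might look like, here is a hypothetical table of per-revision rows and a query that would stand in for versions[author='PaulHarvey'] within one web. This is not DBIStoreContrib's actual schema; SQLite from the Python standard library is used only to keep the sketch self-contained.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS topic_versions (
    web      TEXT    NOT NULL,
    topic    TEXT    NOT NULL,
    rev      INTEGER NOT NULL,
    author   TEXT    NOT NULL,
    rev_date INTEGER NOT NULL,
    PRIMARY KEY (web, topic, rev)
);
CREATE INDEX IF NOT EXISTS topic_versions_by_author
    ON topic_versions (web, author);
"""

def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical revision cache."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record_version(conn: sqlite3.Connection, web: str, topic: str,
                   rev: int, author: str, rev_date: int) -> None:
    """Store (or refresh) the cached row for one revision."""
    conn.execute(
        "INSERT OR REPLACE INTO topic_versions VALUES (?, ?, ?, ?, ?)",
        (web, topic, rev, author, rev_date),
    )

def topics_with_version_by(conn: sqlite3.Connection, web: str,
                           author: str, limit: int = 50) -> List[str]:
    """Topics in `web` having at least one revision by `author`."""
    rows = conn.execute(
        "SELECT DISTINCT topic FROM topic_versions "
        "WHERE web = ? AND author = ? ORDER BY topic LIMIT ?",
        (web, author, limit),
    )
    return [topic for (topic,) in rows]
```

The index on (web, author) should let the database find matching revisions without reading every cached row, which is exactly the work the RCS store cannot avoid.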

A different type of store might be another valid approach: one that simply stores entire topics and their history in a DB for rapid retrieval, avoiding file-system accesses. It could be an interesting project, especially when coupled with MichaelDaum's ideas on minimising topic re-reads.

-- Main.CrawfordCurrie - 21 Mar 2017 - 08:29

 

ItemTemplate

Summary versions query w/SEARCH, RCS store = hang
ReportedBy PaulHarvey
Codebase trunk
SVN Range
AppliesTo Engine
Component SEARCH
Priority Enhancement
CurrentState Confirmed
WaitingFor
Checkins distro:348c525471aa
TargetRelease minor
ReleasedIn 2.2.0
CheckinsOnBranches trunk
trunkCheckins distro:348c525471aa
masterCheckins
ItemBranchCheckins
Release02x01Checkins
Release02x00Checkins
Release01x01Checkins
Topic revision: r13 - 21 Mar 2017, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.