Item8706: Foswiki cache crashes engine

Priority: Urgent
Current State: Closed
Released In: 1.1.0
Target Release: minor
Applies To: Engine
Component:
Branches:
Reported By: KennethLavrsen
Waiting For:
Last Change By: CrawfordCurrie
I have tried a few times to enable the cache and Foswiki just crashes.

I thought it was because the task that introduces it was not complete so I did not want to raise a task.

But it seems Michael thinks the feature is running so now I open tasks.

So far only this.

Can't locate Cache/FileCache.pm in @INC (@INC contains: /var/www/foswiki/core/lib . /usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.8/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.8/i386-linux-thread-multi /usr/lib/perl5/5.8.8 /var/www/foswiki/core/lib/CPAN/lib//arch /var/www/foswiki/core/lib/CPAN/lib//5.8.8/i386-linux-thread-multi /var/www/foswiki/core/lib/CPAN/lib//5.8.8 /var/www/foswiki/core/lib/CPAN/lib/) at /var/www/foswiki/core/lib/Foswiki/Cache/FileCache.pm line 35.
 at /var/www/foswiki/core/lib/Foswiki/Cache/FileCache.pm line 35
   Foswiki::Cache::FileCache::BEGIN() called at Cache/FileCache.pm line 35
   eval {...} called at Cache/FileCache.pm line 35
   require Foswiki/Cache/FileCache.pm called at (eval 16) line 2
   Foswiki::PageCache::BEGIN() called at Cache/FileCache.pm line 35
   eval {...} called at Cache/FileCache.pm line 35
   eval 'use Foswiki::Cache::FileCache

-- KennethLavrsen - 14 Mar 2010

Follow up

The engine crashes because there is a secret non-documented dependency on CPAN libraries not installed.

Most of the Foswiki cache methods require additional CPAN libs. But since there is literally no documentation other than a few one- or two-line notes in configure, no one has a chance of knowing that, for example, the FileCache method needs a CPAN lib called Cache-Cache.

There are several things needed to close this bug.

First there has to be a FoswikiCache document in System web that describes the different methods. This document must describe

  • What the advantages of the different methods are. People need some criteria to choose between them.
  • Exactly which CPAN libraries the admin must install to get each method working.
  • A better default cache mode must be selected. And for this we must distribute the needed CPAN libs in our local Foswiki CPAN repository. At least one method must work out of the box. And it must be a method that also works in normal CGI mode on a shared host.

The Foswiki Cache is a significant blockbuster feature in 1.1 that can mean significant performance improvement for applications that do SEARCH in webs with many topics, i.e. any serious wiki application where you use Foswiki topics as database records. I have seen 12-second responses reduced to 4 seconds after the cache is seeded, so it is a cool feature.

Now we just need it available to the masses by doing proper documentation and a good default mode with the needed CPAN lib provided so anyone can get started using it.

I also think it would make sense to make the code more robust, so that the failure mode you get tells the admin that the problem is a missing CPAN lib and not a software bug in Foswiki.
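A minimal sketch of the kind of failure mode asked for above. Foswiki itself is written in Perl; this uses Python only to illustrate the pattern. The names `load_cache_backend` and `CacheBackendError` are hypothetical and are not part of Foswiki: the loader checks each required library first and names the missing ones instead of dying with a raw stack trace.

```python
import importlib


class CacheBackendError(RuntimeError):
    """Raised when a cache backend cannot be loaded (hypothetical name)."""


def load_cache_backend(module_name, required_libs):
    """Import a cache backend, reporting missing libraries by name."""
    missing = []
    for lib in required_libs:
        try:
            importlib.import_module(lib)
        except ImportError:
            missing.append(lib)
    if missing:
        # Tell the admin what to install rather than exposing a stack trace.
        raise CacheBackendError(
            f"Cache backend '{module_name}' is not usable: missing "
            f"required libraries: {', '.join(missing)}. "
            "Install them or choose another cache implementation."
        )
    return importlib.import_module(module_name)
```

A configure-style checker could call the same function and show the message as a warning next to the cache setting.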

-- KennethLavrsen - 15 Mar 2010

This has been stagnating for a while without attention from Michael, so setting it explicitly for his attention. I concur that the documentation is inadequate, and for that reason am confirming the issue.

-- CrawfordCurrie - 09 Apr 2010

I just tried again, and after resolving the undocumented dependency on Digest::SHA1, every page view now results in junk (looks like binary) output. Babar is trying to understand it.

I started PageCachingDraft to capture the doc Kenneth asked for above.

-- CrawfordCurrie - 14 May 2010

So, to sum up my change, there are 3 parameters which change the output:
  • cfg{HttpCompress}: Tells foswiki that it may compress its output (default: FALSE)
  • cfg{Cache}{Compress}: Tells the cache that it should store its output in a compressed form (default: TRUE)
  • Browser's Accept-Encoding: Tells the webserver if the browser is capable of handling gzip compressed data

By default, {HttpCompress} was false, but {Cache}{Compress} was true, so the core was compressing the page in the cache, but sending it raw (meaning: sending the gzip'd version without setting the Content-Encoding header).

What I did is try to cover all cases:
  • If {HttpCompress} is on, and browser supports it, send the raw cache version if {Cache}{Compress} is on, otherwise compress it on the fly
  • If {HttpCompress} is off, there is a cache version, and {Cache}{Compress} is on, uncompress it on the fly
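The cases above can be sketched roughly as follows. This is an illustration in Python, not the actual Foswiki (Perl) code; `build_response` and its parameters are hypothetical names. It also assumes that when {HttpCompress} is on but the browser does not accept gzip, a compressed cache entry is uncompressed before sending, which the list above does not spell out. The `Accept-Encoding` check is deliberately simplistic and ignores quality values such as `gzip;q=0`.

```python
import gzip


def build_response(cached_body, cache_compress, http_compress, accept_encoding):
    """Return (body, headers) for a page served from the cache.

    cached_body holds the bytes stored in the cache: gzip'd when
    cache_compress is True, raw otherwise.
    """
    browser_accepts_gzip = "gzip" in accept_encoding.lower()
    headers = {}
    if http_compress and browser_accepts_gzip:
        # Send the stored gzip bytes as-is, or compress on the fly.
        body = cached_body if cache_compress else gzip.compress(cached_body)
        # The missing header was the cause of the "junk" output.
        headers["Content-Encoding"] = "gzip"
    else:
        # The client gets raw content, so undo cache compression if needed.
        body = gzip.decompress(cached_body) if cache_compress else cached_body
    return body, headers
```

The point of the sketch is that the `Content-Encoding` header is set whenever the body leaves the server gzip'd, and never otherwise.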

I've also tuned the configure parameters to have {Cache}{Compress}, with a note that it should follow {HttpCompress} unless the user has a very good reason not to.

Will try to write some Configure checkers

-- OlivierRaginel - 14 May 2010

Ok, wrote some checkers, feel free to add more :) Also some unit tests would be great, but...

-- OlivierRaginel - 14 May 2010

I am not sure this is a good change. You reintroduced the {Cache}{Compress} parameter which has been replaced by {HttpCompress} completely. That's because compressing the content in the cache only is of no value if there's no http-compression also. There is however value in http-compression even without page caching when your cpu is rather capable compared to its connection bandwidth and latency.

-- MichaelDaum - 14 May 2010

Completely? Are you sure about that? The root cause of this bug was that it wasn't completely replaced, and as it's very well documented, I guessed the use case:
  • Allow gzip'ed content to be sent to the browser, so do mod_deflate's job => {HttpCompress}
  • Save cached data in compressed form => {Cache}{Compress}
And I've added a warning if one is enabled and not the other.

I disagree with your statement that there is no value in compressing the cache if there is no http-compression. For me, there is no value in http-compression, as this should be done by Apache, or even by some dedicated hardware on some really heavily used websites (not dedicated to the wiki, but often companies have SSL proxies or stuff like that which would do it in a much more efficient way). So you don't need to send it compressed. Or for this use-case, if you use cache, you don't want it to DoS your disk space, but maybe you don't want to use compression for http (maybe your clients are not very well configured and most of them don't support it).

I agree it's a border case, which is why I put the warning in configure if you enable one without the other.

As for the other issue with this bug, the one CDot has been working on, the documentation is really sparse, and we couldn't figure out the advantages / disadvantages of each of your cache backends. So for me, all that's left to fix it is to expand and verify PageCachingDraft.

-- OlivierRaginel - 14 May 2010

The main use case for letting foswiki send gzip'ed content is when it is already stored gzipped in its cache. It is gzipped once and delivered in that encoding as often as possible. mod_deflate is not recommended as it does the same thing again and again: look, mummy, I just compressed the same content again saving the same amount of bandwidth I did the last time, and I just burned the same amount of cpu resources again...

Well, foswiki's page cache does better. The cache is there to never do the same thing twice if possible. It compresses the response, stores it, gets it out again as is and sends it over. The only situation when it has to uncompress a cached and pre-compressed response is for IE6, which does not understand gzipped html.

If {Cache}{Compress} wasn't eliminated completely, then this is an error.

Compressing the http response doesn't come for free: it costs cpu time. If your server is not able to spend this extra effort given the overall requests it has to serve, then better not spend any cycles on compressing at all. Note that browsers will have to spend time to uncompress the response too. This is a measurable effort on the client side, i.e. if the uncompressed response is large.

Another thing: better not waste cpu compressing it for the cache and then spend even more cpu to uncompress it before sending it over uncompressed on each request. That's a waste of resources. The amount of storage needed for the cache is totally negligible given today's disk sizes.

So let's please, revert this change. Unfortunately I am still travelling this week, only able to connect to the internet sporadically.

-- MichaelDaum - 17 May 2010

Reverting the change will just break it again. Eliminating {Cache}{Compress} and using {HttpCompress} everywhere appears to be what you are suggesting, and it makes sense to me.

Michael, we really need you to document what is going on, both from a code perspective (accurate and complete POD documents) and from a user perspective (PageCachingDraft). We're trying hard to help, but to a great extent we are stabbing in the dark.

Also, I'm concerned that there has not been adequate testing of the various options; for example, the MemoryHash module didn't even compile before I started looking at the code, which makes me think it may not have been tested in real life. There are a bunch of scenarios that need to be tested:
  • FileCache
  • BDB
  • MemoryCache
  • MemoryHash (should this even be in the repository?)
  • HttpCompress with each of the above
  • Use with and without accelerator (e.g. mod_perl) with each of the above
  • Use of compression with the main browsers IE6 IE7 IE8 FF3.* Safari Opera Chrome
I'm not necessarily suggesting that there should be unit tests for all of these configurations, but we should have tested - and reported the results - for each of them.

-- CrawfordCurrie - 18 May 2010

I'll be able to merge the latest changes with my own changes and revert the cache compression confusion next week. I hope that someone can help me out reverting the extra code in configure dealing with cache-compress-vs-http-compress.

-- MichaelDaum - 18 May 2010

I don't understand why we would need to do that. For the moment, we have both, and they work. We have a warning that having them set to something different is silly. I don't understand what we will gain by removing one useless configuration option. But I can do it if needed. Anyway, I really agree with CDot that the current issue is more in the documentation than anything else.

-- OlivierRaginel - 18 May 2010

Okay here's a small table to compare all combinations of settings for {Cache}{Compress} and {HttpCompress}:

| | {Cache}{Compress} = enabled | {Cache}{Compress} = disabled |
| {HttpCompress} = enabled | (a) compressed pages are delivered from cache | (b) compression happens for each request |
| {HttpCompress} = disabled | (c) uncompression happens for each request | (d) don't bother |

From the four combinations only (a) and (d) are a sensible choice. (b) is bad as pages are delivered compressed but are explicitly not stored in compressed format, so cpu resources are burned doing the same thing again and again. (c) is bad for the same reason. The page is stored in compressed format; it is uncompressed again and again even though the browser would be fine with a gziped http encoding.
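The four combinations can also be written out as a small lookup. This is only an illustration in Python with hypothetical function names, not Foswiki code. It restates the table: the repeated per-request work appears exactly when the two settings disagree.

```python
def per_request_work(cache_compress, http_compress):
    """Describe the compression work done on every cache hit."""
    if cache_compress and http_compress:
        return "none: stored gzip bytes are sent as-is"  # (a)
    if http_compress:
        return "compress: raw cache entry is gzipped each time"  # (b)
    if cache_compress:
        return "uncompress: gzip cache entry is inflated each time"  # (c)
    return "none: raw bytes are stored and sent"  # (d)


def is_sensible(cache_compress, http_compress):
    """Only (a) and (d) avoid repeated work, i.e. both settings agree."""
    return cache_compress == http_compress
```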

I don't consider disk space an argument at all. If we can trade disk space for speed, then by all means we need to strive for that, even more as we are not short on disk space at all.

So of the four combinations only two make sense; that is, a single setting is enough to distinguish these configurations.

-- MichaelDaum - 24 May 2010

Sorry Michael, but I already said so. Yes, I agree there is probably 0 use-case for having them split. My point is that it is now coded like that, it works, and there are warnings if they're not set to the same value. So please let's focus on documentation, and I'll merge the settings one day if I have time and nobody beats me to it.

-- OlivierRaginel - 26 May 2010

No worries, I've merged the changes already. No actions required in that respect for you.:)

-- MichaelDaum - 26 May 2010

 
Topic revision: r32 - 06 Sep 2010, CrawfordCurrie