You are here: Foswiki>Tasks Web>Item12022 (02 Dec 2012, GeorgeClark)Edit Attach

Item12022: Log rotation is too slow, causes timeouts on busy sites at the end of month

pencil
Priority: Urgent
Current State: Closed
Released In: 1.1.6
Target Release: patch
Applies To: Engine
Component: FoswikiLogger
Branches: Release01x01 trunk
Reported By: PaulHarvey
Waiting For:
Last Change By: GeorgeClark
For the last couple of months (we run trunk - presently Rev 15103 not found), our site has crashed at exactly midnight GMT at the end of the month.

I get 5 perl processes (our FCGID max_procs is 5) all maxing out CPU, I assume they're all trying to rotate the same logfile at the same time.

Our site is moderately busy, our events.log for any given month for the last 6 months have grown from 50MiB up to 120MiB.

Users get 500 timeouts and restarting apache doesn't help. None of the view requests are able to be completed inside of our 30 second fcgid timeout.

My work-around has been to stop apache, run ./view from the CLI - it takes a bit over a minute (?!) to complete - then start apache, and the site is back to normal for another month.

-- PaulHarvey - 01 Aug 2012

Discussion: http://irclogs.foswiki.org/bin/irclogger_log/foswiki?date=2012-08-01,Wed&sel=116#l112

-- PaulHarvey - 01 Aug 2012

That's exactly what I've had to do on Foswiki.org. Stop apache, ./view, and then start apache, though for us, the impact is usually things keep going after the rotate attempts crash. Also error.log and debug.log are still not being rotated on Foswiki.org, however events.log rotated just fine.

-- GeorgeClark - 01 Aug 2012

Reading the tealeaves from NYTProf, of the 1.83s (450ms+1.38) within Foswiki::Logger::PlainFile::_rotate which was called 30028 times, avg 61µs/call:
my $eventTime = Foswiki::Time::parseTime( $event[1] );
# spent 1.21s making 30012 calls to Foswiki::Time::parseTime, avg 40µs/call

And on this run, timeit reports that the rotate call took 1.28s to complete. That means 95% of the CPU time spent rotating a logfile is in parsing the date strings.

-- PaulHarvey - 01 Aug 2012

I replaced that line with a dumb regex which should only match isodates; if it doesn't match, it uses the old parseTime method for grabbing the year-month of the log entry.

This seems to result in a 4x speedup of log rotations.

On my (real!) Xeon 3.0GHz PC:
# wallclock secs Lines/sec
Before 0.440275 34069
After 0.117118 128075

I'd like to see a bigger improvement but this should address my immediate concerns. Closing.

-- PaulHarvey - 01 Aug 2012

The code is very stupid; it will start trying to rotate the log even if someone else is already rotating it.

There are a number of really obvious ways to improve on that!

-- CrawfordCurrie - 01 Aug 2012

Improved the bug title for release notes.

-- PaulHarvey - 02 Aug 2012

I've added this to the KnownIssuesOfFoswiki01x01, which includes a hotfix suggestion.

-- PaulHarvey - 28 Sep 2012

Another solution is to make it easier to re-enable the Compatibility logger. I'm updating configure to define the file settings used by the compatibility logger, but defaulting them to empty. And modifying the checkers to note the issues with the PlainFile logger.

-- GeorgeClark - 14 Oct 2012
 
Topic revision: r24 - 02 Dec 2012, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy