Item12022: Log rotation is too slow, causes timeouts on busy sites at the end of month
Priority: Urgent
Current State: Closed
Released In: 1.1.6
Target Release: patch
Applies To: Engine
Component: FoswikiLogger
Branches: Release01x01 trunk
For the last couple of months (we run trunk - presently Rev 15103 not found), our site has crashed at exactly midnight GMT at the end of the month.
I get 5 perl processes (our FCGID max_procs is 5) all maxing out CPU, I assume they're all trying to rotate the same logfile at the same time.
Our site is moderately busy, our events.log for any given month for the last 6 months have grown from 50MiB up to 120MiB.
Users get 500 timeouts and restarting apache doesn't help. None of the view requests are able to be completed inside of our 30 second fcgid timeout.
My work-around has been to stop apache, run
./view
from the CLI - it takes a bit over a minute (?!) to complete - then start apache, and the site is back to normal for another month.
--
PaulHarvey - 01 Aug 2012
Discussion:
http://irclogs.foswiki.org/bin/irclogger_log/foswiki?date=2012-08-01,Wed&sel=116#l112
--
PaulHarvey - 01 Aug 2012
That's exactly what I've had to do on Foswiki.org. Stop apache, ./view, and then start apache, though for us, the impact is usually things keep going after the rotate attempts crash. Also error.log and debug.log are still not being rotated on Foswiki.org, however events.log rotated just fine.
--
GeorgeClark - 01 Aug 2012
Reading the tealeaves from NYTProf, of the
1.83s (450ms+1.38) within Foswiki::Logger::PlainFile::_rotate which was called 30028 times, avg 61µs/call:
my $eventTime = Foswiki::Time::parseTime( $event[1] );
# spent 1.21s making 30012 calls to Foswiki::Time::parseTime, avg 40µs/call
And on this run,
timeit
reports that the rotate call took 1.28s to complete. That means 95% of the CPU time spent rotating a logfile is in parsing the date strings.
--
PaulHarvey - 01 Aug 2012
I replaced that line with a dumb regex which should only match isodates; if it doesn't match, it uses the old parseTime method for grabbing the year-month of the log entry.
This seems to result in a 4x speedup of log rotations.
On my (real!) Xeon 3.0GHz PC:
# |
wallclock secs |
Lines/sec |
Before |
0.440275 |
34069 |
After |
0.117118 |
128075 |
I'd like to see a bigger improvement but this should address my immediate concerns. Closing.
--
PaulHarvey - 01 Aug 2012
The code is very stupid; it will start trying to rotate the log
even if someone else is already rotating it.
There are a number of really obvious ways to improve on that!
--
CrawfordCurrie - 01 Aug 2012
Improved the bug title for release notes.
--
PaulHarvey - 02 Aug 2012
I've added this to the
KnownIssuesOfFoswiki01x01, which includes a hotfix suggestion.
--
PaulHarvey - 28 Sep 2012
Another solution is to make it easier to re-enable the Compatibility logger. I'm updating configure to define the file settings used by the compatibility logger, but defaulting them to empty. And modifying the checkers to note the issues with the
PlainFile logger.
--
GeorgeClark - 14 Oct 2012