Item10596: MongoDBPlugin milestone 4
Priority: Normal
Current State: Closed
Released In: n/a
Target Release: n/a
This one's about redoing the schema so that each web is in a separate 'database', with a 'current' collection, a history collection, and other optimisation-based collections.
Number of topics: 7
--
SvenDowideit - 05 Apr 2011
So, the most urgent one is sorting Item9893, and the following tasks are "important but not urgent" (roughly in order):
- Item10611 - order="modified" and order="created" weirdness, but I'm still trying to understand this bug
- Item10409 - this might be an easy win that our users would appreciate; however, if it makes more sense to tackle this at a later stage, that's cool too
- Item10532 - slave_ok madness
--
PaulHarvey - 08 Apr 2011
er, and obviously delegating ACLs is probably an m5 thing
--
PaulHarvey - 08 Apr 2011
Moved m5 tasks to Item10652, and added a magic list to make my life easier.
--
SvenDowideit - 19 Apr 2011
Hi Sven, sorry about the delay testing m4. Hrm. Before, our /var/lib/mongodb/set was ~4-5GB.
MongoDB's chunk size is 200MB, so per-web this is the first allocation size - i.e. a minimum of 200MB overhead per web/subweb. So now we've got 23GB. Our standard VM setup is a 60GB disk, so I'm having to do some tidying up now.
Initially it looks like we only have ~1GB RES, so I hope this is just disk overhead (not RAM). We'll see how it goes...
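One way to check whether it's just disk overhead is to compare the logical data size against the allocated file size for one of the per-web databases. A sketch (mongo shell against the live server; using =System= as the database name here is an assumption for illustration):

```javascript
// Sketch: distinguish on-disk preallocation from actual data.
// "System" as a database name is an assumption for illustration.
var s = db.getSiblingDB("System").stats();
print("dataSize:    " + s.dataSize);    // bytes of real data
print("storageSize: " + s.storageSize); // space allocated to collections
print("fileSize:    " + s.fileSize);    // total datafile size on disk, incl. prealloc
```

If fileSize dwarfs dataSize, the bloat is allocation overhead rather than data (or resident memory).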
I'm concerned that DB-per-web will kill off any wiki app that makes heavy use of subwebs, although I don't use that pattern atm.
But the good news: we can sort Lauries now.
Performance is about the same as the m3+ code we were on.
--
PaulHarvey - 20 Apr 2011
Also having trouble getting MongoDB to notice newly created webs - i.e. it seems I have to explicitly load a web before the DB gets created for it - Item10664
--
PaulHarvey - 20 Apr 2011
Also, it doesn't work if I load the new subweb by itself: it seems that SEARCHes can't see the new data unless I re-load the whole root web.
--
PaulHarvey - 20 Apr 2011
MigrationScriptsContrib now has a script to load all insect names from AFD
--
PaulHarvey - 20 Apr 2011
I think --noprealloc --smallfiles can save the day. It's nothing to do with "chunk size" (I think that's a sharding thing); I think our massive disk usage is just the prealloc overhead.
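A minimal sketch of starting mongod with these options (the dbpath and replSet values below are placeholders, not our actual config):

```shell
# Sketch: mongod with the space-saving options discussed above.
# --noprealloc     : don't preallocate the next datafile ahead of use
# --smallfiles     : smaller initial datafiles (and a 512MB max file size)
# --directoryperdb : one subdirectory per database under dbpath
# The dbpath and replSet values are placeholders.
mongod --dbpath /var/lib/mongodb --replSet set \
       --noprealloc --smallfiles --directoryperdb
```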
--
PaulHarvey - 20 Apr 2011
Here are the results:
| *Web* | *Size* | *Config* |
| System | 209M | |
| Sandbox | 417M | |
| System | 65M | --noprealloc --smallfiles --directoryperdb |
| System + Sandbox | 97M | --noprealloc --smallfiles --directoryperdb |
Without the extra options, adding the Sandbox web after System cost 208MB.
With the extra options, the cost was 32MB.
So now, instead of 20GB for 100 webs, we'll have an overhead of 3.2GB. Much better (will test on production tonight).
--
PaulHarvey - 20 Apr 2011
Yes, I presumed that the disk size was mostly prealloc - mostly a tuning thing, though I have to admit that my test server only has an 80GB disk, so :}
--
SvenDowideit - 21 Apr 2011
Okay, a full re-load with directoryperdb, smallfiles & noprealloc on each mongod sees 6.2GB on replSet members a & b, but 5.4GB on member c. Member c is running Ubuntu 10.04 LTS whereas a & b are on Ubuntu 9.10; still, it's strange that there's different overhead there. The mongod PRIMARY seems to stabilise at ~2.8GB RES memory usage.
--
PaulHarvey - 22 Apr 2011
This query is now taking ~4s at the MongoDB end according to our timing headers (82k topics):
%SEARCH{"form.name='System.MigrationScriptsInsectsDemo.InsectsDemoForm'"
type="query"
web="System.MigrationScriptsInsectsDemo"
pager="on"
pagesize="10"
}%
However,
db.current.find({"FORM" : {"name" : "System.MigrationScriptsInsectsDemo.InsectsDemoForm"}}).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 82888,
"nscannedObjects" : 82888,
"n" : 82872,
"millis" : 165,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Indicates ~165ms (0.165s). What are the headers actually reporting on? Or does the Foswiki query issue something more complex?
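As an aside, the "cursor" : "BasicCursor" with nscanned equal to the collection size above means a full collection scan - no index is being used. A sketch of indexing the form name (whether the plugin actually issues this dotted query form is an assumption; the explain above uses an exact-subdocument match on FORM):

```javascript
// Sketch (mongo shell): index FORM.name so queries on it can avoid a full scan.
db.current.ensureIndex({"FORM.name": 1});
db.current.find({"FORM.name": "System.MigrationScriptsInsectsDemo.InsectsDemoForm"}).explain();
// When the index is used, explain() reports a BtreeCursor on FORM.name
// instead of a BasicCursor scanning all 82888 documents.
```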
--
PaulHarvey - 23 Apr 2011
Milestone 4 is now cooked; all it needs is testing and bug fixing, which should be handled in separate tasks - closing.
On the header question above - can you put the header info into a task?
- the answer is that the MongoDB header entry is a list of times taken per query (in mongo + roundtrip), and
- the DebugLog one is the measured time to run the entire perl code (variation is due to where I can put hooks)
and yes, it's not FastCGI-safe - he says, looking for the task
--
SvenDowideit - 04 May 2011