Item9893: mongo throws an exception if you're sorting on a key that is not indexed?
Priority: Urgent
Current State: Closed
Released In: n/a
Target Release: n/a
in an annoying
wtf
, it seems that
MongoDB has issues with adhoc queries on large datasets :/
Could not perform search. Error was: query error: too much data for sort() with no index at /home/sven/src/foswiki/core/lib/Foswiki/Search/MongoDBInfoCache.pm line 56. at /home/sven/src/foswiki/core/lib/Foswiki/Search/MongoDBInfoCache.pm line 56 Foswiki::Search::MongoDBInfoCache::next('Foswiki::Search::MongoDBInfoCache=HASH(0x4bd2778)') called at /home/sven/src/foswiki/core/lib/Foswiki/Search/ResultSet.pm line 93 Foswiki::Search::ResultSet::hasNext('Foswiki::Search::ResultSet=HASH(0x4bd2370)') called at /home/sven/src/foswiki/core/lib/Foswiki/Search.pm line 357 Foswiki::Search::searchWeb('Foswiki::Search=HASH(0x4a20ae8)', 'search', 'ZonePlugin', 'basetopic', 'ZonePlugin', 'reverse', 'on', '_RAW', '"ZonePlugin" limit="10" scope="topic" web="System" nonoise="o...', ...) called at /home/sven/src/foswiki/core/lib/Foswiki/Macros/SEARCH.pm line 32 Foswiki::__ANON__() called at /usr/share/perl5/Error.pm line 416 eval {...} called at /usr/share/perl5/Error.pm line 408 Error::subs::try('CODE(0x24961a0)', 'HASH(0x4a20b00)') called at /home/sven/src/foswiki/core/lib/Foswiki/Macros/SEARCH.pm line 41 Foswiki::SEARCH('Foswiki=HASH(0x2a11550)', 'Foswiki::Attrs=HASH(0x2496278)', 'Foswiki::Meta=HASH(0x35d1688)') called at /home/sven/src/foswiki/core/lib/Foswiki.pm line 3015 Foswiki::_expandMacroOnTopicRendering('Foswiki=HASH(0x2a11550)', 'SEARCH', '"ZonePlugin" limit="10" scope="topic" web="System" nonoise="o...', 'Foswiki::Meta=HASH(0x35d1688)') called at /home/sven/src/foswiki/core/lib/Foswiki.pm line 2905 Foswiki::_processMacros('Foswiki=HASH(0x2a11550)', '
which is going to make life interesting
--
SvenDowideit - 25 Oct 2010
Goodness,
SERVER-790 is curious :/
--
PaulHarvey - 26 Oct 2010
there's more. you can only define 40 indexes, and compound indexes only seem to help
sometimes
a couple of possibilities:
- only add indexes when this issue is noticed (ie, when a mongoDB query returns this error, create an index on the requested sort, and if we run out, either tough, or delete one..)
- use one collection per DataForm definition- with the limitaion that you cannot have ~30 fields (as we need to index built in things too)
- perhaps sharding mitigates this?
- map-reduce
- work harder on reducing document size, as the 1Meg limit is based on document size of the result
--
SvenDowideit - 29 Oct 2010
So, we want mongodb to sort because paging through large results requires it to be done at the db server, otherwise we lose the whole point of having the db do the work instead of Foswiki.
Would it be possible to make a mongo query with sort/skip+limit parameters to build a result where each item was purely ObjectID (12 bytes) and the sorted field (I assume you need the sorted field in order for sort() to be able to do its thing)? A 2nd query from Foswiki would then obtain the full documents (or the pieces of them it wants, anyway).
OTOH as you said on IRC I think collection-per-form is probably aligned with the nosql strategy mongo peeps seem to be advocating anyway. Also, many of my wiki-apps start with a "form.name='BlahForm'" anyway, as the form indicates the topic 'type'.
I think the most fields we have on our scariest form is ~20, and I'm working on normalising-out that one so it'll be back down to 12 or so. Sharding might mitigate this but not to the extent we need to have a solution... we have 2 shards and I hope that's enough for now.
The other reason I'm warming to the idea of collection-per-form is that we
could enhance
DataForms to specify index types. Maybe. One day
For example, I see that there's a mongodb 1.7 patch which allows polygon (rather than box, circle) 2d geospatial search.
Hm. I guess collection-per-form
could be a bit more fiddly for a 3rd-party (non-foswiki) app to work with. OTOH it might actually make it easier - all the 3rd-party app ideas we have for talking directly to mongodb without Foswiki in the middle, would probably benefit from having the topics already grouped by form - as they will want to be working on a particular type of (data) topic.
Well, I don't think I actually contributed to a solution, I'll stop typing now...
--
PaulHarvey - 29 Oct 2010
Hmmm, mongodb.org needs to update its docs.
SERVER-1140 seems to imply that 1.6 can have 64 indexes on a collection.
--
PaulHarvey - 31 Oct 2010
Okay, another thought.
SemanticLinksPlugin wants efficient SEARCHes too. It registers a few META:BLAH types, and I guess they will want to be indexed as well.
So the problem with indexes seems to be mounting. Hmmmmm.
Maybe
MongoDBPlugin is going to need to split things up into collection per form; and dedicated collections per META:TYPE?
This will require some tedious book keeping. Collection-per-form isn't too bad, the topic object will exist only once.
But splitting up pieces of the topic into constituent bits means (I guess) the 'topic' document maintaining dbrefs into the other META:TYPE collections, and of course the various collections per META:TYPE will have a dbref back to the 'topic' object.
Is it really tedious to eval js on the server so that Foswiki can obtain a 'complete' topic object in just one request? Or is this approach doomed to fire
n + 1 requests to completely fetch a topic, where
n is the number of different META:TYPEs in use on the topic?
/end ramble
--
PaulHarvey - 19 Nov 2010
As of now, this item is the urgentest. Anecdotally for us it gets "too large" to sort somewhere between 1000-1500 topics
--
PaulHarvey - 24 Mar 2011
MongoDBPlugin:d8285c0b83fd is actually for this task, and makes a significant improvement on my tests
--
SvenDowideit - 31 Mar 2011
This still:
%SEARCH{
"form.name='Lauries.GlossaryForm'"
web="Lauries/GlossaryData"
type="query"
order="formfield(Base)"
pager="on"
}%
Produces this:
Could not perform search. Error was: query error: too much data for sort() with no index. add an index or specify a smaller limit
I can however, see that we can sort
TaxonProfile/Definitions now. So I guess we'll keep working on that
--
PaulHarvey - 01 Apr 2011
ok, with the commits we have now,
order=formfield(Base)
works on my system - can you confirm?
--
SvenDowideit - 08 Apr 2011
I'm having trouble with the insects web -
here's Insect000000-Insect078517, can you load it and tell me how many pages the
SEARCH pager gives you with the following expression:
%SEARCH{"1"
type="query"
web="System.MigrationScriptsInsectsDemo"
pager="on"
pagesize="10"
}%
http://wiki.trin.org.au/Sandbox/TestTopic12 only gives me 1937 pages, indicating that my sort tests are only sorting on ~19,370 topics..
--
PaulHarvey - 23 Apr 2011
So, after doing a db.dropDatabase on the web, and then doing another updatweb... I found that mongo wasn't actually loading the topics! Instead, I had to re-load the root (System) web.
--
PaulHarvey - 23 Apr 2011
so you mean to say, it works, the bug is fixed, but we have some kind of (new task) issue with loading?
--
SvenDowideit - 23 Apr 2011
Indeed
--
PaulHarvey - 23 Apr 2011
Okay, I'm able to sort the insects DB (first 162,000 anyway!) on any formfield in around 3s typical. Nice
I think we can close this.
--
PaulHarvey - 24 Apr 2011