Priority: Normal
Current State: Needs Developer
Released In: n/a
Target Release: n/a
I have a MediaWiki dump that I need to convert to Foswiki. The current importer crashes inside the CPAN modules, so I did some work to get it working for this conversion.
Things I found out as I went:
Parse::MediaWikiDump is deprecated by its author, but he has provided what should be a functionally compatible shim on top of the new work. This shim crashes in the same way, but by the time I found that out, I knew a little more about what I needed.
It turns out that MediaWikiToFoswikiContrib does not support MediaWiki dumps that contain revisions. The CPAN module it uses does not support them, so the Contrib was never coded to handle them.
So I replaced the shim with calls to the full MediaWiki::DumpFile, without major changes to the existing Contrib's code, as I prefer not to make this any more than a temporary branch of the code. It also means some of the formatting is off, as I don't know what modes Micha uses.
After the minor hackery, I re-ran the import overnight, and it took the better part of 12 hours to get about 10-20% of the way through the 2000 topics I'm importing. So I re-ran the difficult page under NYTProf and found a simple fix:
1 while $_[0] =~ s/{{(?!.*{{)(.+?)}}/$this->handleTemplateCall($page, $1)/ges; # includes
is about 1000 times slower than the version below, which removes the .* from the negative lookahead (Micha, can you confirm whether that changes the result? I think not, but...):
1 while $_[0] =~ s/{{(?!{{)(.+?)}}/$this->handleTemplateCall($page, $1)/ges; # includes
A single revision of the troublesome topic (which has about 200 revisions, I think) went from 300 seconds to 300 milliseconds.
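To help check whether dropping the .* changes the result, here is a minimal standalone sketch that runs both substitutions on a small nested input. It is not the Contrib's code: the expand helper and its bracket-wrapping replacement are hypothetical stand-ins for $this->handleTemplateCall($page, $1), the sample string is made up, and the braces are backslash-escaped so the patterns compile on newer Perl versions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $this->handleTemplateCall($page, $1):
# records each captured argument and returns it wrapped in brackets.
sub expand {
    my ($pattern, $text) = @_;
    my @calls;
    # Same "1 while s///ges" loop shape as the lines quoted above.
    1 while $text =~ s/$pattern/push @calls, $1; "[$1]"/ges;
    return ($text, @calls);
}

# Original form: only matches a {{ that has no later {{ anywhere after it.
my $with_dotstar    = qr/\{\{(?!.*\{\{)(.+?)\}\}/s;
# Faster form: only rejects a {{ that is immediately followed by {{.
my $without_dotstar = qr/\{\{(?!\{\{)(.+?)\}\}/s;

# Made-up sample with one template call nested inside another.
my $sample = 'x {{a {{b}} c}} y';

my ($slow_text, @slow_calls) = expand($with_dotstar,    $sample);
my ($fast_text, @fast_calls) = expand($without_dotstar, $sample);

print "with .*   : $slow_text <- ", join(' | ', @slow_calls), "\n";
print "without .*: $fast_text <- ", join(' | ', @fast_calls), "\n";
```

With this stand-in both variants end up with the same final text, but the handler is called with different arguments: the .* form passes "b" and then "a [b] c", while the shorter form passes "a {{b" and then "b] c". That suggests the two are only interchangeable when template calls are not nested, so it is worth checking against a page that nests them before relying on the faster pattern.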
mmm, ok, commit log on my git:
sven@quiet:~/src/foswiki$ git log trunk..HEAD
warning: refname 'trunk' is ambiguous.
commit 9784102ca4ecf3d3618f5bb75bd991d0def6b99b
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Sun May 8 15:21:01 2011 +1000
add debugging to let me choose a difficult page to convert, add some shims to cater for differences in the non-legacy mode - can't ask a topic for its namespace, so instead store the web&topic we've decided to save as, and use that to decide if its a Template topic,
commit 3fb1f67f660618433a8f36b382376ce13c3fa774
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Fri May 6 22:20:24 2011 +1000
use the new revision's way to get author's username, and allow handleTemplateCall to default to the Template web - as thats what MediaWiki appears to do
commit ce4eb02ebd69b8d6984d1fbd0352ee98733c130f
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Fri May 6 21:37:46 2011 +1000
re-jig to not use the Compatibility module, as that doesn't support topic revisions
commit 62c29ab9a55f93ff607e8e277dedd1118a2e0e0c
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Thu May 5 12:55:08 2011 +1000
update DEPs
commit 64580691a690ff91ea43eb8f3bb3bc04fdb20831
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Thu May 5 12:54:39 2011 +1000
add a little help
commit 9b6c2237aa72c1a181fe8d1580e7e45ef08f1c14
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date: Thu May 5 12:54:21 2011 +1000
update to new MediaWiki::DumpFile::Compat, begin to use Foswiki::Func so that versioning becomes trivial, begin to work though assumtions wrt cmd-line options
I think the gist of the changes is worth keeping, but given the response on IRC, I've not made any effort to refactor or clean them up.
The changesets I've made over the last few days
--
SvenDowideit - 08 May 2011
No response from Micha. I'll let him decide what to do with my changes (unless I get more conversion needs).
--
SvenDowideit - 02 Oct 2012 - 10:42