
Item10723: MediaWikiToFoswikiContrib

Priority: Normal
Current State: Needs Developer
Released In: n/a
Target Release: n/a
Applies To: Extension
Component: MediaWikiToFoswikiContrib
Branches: trunk
Reported By: SvenDowideit
Waiting For:
Last Change By: MichaelDaum
I have a MediaWiki dump that I need to convert to Foswiki. Running the current importer crashes inside the CPAN modules, so I did some work to get it working for this conversion.

Things I found out as I went:

Parse::MediaWikiDump is deprecated by its author, but he has provided what should be a functionally compatible shim on top of the new work. This shim crashes in the same way, but by the time I found that out, I knew a little more about what I needed.

Turns out that MediaWikiToFoswikiContrib does not support MediaWiki dumps that contain revisions. The CPAN module it uses does not support them, so the Contrib was never coded to handle them.

So, I replaced the shim with calls to the full MediaWiki::DumpFile, without major changes to the existing Contrib's code, as I prefer not to make this any more than a temporary branch of the code. It has also meant some of the formatting is off (I don't know what modes Micha uses).

After the minor hackery, I re-ran the import overnight, and it took the better part of 12 hours to get about 10-20% of the way through the 2000 topics I'm importing. So I re-ran the difficult page under NYTProf, and found a simple fix:

 1 while $_[0] =~ s/{{(?!.*{{)(.+?)}}/$this->handleTemplateCall($page, $1)/ges; # includes
is about 1000 times slower than the following, which removes the .* from the negative lookahead (Micha, can you confirm whether that changes the result? I think not, but...):
 1 while $_[0] =~ s/{{(?!{{)(.+?)}}/$this->handleTemplateCall($page, $1)/ges; # includes
Just one revision of the troublesome topic (which has about 200 revisions, I think) went from 300 seconds to 300 milliseconds.
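Whether dropping the .* changes the result can be checked on small inputs. The following is a minimal sketch, not the Contrib's code: expand_all and its handler are hypothetical stand-ins for handleTemplateCall (the handler just records its argument and returns "X"), and the braces are escaped because newer Perl versions may warn about unescaped ones.

```perl
use strict;
use warnings;

# Run one substitution pattern until nothing matches any more, recording
# every string handed to the handler. The handler is a hypothetical
# stand-in for handleTemplateCall and replaces each call with "X".
sub expand_all {
    my ($text, $pattern) = @_;
    my @calls;
    my $handler = sub { push @calls, $_[0]; return 'X' };
    1 while $text =~ s/$pattern/$handler->($1)/ges;
    return ($text, @calls);
}

# Original: a "{{" only matches if no other "{{" occurs anywhere after it,
# so each pass expands just the last opener in the text.
my $original = qr/\{\{(?!.*\{\{)(.+?)\}\}/s;

# Faster variant: a "{{" only fails to match if it is immediately
# followed by another "{{".
my $fast = qr/\{\{(?!\{\{)(.+?)\}\}/s;

for my $input ('{{a}} and {{b}}', '{{outer {{inner}} tail}}') {
    for my $case ([original => $original], [fast => $fast]) {
        my ($name, $pattern) = @$case;
        my ($result, @calls) = expand_all($input, $pattern);
        printf "%-9s %-26s -> %-10s handler saw: %s\n",
            $name, $input, $result, join(' | ', @calls);
    }
}
```

For the flat input both patterns give "X and X". For the nested input they differ: the original hands the handler "inner" and then "outer X tail", leaving "X", while the shorter pattern hands it "outer {{inner" once and leaves "X tail}}". So the change looks safe only for pages without nested template calls. The slowdown is plausibly explained by the .* lookahead scanning to the end of the text at every "{{" while each pass expands only one call.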

mmm, ok, commit log on my git:
sven@quiet:~/src/foswiki$ git log trunk..HEAD
warning: refname 'trunk' is ambiguous.
commit 9784102ca4ecf3d3618f5bb75bd991d0def6b99b
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Sun May 8 15:21:01 2011 +1000

    add debugging to let me choose a difficult page to convert, add some shims to cater for differences in the non-legacy mode - can't ask a topic for its namespace, so instead store the web&topic we've decided to save as, and use that to decide if its a Template topic,

commit 3fb1f67f660618433a8f36b382376ce13c3fa774
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Fri May 6 22:20:24 2011 +1000

    use the new revision's way to get author's username, and allow handleTemplateCall to default to the Template web - as thats what MediaWiki appears to do

commit ce4eb02ebd69b8d6984d1fbd0352ee98733c130f
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Fri May 6 21:37:46 2011 +1000

    re-jig to not use the Compatibility module, as that doesn't support topic revisions

commit 62c29ab9a55f93ff607e8e277dedd1118a2e0e0c
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Thu May 5 12:55:08 2011 +1000

    update DEPs

commit 64580691a690ff91ea43eb8f3bb3bc04fdb20831
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Thu May 5 12:54:39 2011 +1000

    add a little help

commit 9b6c2237aa72c1a181fe8d1580e7e45ef08f1c14
Author: Sven Dowideit <SvenDowideit@fosiki.com>
Date:   Thu May 5 12:54:21 2011 +1000

    update to new MediaWiki::DumpFile::Compat, begin to use Foswiki::Func so that versioning becomes trivial, begin to work though assumtions wrt cmd-line options

I think the gist of the changes is worth keeping, but given the response on IRC, I've not made any effort to refactor or clean them up.

The changesets I've made over the last few days

-- SvenDowideit - 08 May 2011

No response from Micha. I'll let him decide what to do with my changes (unless I get more conversion needs).

-- SvenDowideit - 02 Oct 2012 - 10:42

 
Topic revision: r5 - 07 Aug 2023, MichaelDaum