Item13100: bulk_copy.pl needs thorough review & testing for 1.2

Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Extension
Component: Store, FoswikiTools
Branches: master
Reported By: GeorgeClark
Waiting For:
Last Change By: GeorgeClark
Summary says it all.

The store converter needs to be highly robust.

If the converter is proven robust for data + pub, and bidirectional, then we ship PlainFileStore (PFS) as the 1.2 default. If there are blockers that can't be fixed for 1.2, either:
  • Ship PFS but not as the default (safe for new sites), or
  • Defer PFS until the converter is reliable.

-- GeorgeClark - 17 Nov 2014

Please see attached:

This needs work. It was developed alongside changes to implement StoresShouldBePassedConfigHash, which is not approved (for 1.2 at least), so I need to back those changes out.

In its absence I need to perform a full copy of $Foswiki::cfg between all store calls, which is the problem the above feature proposal is attempting to address.
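
To make the cost of that approach concrete, here is a minimal sketch of swapping one global configuration between store calls. It is a Python illustration, not Foswiki code: `CFG` is a hypothetical stand-in for `$Foswiki::cfg`, and `with_store_config`, `source_cfg`, `target_cfg` and `current_store` are invented names.

```python
import copy
from contextlib import contextmanager

# Hypothetical stand-in for the single global $Foswiki::cfg hash.
CFG = {"Store": {"Implementation": "RcsWrap"}, "DataDir": "/old/data"}


@contextmanager
def with_store_config(store_cfg):
    """Temporarily replace the global config with a deep copy of store_cfg."""
    saved = copy.deepcopy(CFG)
    CFG.clear()
    CFG.update(copy.deepcopy(store_cfg))  # full copy on every switch
    try:
        yield CFG
    finally:
        CFG.clear()
        CFG.update(saved)  # restore whatever was there before


def current_store():
    # Any store call that reads the global config sees whichever copy is active.
    return CFG["Store"]["Implementation"]


source_cfg = {"Store": {"Implementation": "RcsWrap"}, "DataDir": "/old/data"}
target_cfg = {"Store": {"Implementation": "PlainFile"}, "DataDir": "/new/data"}

with with_store_config(source_cfg):
    src = current_store()
with with_store_config(target_cfg):
    dst = current_store()
```

Every switch pays for a full deep copy, which is the overhead that passing each store its own config hash would avoid.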

The code was hacked together within PlainFileStore tools directory but really should be moved into a separate StoreToolsContrib.

store.pl was developed to be a wrapper around many Store tools; additional tools can be very compact.

-- JulianLevens - 17 Nov 2014

I believe that this is a deeper issue than a copy_store option.

We are moving from a MonoStore culture to a PolyStore culture and we need to be sure we can support AnyStore for: new installs; minor release upgrades; patch upgrades and extension installs (e.g. System.NewThingPlugin topics).

In more detail there are a number of use cases where Webs/Topics will need to be copied or merged from one install to another.

  1. New Install
    • After install convert ReleaseStore (in /data PFS and RCS neutral with no histories) to DesiredStore
    • Nice and easy no complications
  2. Upgrade Install
    • New Install
      • Basic Sanity Test (no saves) using ReleaseStore
    • merge ReleaseStore with OldStore
      • copy_store ReleaseStore to NewStore
      • copy_store OldStore merged over NewStore (Upgrade Procedure #4)
        • copy_store needs feature to only copy non-default webs — that's pretty easy
      • copy_store specific OldStore topics over NewStore (Upgrade Procedure #5)
        • copy_store needs feature to only copy requested Topics — again pretty easy
        • copy_store needs feature to potentially obliterate NewStore topics from OldStore (originally ReleaseStore) — that is Topic + History only from OldStore
  3. Patch Install
    • Backup OldStore
    • copy_store ReleaseStore merged over OldStore
      • Just means force new Topic and History except for some topics (e.g. WebPreferences)
      • Needs a spec of exception topics
  4. Plugin Install - new/replaced Plugin topics
    • configure allowed to call copy_store in these circumstances?
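
The selection features listed above (skip default webs, copy only requested topics, protect exception topics) could be expressed as a single filter. The following is a Python sketch under assumptions, not part of copy_store: `CopyFilter`, its fields and `should_copy` are hypothetical names, and treating `System` as a default web is only an example.

```python
from dataclasses import dataclass, field


@dataclass
class CopyFilter:
    # Webs never copied, e.g. the default webs shipped with a release.
    skip_webs: set = field(default_factory=set)
    # "Web.Topic" names to copy; an empty set means "all topics".
    only_topics: set = field(default_factory=set)
    # Exception topics that are never overwritten if the target has them.
    keep_existing: set = field(default_factory=set)

    def should_copy(self, web, topic, exists_in_target):
        if web in self.skip_webs:
            return False
        if self.only_topics and f"{web}.{topic}" not in self.only_topics:
            return False
        if exists_in_target and topic in self.keep_existing:
            return False
        return True


# Upgrade (procedure #4): merge OldStore over NewStore, skipping default webs.
upgrade = CopyFilter(skip_webs={"System"})
# Upgrade (procedure #5): copy only specifically requested topics.
selected = CopyFilter(only_topics={"Main.WebHome"})
# Patch install: force new topics except for listed exceptions.
patch = CopyFilter(keep_existing={"WebPreferences"})
```

Keeping the rules in one object means the "spec of exception topics" becomes plain data that each install scenario can supply.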

I believe that the Store API needs a create method. This will destroy and then recreate the base of a store. For a DB based store that will (roughly speaking) DROP ALL TABLES then CREATE ALL TABLES with appropriate structure and indexes. A file based store would probably just rm -rf data; mkdir data. Importantly a copy_store will have no knowledge of how to delete/define AnyStore.

Adding this method to the Store API would give a running Foswiki the capability to destroy the underlying store and that makes me nervous. If it's agreed that this is an issue then I would suggest that Foswiki::Store::AnyStore has an additional Foswiki::Store::AnyStore::ForInstall or some such to provide dangerous methods. This would only need to add an extra Foswiki::cfg{ImplementationClasses} entry to enable it. Once install is complete the implementation classes would be adjusted and the ForInstall file could even be moved to tools until needed again. Am I unnecessarily nervous?
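
As a rough illustration of that split, the sketch below keeps the destructive create method out of the everyday store class and adds it only in an install-time subclass. It is Python rather than Perl, and `FileStore`, `FileStoreForInstall`, `save_topic` and `topic_exists` are hypothetical names, not the Foswiki::Store API.

```python
import shutil
import tempfile
from pathlib import Path


class FileStore:
    """Everyday store: can read and save topics but cannot wipe itself."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)

    def save_topic(self, web, topic, text):
        web_dir = self.data_dir / web
        web_dir.mkdir(parents=True, exist_ok=True)
        (web_dir / f"{topic}.txt").write_text(text, encoding="utf-8")

    def topic_exists(self, web, topic):
        return (self.data_dir / web / f"{topic}.txt").is_file()


class FileStoreForInstall(FileStore):
    """Install-time variant that adds the dangerous create() method."""

    def create(self):
        # File-store equivalent of: rm -rf data; mkdir data
        shutil.rmtree(self.data_dir, ignore_errors=True)
        self.data_dir.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    store = FileStoreForInstall(Path(tmp) / "data")
    store.create()
    store.save_topic("Main", "WebHome", "hello")
    had_topic = store.topic_exists("Main", "WebHome")
    store.create()  # wipes and recreates the base of the store
    still_there = store.topic_exists("Main", "WebHome")

# The class a running site would load has no way to destroy the store.
runtime_can_destroy = hasattr(FileStore, "create")
```

A DB-backed store would implement the same method with its own drop and create statements, so a copy tool never needs to know how any particular store is laid out.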

I suspect that there will be quite some impact on Docs as well.

All in all supporting PolyStores is not viable for 1.2 — IMO.

This in turn means that PFS is not viable for 1.2 as it depends on PolyStore support.

This does not preclude sites using PFS (it's a contrib), but the above issues mean that we are not ready to release it as 'core' just yet.

Could NextRelease focus on PFS and PolyStore support? Or just develop a PolyStoreSupportContrib which covers these issues in Docs plus provide a home for copy_store (and related tools)?

I'd still like to think more about whether an install should assume topics in data rather than say pfdata; that was discussed quite a lot in the IRC discussion in Development.ReleaseMeeting01x02_20141117

-- JulianLevens - 02 Dec 2014

I agree that the store converter has to be reliable, and the existing copy_store is not up to the job. For this reason I implemented tools/bulk_copy.pl which sits outside of Foswiki::Meta and as such is totally store-agnostic. Yes, testing is required - please!

Changed the headline to reflect that convert_store.pl is dead, long live bulk_copy.pl!

-- CrawfordCurrie - 02 Dec 2014

As per the IRC logs, the stores fall back to reading Topic.txt in data, and this negates many of these concerns.

We are left with documentation updates and testing!

-- JulianLevens - 02 Dec 2014

I tried bulk-copying a Foswiki 1.2.0_999 rcs store with a site-charset of iso-8859-1 to a Foswiki 1.2.0_Beta_2 with utf-8 and plain-file store.

bulk_copy did not correctly re-encode iso-8859-1 special characters to utf-8.

I then manually converted my old store to utf-8 (topic names, topic content and attachment names), changed the site charset of my old installation (in localsite.cfg) to utf-8 and tried again.

Same result - Umlauts come out wrong, even though I'm now bulk_copying utf-8 to utf-8. It looks like they're doubly encoded.
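
For reference, this is how double encoding typically arises: bytes that are already UTF-8 get treated as ISO-8859-1 and encoded a second time. The snippet is a generic Python illustration of that mechanism, not a diagnosis of what bulk_copy actually does.

```python
original = "\u00f6"  # the letter o with umlaut

utf8_bytes = original.encode("utf-8")  # two bytes: C3 B6

# The mistake: decode UTF-8 bytes as ISO-8859-1, then encode to UTF-8 again.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")  # C3 83 C2 B6

# What a UTF-8 aware viewer shows for the damaged bytes: two characters.
shown = double_encoded.decode("utf-8")

# Reversing the extra step recovers the original text.
repaired = shown.encode("iso-8859-1").decode("utf-8")
```

Comparing the raw bytes of a topic before and after the copy (for example with a hex dump) shows whether a single umlaut has grown from two bytes to four.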

I can't see what I'm doing wrong; there aren't all that many ways to screw this up. Might this be a bulk_copy bug or am I just stupid?

-- PascalSchuppli - 30 Jun 2015

Hi, It's rather difficult to diagnose this in a task. We probably need to see some of the topics before and after conversion to understand what's going on. Don't forget that the source and target installations need to be fully configured with the correct site character sets.

You might be able to get better help on the IRC channel #foswiki.

-- GeorgeClark - 01 Jul 2015

Should be OK now. Pascal please try again.

-- Main.CrawfordCurrie - 01 Jul 2015 - 10:43

Yes, now the character (non)conversion of Umlauts from UTF-8 to UTF-8 works fine (I haven't tried with my original ISO-8859-1 store yet)

There are still problems, though. I am getting tons of error messages for reads on closed file handles: read() on closed filehandle $fh at /data/www/[PRIVATE]/Foswiki-1.2.0/lib/Foswiki/Store/PlainFile.pm line 1374. I'm not sure about the consequences, but the bulk-copied store does not contain several attachments that were present and working in the original store.

-- PascalSchuppli - 04 Jul 2015

I just tried again using Foswiki 2.0.1's bulk_copy.pl to try converting my original store (ISO-8859-1, rcs) to Foswiki 2.0.1 (plain-file store). Still the same problem as before - the charset conversion does not work; for example, instead of a 'ö' character (encoded correctly as 0xf6 in the iso-8859-1 store), I get 'ö' in the converted text.

Copying from iso8859-1 to utf8 works.

More good news: The problems with the closed filehandles when copying from utf8 to utf8 seem to be gone.

-- PascalSchuppli - 11 Aug 2015

I missed your update, so I just set up a test case for a topic with the complete iso-8859-1 character set, and was able to successfully convert it from 1.1.9 with {Site}{CharSet} = 'iso-8859-1' to Foswiki 2.0.1 with {Store}{Encoding} = 'utf-8'.
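
A test topic of that kind can be checked mechanically. The sketch below is Python and independent of Foswiki; it only verifies the byte-level conversion. It builds the printable ISO-8859-1 range, converts it to UTF-8, and confirms the round trip is lossless. The variable names are illustrative.

```python
# Printable ISO-8859-1 code points: 0x20-0x7E (ASCII) and 0xA0-0xFF (high half).
latin1_bytes = bytes(list(range(0x20, 0x7F)) + list(range(0xA0, 0x100)))

# Correct conversion: decode with the source charset, encode with the target.
text = latin1_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# Converting back must reproduce the source bytes exactly.
round_trip = utf8_bytes.decode("utf-8").encode("iso-8859-1")

# Each high-half character occupies two bytes in UTF-8, ASCII stays at one.
expected_utf8_length = 95 + 96 * 2
```

If the converted topic is longer than `expected_utf8_length`, some characters were encoded more than once.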

This task has checkins that were merged into 2.0.0. As that code has been released, this task needs to be closed. Please create a new task if you encounter any further issues.

-- GeorgeClark - 13 Aug 2015
 

Codebase: 2.0.0 RC, trunk
Checkins: distro:ce81eb73761e (master)
Attachments:
  • LSC_fragment.txt (1 K, 17 Nov 2014 - 16:40, JulianLevens): LSC chunk showing config structure required for feature proposal
  • StoreTools.zip (83 K, 17 Nov 2014 - 16:26, JulianLevens): Better copy_store?
Topic revision: r15 - 08 Dec 2015, GeorgeClark