Item11091: RCS VC needs to handle mauled .txt better
Priority: Urgent
Current State: Closed
Released In: 1.1.4
Target Release: patch
This is specific to the RcsWrap and RcsLite store modules.
When an external agency (e.g. vi) modifies a
.txt
file, Foswiki has a hard time recovering. There is a high risk of lost data, lost history, and corruption, because the core has far too much knowledge of the cache. The goal of this work is to push all knowledge of the
.txt
/
.txt,v
dichotomy below the store horizon by using unit tests to verify correct behaviour of the store.
(Note that where we refer to
.txt
below, that equally applies to an attachment, though obviously TOPICINFO is not an issue there)
An RCS-controlled file may be in one of four possible states:
- up to date means the
.txt
and .txt,v
both exist, and are consistent
- inconsistent means the
.txt
is newer than the .txt,v
- no history means the
.txt
exists but there is no .txt,v
- no txt means the
.txt,v
exists but there is no .txt
up to date is the state the code currently assumes for everything. After discussion on IRC we have decided
no text
is a non-topic, as without a .txt the topic is not searchable and therefore to all intents and purposes does not exist.
Note that if there is no
.txt
but a
.txt,v
already exists when a topic is edited/created, the edit proceeds on empty content, but when the topic is saved the original history is intact. I think on balance that this is desirable behaviour, as it allows clean recovery when a .txt is accidentally deleted on the server.
So the states we need to focus on for testing are
no history and
inconsistent.
The following Foswiki::Store methods require special handling.
Method |
no history |
inconsistent |
readTopic |
correct TOPICINFO |
correct TOPICINFO |
getRevisionHistory |
return [1] |
load rev history and add [N +1] |
getNextRevision |
return 1 |
return N+2 |
getRevisionDiff |
checkin |
checkin |
getVersionInfo |
TOPICINFO for author, rev 1 and current date |
unknown user, rev N+1, current date |
getRevisionAtTime |
checkin |
saveAttachment |
checkin |
saveTopic |
checkin |
repRev |
abort and handle as saveTopic |
All other existing tests need to be run on the
inconsistent and
no history cases to ensure correct functioning.
Notes on the tests:
- A topic with no
.txt,v
will always be rev 1, irrespective of the TOPICINFO. However the author and date fields will be respected for getRevisionInfo.
- 'checkin' of a topic means
- META:TOPICINFO in the mauled .txt is rewritten/added before save; the version is set to 1, and the user to the unknown user (as defined by the BaseUserMapping)
- revision info is requested using
getVersionInfo
- If state is no history and .txt has TOPICINFO, that is used to get author and date. Otherwise author is set to the unknown user, date is set to the current date and rev to 1
- if state is inconsistent then TOPICINFO is ignored and author is set to the unknown user, date to the current date and version to N+1
- Existing code should fail these tests.
- Tests are being added to RcsTests.
--
CrawfordCurrie
Discussion moved here from
Item10993
Question: when checking in a topic that has had it's
.txt
modified, what user should be used? Choices are:
- The user from the TOPICINFO
- If there is one
- And if they are a real user (how can you check?)
- The unknown user (or some other user known to the BaseUserMapping)
- A CUID nominated in a configuration setting (defaulting to the unknown user)
IRC conclusion - some "OfflineUser"
--
CrawfordCurrie - 30 Aug 2011
See also:
- Item10427: rev details lost if .txt file is edited externally
- Item10390: Meta documented as supporting missing revs ... doesn't
- Item9969: REVINFO and $meta->getRevisionInfo() has issues when the ,v file is re-built after it was lost.
- Item10563: Task updates by svn commits breaks compare
--
GeorgeClark - 30 Aug 2011
Many thanks, George. I have trawled these other reports and updated the test set above to reflect the requirements.
Note that I'm very concerned about the definition of "newer" because it requires a
stat()[9]
to evaluate it (two, in fact)/ IME
stat
is not fully supported on all platforms, and some file systems get it wrong. If anyone with more experience can advise, I'd be grateful.
--
CrawfordCurrie - 31 Aug 2011
Did the above cover the case of the first change of an existing topic without history? It's really important to checkin version 1 (before change) and version 2 (after edit) or we go back to being unable to revert any changes to shipped System topics.
As you mentioned in one of the other tasks,
rev 0
was wrong and and a pretty bad kludge, but given the current implementation it was the only way I could find to trick store into saving both rev 1 & 2. Otherwise the "current" rev 1 - without history - ends up accumulating the initial changes. This needs to be covered in the tests. The shipped
AdminGroup, any of the
WebNotify topics,
WebHome, etc. are all particularly at risk. Another solution would be to go back to shipping rcs files to side-step the issue.
--
GeorgeClark - 31 Aug 2011
Yes, I know. The two cases, topic with no history and changed topic with an inconsistent history, both need to be tested and both require some delicate handling.
--
CrawfordCurrie - 04 Sep 2011
I have re-coded the VC store to handle the cases where a .txt has no corresponding .txt,v (no history) and a .txt is newer than the corresponding .txt,v (inconsistent history). I have added unit tests to cover these cases.
Note that there is a potential race condition where a .txt is not created by RCS until the clock has ticked over after creating the .txt,v. I'm not sure if this is possible, but it's a risk.
In the end the changes were almost entirely localised in the VC store; no other APIs or core code were touched, so I decided to merge the changes across to the patch as well.
--
CrawfordCurrie - 11 Sep 2011
Crawford, re your note about the fixes not working if file system doesn't set modification times, Should we have a checker in configure that verifies modification times are being set correctly and reporting an error if not? (Though not sure what the action should be if times are not set).
--
GeorgeClark - 11 Sep 2011
Quite - I don't know what action to take either, except to advise the user to use a different file system. On reviewing my notes, the problems with stat() are not so much with stat()[9] (modification time) as with creation and access times not being updated. So I think we'll be OK.
--
CrawfordCurrie - 20 Sep 2011