Feature Proposal: Add snapshot functionality, making it possible to browse topic history consistently

Motivation

Currently users can only view the history of a single topic in isolation. Attachments shown are the latest ones (not the ones that existed at the time of the revision being viewed), and internal links also point to the latest revisions. It is therefore not possible to browse topic history consistently. It would be nice to provide some kind of snapshot functionality. Users have asked for this (Support.Question373 and Tasks.Item6090), but we never wrote this proposal. Let's mature it and make it happen.

Description and Documentation

I thought about a snapshot mechanism similar to svn revision numbers and the tree-structure used by git: each change would generate a new snapshot. Once we have this record, it should be possible to browse wiki history, instead of topic history:
  • Internal links would point to the correct revisions (the ones that existed at the time of the revision being viewed)
  • Attachments would look like they were at the specified time
We should solve some issues:
  • What if user A saves topic A and user B saves topic B and then user A saves topic A again within one hour?
    • We could create snapshot points for each change, but then a topic revision would point to many snapshots...
    • We should replace the snapshot as we replace the topic revision
  • How to deal with permissions: use the current or the ones that existed at that point?
    • We could also add (ALLOW|DENY)VIEW(TOPIC|WEB)HISTORY permissions.
  • How to get older versions of attachments without viewfile?
    • Or, at least, use viewfile only for older revisions
  • How to handle the case where users put a direct link to attachment or %ATTACHURL% or %PUBURL% or %SCRIPTURL{viewfile}%?
    • How about adding %ATTACHMENT{"name"}% and deprecating %ATTACHURL%?
  • It should be simple, and it should be possible to build the snapshot information from the existing files
  • How to handle other variables? Pages that customize their results based on Set statements, etc., currently always get the current value, not the value from the older revision. A true point-in-time snapshot needs to take a point-in-time view of the Set statements as well.
  • The snapshot should take into account:
    • Topics may have been moved or renamed (we may limit this to cater only for renaming inside the same web)
    • Topics may have been deleted since the specified time
    • Topics may not have existed at that time.
    • It should not be possible to create new topics or edit content based on a snapshot view (That would be a different proposal altogether :) )

Implementation

-- Contributors: GilmarSantosJr, GeorgeClark, MichaelTempest

Discussion

Let's discuss this and build a more detailed spec.

-- GilmarSantosJr - 11 Jan 2010

In my mind we don't want to confuse access controls more than we already do. The current access controls should always be used. Otherwise, for example, it becomes impossible or convoluted to restrict privileges for an individual whose trust or authority/role has changed, across all the content that they used to have access to (granted, it's the old versions). And what about the individual who has gained privileges: should they be able to view old revs they didn't have access to at the time? Building a tool which shows who can access what for a given private resource would also become a nightmare.

But, this is something that needs attention, thanks for writing it up.

I think as a first step it would benefit users and certainly some plugins (e.g. DirectedGraphPlugin) to at least use attachment revisions of an appropriate version when looking at an old topic rev.

Perhaps %ATTACHURL[PATH]% and %PUBURL[PATH]% could expand out to a special bin script (or just extend viewfile) in these cases, which would handle delivering an older rev to the web browser as necessary. Certainly it should be possible to make usage of %SCRIPTURL[PATH]{"viewfile"}% smarter, but then I have to wonder: when links built with %SCRIPTURL[PATH]{"view"}% are displayed in a prior topic rev, should they point to the current versions of resources, or to the versions that were current at the time of the topic rev? Hmmmm.....

-- PaulHarvey - 11 Jan 2010

I added a point above about Set statements in general, not just permissions. However, permissions are a difficult issue, and are not always intuitive to the user.
  • For example, the user removes some "sensitive information" from a topic and then relaxes the permissions.
Should the revisions containing the sensitive information still be protected? Probably the only way today to protect the information is to create a new topic without the sensitive history. When it comes to viewing history, it would probably be safer to always take the most restrictive access controls, but the complexity would be significant.

Regarding DirectedGraphPlugin, the latest version disables the plugin when viewing a topic revision. The plugin would probably need to become aware of snapshots. The plugin dynamically generates the filename that is seen by the user. So on an old rev of the topic, it might not be just a different rev of the attachment, but possibly a completely different attachment if the sequence of graphs changed.

-- GeorgeClark - 11 Jan 2010

On the permission subject, I think we should take the current permissions and add an (ALLOW|DENY)VIEW(TOPIC|WEB)HISTORY permission. If one erases sensitive data and relaxes permissions, the history could still be protected by this new permission. If the new permission is not set, the current behavior is kept.
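A minimal sketch of the rule described above, in Python for illustration only. It assumes a deliberately simplified access model (plain sets of user names, no groups, no web-level fallback), and the function and parameter names are hypothetical, not part of Foswiki:

```python
from typing import Optional, Set


def _permitted(user: str, allow: Optional[Set[str]], deny: Optional[Set[str]]) -> bool:
    """Simplified check: None means the setting is not set at all."""
    if deny is not None and user in deny:
        return False
    if allow is not None and user not in allow:
        return False
    return True


def can_view(user: str, viewing_old_rev: bool,
             allow_view: Optional[Set[str]] = None,
             deny_view: Optional[Set[str]] = None,
             allow_history: Optional[Set[str]] = None,
             deny_history: Optional[Set[str]] = None) -> bool:
    """Current VIEW permissions always apply; the hypothetical VIEWHISTORY
    settings are consulted only when an old revision is being viewed."""
    if not _permitted(user, allow_view, deny_view):
        return False
    if viewing_old_rev:
        # If neither history setting is set, this is a no-op and the
        # current behavior is kept.
        return _permitted(user, allow_history, deny_history)
    return True
```

With `allow_history={"alice"}` and relaxed view permissions, everyone can read the latest revision, but only alice can read the older revisions that still contain the sensitive data.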

@Paul: links should point to the revisions that existed at the specific point in time... like I wrote in Support.Question373 about attachments:
Suppose a user writes <img src="%ATTACHURL%/image.png" />. It points the browser to the latest revision of the image, even when this markup appears in an older revision of the topic. OTOH, if you have %ATTACHMENT{"name"}%, Foswiki would render this pointing to the correct revision: <img src="%SCRIPTURL{viewfile}%/%WEB%/%TOPIC%/image.png?rev=X" /> if it's an old revision or <img src="%ATTACHURL%/image.png" /> if it's the latest.

-- GilmarSantosJr - 11 Jan 2010

Perhaps I am interpreting "snapshot" differently, but I would like to be able to navigate around the wiki, seeing topics as they were at time X, e.g. 27 November 2008 13:42:04 GMT.

This would affect the rendering of links to other topics. There are complications (I list some below), but this would be very, very useful indeed:
  • The snapshot should take into account that topics may have been moved or renamed (we may limit this to cater only for renaming inside the same web)
  • Topics may have been deleted since the specified time
  • Topics may not have existed at that time.
  • It should not be possible to create new topics or edit content based on a snapshot view (That would be a different proposal altogether :) )
-- MichaelTempest - 12 Jan 2010

You interpreted it correctly and your points are relevant. Added them to the list.

-- GilmarSantosJr - 12 Jan 2010

I think a general solution for a consistent history is only possible if every new attachment causes a new revision number. In that case, a history snapshot could be created with a time-based media function:

1. The function gets the edit timestamp of the topic revision.

2. If the timestamp matches the timestamp of the last attachment version, you get

<img src="%PUBURL[PATH]%/%WEB%/%TOPIC%/image.png" />

3. If the timestamp does not match, you should get the image via a new viewfile function which retrieves the image with a time-based query (and not via the revision number).

I think this procedure gives a consistent history for media data. You will always see the correct topic version together with the attachments that were bound to it at that time.

But this does not give a presentation of the whole wiki at the time t-1...
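The three steps above could be sketched roughly as follows (Python, for illustration only). Two assumptions are made here that are not in the proposal: "matching" is read as "the latest attachment version is not newer than the topic revision", and the `time` query parameter on viewfile is hypothetical, since viewfile has no such option today. The URL prefixes are placeholders too.

```python
def media_url(web: str, topic: str, name: str,
              topic_rev_time: int, latest_attachment_time: int,
              pub_url: str = "/pub", viewfile_url: str = "/bin/viewfile") -> str:
    """Pick the URL for an attachment shown in a given topic revision.

    Timestamps are epoch seconds. pub_url and viewfile_url are placeholder
    prefixes, and the "time" query parameter is an assumed extension.
    """
    path = f"{web}/{topic}/{name}"
    if latest_attachment_time <= topic_rev_time:
        # Step 2: the latest attachment version is the one that belonged
        # to this topic revision, so the direct pub link is correct.
        return f"{pub_url}/{path}"
    # Step 3: the attachment changed after this topic revision was saved,
    # so ask for the version that was current at that time.
    return f"{viewfile_url}/{path}?time={topic_rev_time}"
```

A time-based query has the advantage that the topic does not need to record attachment revision numbers; the store only has to answer "which version was current at time T".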

-- AlexanderStoffers - 12 Jan 2010

As part of our efforts to build a new skin, we are planning to overhaul the history screen as well. Please check WireframesHistoryScreen for the latest mock ups and add your thoughts about how to visually display your proposal.

Note: I haven't started mocking up the history screen yet but will likely do so within the next 2 weeks. If you guys have ideas or inspirations from elsewhere feel free to start the respective topic beforehand.

-- CarloSchulz - 12 Jan 2010

This discussion started up again on foswiki-discuss. I'm not going to paste the thread here; look for the topic "Whole-repository snapshot using time-date or label. Was: Re: Seeking feedback: 'Web.Topic@123' topic name notation"

The most trivial proposal I have seen to date is the idea of some "time travel" environment setting - e.g. a preference - that simply forces the wiki into a read-only mode. The issue with that is knowing how far you need to lie about the time - there are various macros, such as %SERVERTIME%, which are used in wiki apps, for example to retrieve lists of other topics ("latest at this time") - and how far the lie needs to spread (can you lie about just a subset of the wiki, or do you have to lie about the whole thing? Can you lie about it just for the viewer, or does everyone have to buy the lie?). The same is true of a symbolic label (which is, after all, just another way of stamping the time).

Note that the most important thing needed to make this a reality is commitment from one or more developers to make it happen.

-- CrawfordCurrie - 05 May 2011

Thanks for pointing out this proposal, Crawford - I had forgotten about it and I missed it when I looked for proposals relevant to the discussion on foswiki-discuss.

This was given above as an unsolved issue:
  • What if user A saves topic A and user B saves topic B and then user A saves topic A again within one hour?
    • We could create snapshot points for each change, but then a topic revision would point to many snapshots...
    • We should replace the snapshot as we replace the topic revision
I propose storing the time/date of the most-recent snapshot, and forcing a new revision if someone saves a topic that was last saved before the most-recent snapshot.
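A rough sketch of that save-time decision, in Python for illustration. It assumes the usual rule that a repeated save by the same user within one hour replaces the latest revision; all names are hypothetical, and treating a save made at exactly the snapshot time as already captured is an assumption of this sketch:

```python
from typing import Optional

REPLACE_WINDOW = 3600  # seconds; the "within one hour" window discussed above


def must_create_new_revision(last_save_time: int, last_author: str,
                             now: int, author: str,
                             latest_snapshot_time: Optional[int]) -> bool:
    """Return True if this save must create a new revision instead of
    replacing the latest one. Times are epoch seconds."""
    if latest_snapshot_time is not None and last_save_time <= latest_snapshot_time:
        # A snapshot already refers to the latest revision, so overwriting
        # it would silently change what that snapshot shows.
        return True
    if author != last_author:
        return True
    return now - last_save_time > REPLACE_WINDOW
```

With this rule a topic revision never has to point to many snapshots: a revision is either newer than every snapshot (and may still be replaced) or frozen.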

I can definitely see the need for different viewers to see different snapshots (of the same content) at the same time. I would like to be able to look at different snapshots (of the same content) at the same time (even if I have to look at them in different browsers so that I have multiple sessions). I think this means the "lie" is per-viewer, and definitely not for everyone.

What would be the unit of snapshot - the whole wiki? a web? a set of webs (listed where?)? a web-and-all-of-its-subwebs? On my wiki, it would be useful to snapshot a web and all of its sub-webs. I would prefer it if the snapshot is not the whole wiki, but I could still live with that.

One reason not to snapshot the whole wiki is that we may need settings and configuration information from the current Main and System webs whilst viewing a snapshot. For example, SitePreferences may have changed, Group topics in Main may have changed, and template topics (e.g. view templates) in System may have changed (e.g. because of upgrades to plugins).

I think the snapshot should be restricted to wiki content, meaning webs, topics and attachments. I think the snapshot should not affect the versions of plugins, or the settings in LocalSite.cfg

-- MichaelTempest - 05 May 2011

I'm not sure what use cases are being looked at here. The desire to have the result render differently from how it would have at the time seems misplaced. It seems to me that it would be enough to use PublishPlugin to publish snapshots to a simple versioned store, and this would be free of any ambiguity around how to render and what is/isn't possible. That is, the page by definition looks as it did back then and it is read only.

Ideally that versioned store would be abstracted so that a reasonably modern revision control system (i.e. not rcs) could be used or even a versioning filesystem, but with the lowest-common-denominator implementation being to create a separate copy (in a separate directory, directory name = label) of the rendered results. Clearly this doesn't scale too well, but it isn't actually unreasonable (in my case at least, from a load/performance perspective) to publish a whole web periodically (e.g. daily). Due to the same dependency issues that impact on caching, it isn't simple to determine if re-rendering the page will produce a different result. In fact, the right thing to do is to rely on caching to do the work for us (i.e. deliver a cached result if it can). Potentially, this could allow such publishing to be done even more frequently at low cost (load). And a periodic publish will prime the cache - might be a feature where there is a lot of read only use of the current content.

It is clearly possible to compress the history even without a revision control tool:
  • Where the resulting page is identical to the previous version, make it a link. This is particularly important for slowly changing large content like attachments and images.
    • hardlink preferred - allows versions to be decimated/purged to keep certain snapshots forever and delete the rest after some time (e.g. keep weekly, or every "release" forever, keep dailies only for a month).
With a revision control tool each publish isn't to a different dir but simply creates a new label and retrieval is by label. This could be done on top of rcs but would be more trivially achieved (and faster) on top of a more modern changeset based tool or even a versioning filesystem.
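For the directory-per-label variant, the hard-link idea could look roughly like this (Python, for illustration only). The function name and directory layout are assumptions, and it presumes the rendered output already exists on disk, for example as produced by a publish run:

```python
import filecmp
import os
import shutil
from typing import Optional


def publish_snapshot(rendered_dir: str, previous_dir: Optional[str],
                     target_dir: str) -> int:
    """Store rendered_dir as a new snapshot in target_dir.

    Files identical to the same path in previous_dir are hard-linked
    instead of copied. Returns the number of hard-linked files.
    """
    linked = 0
    for root, _dirs, files in os.walk(rendered_dir):
        rel = os.path.relpath(root, rendered_dir)
        os.makedirs(os.path.normpath(os.path.join(target_dir, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(target_dir, rel, name))
            old = os.path.join(previous_dir, rel, name) if previous_dir else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)  # unchanged: share storage with the old snapshot
                linked += 1
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return linked
```

Because hard links share storage without depending on each other, any snapshot directory can later be deleted (e.g. purging dailies after a month) without affecting the snapshots that are kept. Hard links require all snapshots to live on the same filesystem.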

It isn't clear that there is a practical way to do much better than this. Trying to retrieve old versions of the actual topic data and re-render them seems unnecessarily complex, and rendering anything other than a read-only view is clearly going to create anomalies - so why not just do exactly that (create a read-only result) exactly once, in advance.

As retrieval of the rendered content is by an explicit access to that content's directory (or label), a user can easily look at one or more old versions and the live wiki at the same time. No session state to maintain.

-- DarrylGreen - 05 May 2011

Hi Darryl, I think you would be surprised at what the Store API can do in trunk and even 1.1 (although I'd appreciate a reality check from Crawford & Sven). For example, MongoDBPlugin is already a cache of every version of every topic. So accessing past revisions is as quick and painless as fetching any other object from MongoDB, no need to ask RCS to get it.

My thoughts are that this snapshot thing could be broken into the discrete (minimum, to begin with!) features required to get it implemented:
  • Allow the core to ask the Store to give it topic-rev-at-time,
  • some sort of "back-in-time" state that is global to the entire request, so that anywhere Meta is asked for a topic without any rev specifier, the time-machine date-time becomes the rev spec,
  • carefully making sure that any loading of prefs is always done with the current version of topics
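To make these three pieces concrete, here is a toy model in Python. It is not the Foswiki Store or Meta API; the class and method names are invented for illustration, and revisions are simply numbered from 1 in save order:

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Store:
    # topic name -> list of (save_time, text), oldest first
    history: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def rev_at_time(self, topic: str, when: int) -> Optional[int]:
        """Latest revision saved at or before `when`, or None if none existed."""
        times = [t for t, _ in self.history.get(topic, [])]
        return bisect_right(times, when) or None

    def read(self, topic: str, rev: Optional[int] = None) -> str:
        revs = self.history[topic]
        return revs[(rev if rev is not None else len(revs)) - 1][1]


@dataclass
class Request:
    store: Store
    as_of: Optional[int] = None  # request-global "back-in-time" state

    def load(self, topic: str, rev: Optional[int] = None,
             for_prefs: bool = False) -> Optional[str]:
        # No explicit rev + time travel active: the date-time becomes the rev spec.
        # Preference loading deliberately bypasses this and uses the current topic.
        if rev is None and self.as_of is not None and not for_prefs:
            rev = self.store.rev_at_time(topic, self.as_of)
            if rev is None:
                return None  # topic did not exist at that time
        return self.store.read(topic, rev)
```

For example, with topic A saved at times 10 and 20, `Request(store, as_of=15).load("A")` returns the first revision, while a preference topic loaded with `for_prefs=True` always returns its current text.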

Once we get this far, I think even this would be useful, although it will be incomplete. For example, our dashboards on the wiki at work could show a manager exactly what state our project wiki-app was in at some past point in time, instead of them having to review past revs of the individual record topics separately.

I hope it will become more obvious how to handle topics changing names (hopefully that complexity can be mostly delegated to the Store implementation), viewfile/pub issues, blurring of versions with repRev/1hr-edit windows, etc. It is definitely good to have a complete vision of what the final end product might look like, if possible... I just worry about us getting bogged down and stalled on the hard & hairy parts, while the simple & clearly definable parts get neglected :)

-- PaulHarvey - 05 May 2011

PublishPlugin works very well for static snapshots. I am already using it for that. However, I would like the ability to have a "dynamic" snapshot. I want to do things like search a previous snapshot and work with pages with %CALCs that use URLPARAM values, and see what changed between snapshots. That is all beyond the scope of PublishPlugin.

Copying a whole web could work (and has been done before - see the tmwiki documentation for previous releases), but just copying a web is not enough -
  • There needs to be supporting infrastructure for creating named snapshots and making them read-only automatically (for everybody including the administrator)
  • There needs to be a way to compare the contents of snapshots (whether by name or by time/date)
  • There needs to be a way to see what (named) snapshots there are
  • I would like to see how a topic has changed between named snapshots - i.e. a history view showing only the revisions associated with named snapshots

I am not trying to say that we should not implement snapshots by copying webs, but I don't think we are in a position yet to evaluate different possible implementations. I am also not saying that we shouldn't consider implementation issues, because they help to get a grip on the requirements. I don't think we know what all the requirements are - I certainly don't :)

-- MichaelTempest - 06 May 2011

What does such a "dynamic snapshot" give you over and above a "static snapshot"?
  1. Searching using wiki search pages, embedded %SEARCH
  2. Reactive to URL params
  3. Reactive to access controls
  4. Diffs
Anything else?

-- CrawfordCurrie - 06 May 2011

5 years, no activity, no developer. Changing to Parked.

-- GeorgeClark - 13 Feb 2016
Topic revision: r15 - 13 Feb 2016, GeorgeClark