Topic Object Model (formerly TWiki Object Model)

Note: CC RaymondLutz's original proposal is available from the history of this topic, but as it got no implementation support, I have taken over this topic title for this closely related issue.

You may be familiar with the Document Object Model (DOM) as defined for HTML documents. The DOM defines an abstraction of the nodes in an HTML document that allows them to be referenced and manipulated in a consistent way. The DOM is fundamental to Web 2.0, as it is a key enabler of AJAX.

There is general agreement among regular contributors that it would be beneficial to define such a model for TWiki.

The basic idea is to allow the contents of the TWiki data store to be referred to and manipulated in a consistent and uniform manner.

First, some goals of the model:
  • don't break anything.
  • create a single consistent, extensible ContentAccessSyntax that can be used to refer to TWiki data elements.
  • support generic operations over these elements.
  • extend the model into the topic, and support the concept of table, list and other DOM-level objects (maybe use the DOM for this?)
Technically, that means:
  • support access to data elements via a simple, consistent HTTP interface (e.g. AJAX)
  • support manipulation via generic operations such as move, delete, copy.
  • support the same interface for plugins and contrib authors.
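To make the "generic operations" goal concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `TopicStore`, the `Web.Topic` address form and the `handle()` dispatcher are illustrative names, not TWiki APIs, and the COPY/MOVE/DELETE verbs are borrowed from WebDAV purely as an analogy for what an HTTP interface might expose.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Topic:
    text: str = ""
    attachments: Dict[str, bytes] = field(default_factory=dict)


class TopicStore:
    """Hypothetical in-memory store; topics are addressed as 'Web.Topic'."""

    def __init__(self) -> None:
        self._topics: Dict[str, Topic] = {}

    def __contains__(self, address: str) -> bool:
        return address in self._topics

    def put(self, address: str, topic: Topic) -> None:
        self._topics[address] = topic

    def get(self, address: str) -> Topic:
        return self._topics[address]

    def copy(self, src: str, dst: str) -> None:
        if dst in self._topics:
            raise KeyError(f"{dst} already exists")
        original = self._topics[src]
        # Copy the attachment dict so the two topics do not share state.
        self._topics[dst] = Topic(original.text, dict(original.attachments))

    def move(self, src: str, dst: str) -> None:
        self.copy(src, dst)
        del self._topics[src]

    def delete(self, address: str) -> None:
        del self._topics[address]

    def handle(self, method: str, address: str, destination: Optional[str] = None) -> None:
        """Dispatch an HTTP-style verb to the matching generic operation."""
        if method == "COPY" and destination:
            self.copy(address, destination)
        elif method == "MOVE" and destination:
            self.move(address, destination)
        elif method == "DELETE":
            self.delete(address)
        else:
            raise ValueError(f"unsupported request: {method} {address}")
```

Because every element is reached through the same address form, the same three operations work unchanged whether the caller is a browser making an AJAX request or a plugin calling the store directly.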

OK, but what is the impact on users? Well, TWiki has three types of users:
  • end users, who simply use TWiki out-of-the-box
  • application developers, who use TWiki to develop knowledge management applications
  • plugin developers, who are like application developers on steroids
The main impact we expect end users to see is improved performance. The existing TWiki data model is a major obstacle to scalability, but a TOM will allow us to map to existing high-performance store implementations (such as databases) much more easily.

The impact on application developers and plugin developers will also be significant. Existing applications should continue to work as before, so adopting the new model is a matter of choice, but TOM features will enable new applications and plugins to interact with TWiki data in a much more intuitive and consistent way. It opens the door to much more sophisticated interaction modes and much more rapid application development.

Where we are

The basis for data storage in TWiki (4.2 and earlier) is "Topic Files" that exist in flat directories, one per "Web", with each topic file named after its topic plus the ".txt" extension, such as "TopicObjectModel.txt". Each Web defines a namespace, because no two topic files in the same directory can have the same name. Of course, topics may have the same name across different Webs. Topic histories are stored in RCS, and the most recent revision of a topic is kept permanently "checked out" (in the CM tool sense).
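The flat layout can be summarised by two path functions. This is a sketch, not TWiki code: `DATA_DIR` is a hypothetical install location, and the `,v` suffix for the history file is the standard RCS naming convention, assumed here to sit next to the topic file.

```python
from pathlib import PurePosixPath

# Hypothetical data directory; a real installation configures this.
DATA_DIR = PurePosixPath("/var/twiki/data")


def topic_file(web: str, topic: str) -> PurePosixPath:
    """One flat directory per web; the topic file is <TopicName>.txt inside it."""
    return DATA_DIR / web / f"{topic}.txt"


def history_file(web: str, topic: str) -> PurePosixPath:
    """RCS keeps the revision history in a sibling file with a ',v' suffix."""
    return DATA_DIR / web / f"{topic}.txt,v"
```

Since the web is just a directory name, `topic_file("Main", "WebHome")` and `topic_file("Sandbox", "WebHome")` are distinct files, while two topics with the same name in the same web would collide on a single path: that is all the "namespace" amounts to.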

Topic files also double as a storage medium for topic meta-data. This includes things like lists of attachments, forms and form fields, and some aspects of topic history that cannot be represented in RCS (such as topic moves). Such meta-data is stored embedded in the topic text, and this assumption pervades much of the TWiki code. The place where it has most impact is in searching, where there has been an assumption that a search using regular expressions can match the meta-data embedded in topics. This assumption has penetrated deeply into TWiki applications, and it is only recently that support for query searches has made it possible to search without assuming embedded meta-data.
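The following sketch shows why embedded meta-data leaks into regular-expression searches. The `%META:...{...}%` lines are a simplified version of the format TWiki 4.x uses to embed meta-data in topic files (real lines carry more attributes and escape special characters), and `split_meta()` is a hypothetical helper, not the TWiki parser.

```python
import re

# Simplified topic file: meta-data lines are interleaved with ordinary text.
TOPIC = '''%META:TOPICINFO{author="JoeBloggs" date="1180656000" format="1.1" version="1.3"}%
Some ordinary topic text mentioning Joe.gif.
%META:FORM{name="BugForm"}%
%META:FIELD{name="Status" title="Status" value="Open"}%
%META:FILEATTACHMENT{name="Joe.gif" size="1234"}%
'''

META_LINE = re.compile(r"^%META:(\w+)\{(.*)\}%$")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def split_meta(raw: str):
    """Separate the topic body from its embedded meta-data lines."""
    text_lines, meta = [], []
    for line in raw.splitlines():
        m = META_LINE.match(line)
        if m:
            meta.append((m.group(1), dict(ATTR.findall(m.group(2)))))
        else:
            text_lines.append(line)
    return "\n".join(text_lines), meta


text, meta = split_meta(TOPIC)
# A regex search over the raw file also hits the FILEATTACHMENT line;
# a search over the body alone does not.
raw_hits = re.findall(r"Joe\.gif", TOPIC)
body_hits = re.findall(r"Joe\.gif", text)
```

Applications that rely on `raw_hits` behaviour (matching form values or attachment names by regex) are exactly the ones that make it hard to move meta-data out of the topic text.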

Over the last few years, several of the core contributors (mainly CrawfordCurrie and SvenDowideit) have been working steadily to more clearly define the levels of abstraction within the TWiki core code, with a TOM being the ultimate goal of this work. We have done this by first developing a set of testcases that allow us to refactor without breaking legacy features, and then incrementally refactoring the core code.

The most important new feature of this work from a user perspective is the TWiki query language, which defines a basic ContentAccessSyntax. This is largely based on the work of CrawfordCurrie and MichaelDaum on the DBCacheContrib, a proven approach that has been used to build some highly sophisticated TWiki applications.

What is being done

Work on the TOM is continuing in several different places:
  • SvenDowideit is working on a REST plugin that will provide a first version of the key HTTP interface. This work will give users their first taste of where TWiki is going in the future.
  • CrawfordCurrie is working on finalising the TOM abstraction in the core, by eliminating the TWiki "store" component in favour of an abstract web/topic/attachment model. This work should be invisible to end users, but is a key refactoring step.
  • SvenDowideit is working on the MultiStoreRefactor, which is an important step in increasing the abstraction of the store in the TWiki core.
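The point of an abstract web/topic model is that calling code should not care where topics live. Here is a minimal sketch, assuming a three-method interface (`read`, `write`, `topics`) that is purely illustrative and is not the interface being built in the core or in the MultiStoreRefactor: the same helper runs against an in-memory store and against an SQLite database from Python's standard library.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Store(ABC):
    """Hypothetical abstract store: callers see only webs, topics and text."""

    @abstractmethod
    def read(self, web: str, topic: str) -> str: ...

    @abstractmethod
    def write(self, web: str, topic: str, text: str) -> None: ...

    @abstractmethod
    def topics(self, web: str) -> List[str]: ...


class MemoryStore(Store):
    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def read(self, web: str, topic: str) -> str:
        return self._data[(web, topic)]

    def write(self, web: str, topic: str, text: str) -> None:
        self._data[(web, topic)] = text

    def topics(self, web: str) -> List[str]:
        return sorted(name for (w, name) in self._data if w == web)


class SqliteStore(Store):
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS topic "
            "(web TEXT, name TEXT, text TEXT, PRIMARY KEY (web, name))"
        )

    def read(self, web: str, topic: str) -> str:
        row = self._db.execute(
            "SELECT text FROM topic WHERE web = ? AND name = ?", (web, topic)
        ).fetchone()
        if row is None:
            raise KeyError((web, topic))
        return row[0]

    def write(self, web: str, topic: str, text: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO topic (web, name, text) VALUES (?, ?, ?)",
            (web, topic, text),
        )
        self._db.commit()

    def topics(self, web: str) -> List[str]:
        rows = self._db.execute(
            "SELECT name FROM topic WHERE web = ? ORDER BY name", (web,)
        )
        return [name for (name,) in rows]


def copy_web(src: Store, dst: Store, web: str) -> None:
    """Works with any Store implementation, because it only uses the interface."""
    for name in src.topics(web):
        dst.write(web, name, src.read(web, name))
```

Swapping the flat-file store for a database then becomes a matter of adding another `Store` subclass, which is the kind of mapping the scalability argument above relies on.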

What is next

I believe the time has come to bite the bullet and complete the implementation of the TOM in the core. This, together with promoting the REST plugin to a core component, is fundamental to the future of TWiki and needs to be the main thrust of TWiki 5.0.

Related topics: ContentAccessSyntax, MultiStoreRefactor

-- Contributors: CrawfordCurrie

Discussion

An issue which seems to escape people but could be a real killer is that of overlapping models. This is where two models of the data share content, and overlap. For example, a paragraph may be made up of text-table-text, where table is a structural element. In some contexts (for example, a plain text edit) the fact that some TML represents a table is irrelevant. Another example is sections: a headed section may comprise paragraphs and subsections. Or we may want to ignore that structure completely, and view the topic as a series of paragraphs only. Or, if we are processing tables, as a list of tables.

This is a familiar problem in ECAD (CAD for electronics) where we have the concept of different views of the same data (e.g. layout versus simulation versus timing). The DOM also has this problem, in a really limited sense, and it has been solved there by provision of the getText method, which gets the flattened leaf text of a structural node. A general purpose TOM has a harder problem to solve, in that there are overlapping structural elements at several levels in the hierarchy.

Obviously if we break down the topic into the smallest granularity elements (paragraphs? sentences? words?) we can provide methods to reconstruct whatever higher structural levels we want; for example, "give me all tables in the document" can reconstruct the list of tables on the fly at runtime. This is really what the Javascript interface to the DOM does, in a rather heavy-handed way.
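As a minimal sketch of the "smallest granularity" approach, assume the finest element is a line of TML and that a table is simply a run of lines starting with `|` (real TML tables have more rules, e.g. for empty and spanned cells, which are ignored here). Two views are then reconstructed on demand from the same lines, and they overlap: the table sits inside a paragraph in one view and stands alone in the other. `paragraphs()` and `tables()` are hypothetical names.

```python
from typing import List

TML = """Intro text before the table.
| *Name* | *Size* |
| Joe.gif | 1234 |
Text after the table, in the same paragraph.

A second paragraph."""


def paragraphs(tml: str) -> List[str]:
    """Plain-text view: runs of non-blank lines, ignoring any table structure."""
    paras, current = [], []
    for line in tml.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paras.append("\n".join(current))
            current = []
    if current:
        paras.append("\n".join(current))
    return paras


def tables(tml: str) -> List[List[List[str]]]:
    """Structural view: runs of '|' lines, each split into a row of cells."""
    result, current = [], []
    for line in tml.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            current.append([cell.strip() for cell in stripped.strip("|").split("|")])
        elif current:
            result.append(current)
            current = []
    if current:
        result.append(current)
    return result
```

Nothing is stored per view here; both are recomputed from the raw lines each time, which is simple but is exactly the repeated parsing a real TOM would want to cache.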

I personally feel that Javascript missed a trick, and it is much better to hide this implementation detail behind the facade of objects (this is, after all, a Topic Object Model). TopicComponents was the first expression of this idea. Basically, the idea is that the topic object has a set of subfields, each of which is an object that represents a view of the topic. It's an implementation detail how these views get populated.

To put this in the context of the current topic data model, the 'topic' object has a number of views which reflect the different aspects of the topic; for example, fields is a view that iterates over the list of fields in the topic. attachments represents a view over things attached to the topic.

So far so conventional. The clever bit comes when we think of the other sub-objects of the topic object. For example, a sub-object called tables would provide an interface over the tables in the topic. Each table object would be a row-cell model of the table. A sub-object called sections might provide a hierarchical view of the text of sections in the topic.

What really makes this proposal different from anything that has gone before is the idea that an extension, such as a Contrib, can provide an implementation of a new view of the topic. For example, the ActionTrackerPlugin would like to provide an actions view of the topic, which other extensions could then leverage to examine actions.
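Here is a minimal sketch of a topic object whose sub-objects are views that extensions can register. The registry, the lazy population and the `actions` parser are all illustrative assumptions; the `%ACTION{...}%` pattern is only loosely modelled on ActionTrackerPlugin markup and is not its real parser.

```python
import re
from typing import Any, Callable, Dict, List

ViewFactory = Callable[[str], Any]


class Topic:
    """Hypothetical topic object whose views are supplied by registered factories."""

    _view_factories: Dict[str, ViewFactory] = {}

    @classmethod
    def register_view(cls, name: str, factory: ViewFactory) -> None:
        cls._view_factories[name] = factory

    def __init__(self, text: str) -> None:
        self.text = text
        self._cache: Dict[str, Any] = {}

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for view names.
        if name.startswith("_") or name not in self._view_factories:
            raise AttributeError(name)
        if name not in self._cache:
            # Populate the view lazily, on first access only.
            self._cache[name] = self._view_factories[name](self.text)
        return self._cache[name]


# A "core" view: paragraphs separated by blank lines.
Topic.register_view(
    "paragraphs", lambda text: [p for p in text.split("\n\n") if p.strip()]
)

# An "extension" view, loosely modelled on %ACTION{...}% markup.
ACTION = re.compile(r"%ACTION\{(.*?)\}%\s*(.*)")
ATTR = re.compile(r'(\w+)="([^"]*)"')


def actions_view(text: str) -> List[Dict[str, str]]:
    return [dict(ATTR.findall(m.group(1)), text=m.group(2)) for m in ACTION.finditer(text)]


Topic.register_view("actions", actions_view)
```

Core code and extensions use the same `register_view()` call, so `topic.actions` is as much a part of the object as `topic.paragraphs`, and how each view is populated stays hidden behind the attribute.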

I think each view would have to lock out other views, as otherwise updates could get really difficult.

OK, how would this be reflected to the user? Well, using the ContentAccessSyntax, we might refer to the individual view by qualifying references with the sub-object name. For example, attachments is a view over the attachments on a topic. At the moment to access the size of the attachment called Joe.gif on a topic, we write attachments[name='Joe.gif'].size. If I add a paragraphs view, then this naturally extends to paragraphs[1] to get the first paragraph.
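To show how the sub-object name simply becomes the first step of a path, here is a toy evaluator for the two expressions above. It is a sketch, not the TWiki query parser: it supports only names, `.field`, `[key='value']` filters and `[N]` indexes, the topic is a plain dict, and indexes are assumed to be 1-based to match the `paragraphs[1]` example. Filters always yield a list, so a field lookup after a filter returns a list.

```python
import re
from typing import Any

# One step of a path: a name, a [key='value'] filter, an [N] index, or a '.' separator.
TOKEN = re.compile(
    r"(?P<name>\w+)|\[(?P<key>\w+)='(?P<val>[^']*)'\]|\[(?P<idx>\d+)\]|(?P<dot>\.)"
)


def _field(obj: Any, name: str) -> Any:
    # A field lookup on a list is applied to every element.
    if isinstance(obj, list):
        return [item[name] for item in obj]
    return obj[name]


def evaluate(expr: str, obj: Any) -> Any:
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {expr[pos:]!r}")
        pos = m.end()
        if m.group("dot"):
            continue
        if m.group("name"):
            obj = _field(obj, m.group("name"))
        elif m.group("key"):
            key, value = m.group("key"), m.group("val")
            obj = [item for item in obj if item.get(key) == value]
        else:
            obj = obj[int(m.group("idx")) - 1]  # 1-based, per the example in the text
    return obj


topic = {
    "attachments": [{"name": "Joe.gif", "size": 1234}, {"name": "Fred.png", "size": 99}],
    "paragraphs": ["First paragraph.", "Second paragraph."],
}
```

Adding a view only adds a key to `topic`; the evaluator itself does not change, which is the uniformity the ContentAccessSyntax is after.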

(Note that there is a horrible complexity; the same complexity that banjaxes the idea of full Wysiwyg. TML variables can generate new TML, so the topic object model can only ever represent the structure that is inherent in the unrendered topic.)

Those of you familiar with MVC architecture will recognise the concept of a multifaceted view. Pushing this connection further, we come to the concept of an architecture in which a controller receives and dispatches change events within the model. That's the direction I think this should move in.
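A minimal sketch of that direction: a controller that receives change events and dispatches them to whichever views (or other listeners, such as a hypothetical search indexer) have subscribed. `Change`, `Controller` and the `"*"` wildcard are illustrative assumptions, not an existing TWiki API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Change:
    topic: str   # e.g. "Main.WebHome"
    view: str    # which view of the topic changed, e.g. "tables"
    detail: str


Listener = Callable[[Change], None]


class Controller:
    """Hypothetical controller: receives change events and dispatches them to listeners."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Listener]] = {}

    def subscribe(self, view: str, listener: Listener) -> None:
        # Subscribing to "*" receives changes to every view.
        self._listeners.setdefault(view, []).append(listener)

    def dispatch(self, change: Change) -> None:
        # View-specific listeners run first, then the wildcard listeners.
        for listener in self._listeners.get(change.view, []) + self._listeners.get("*", []):
            listener(change)
```

A table-editing extension would then only need to emit a `Change`; it would not need to know which other views care about it.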

-- CrawfordCurrie - 01 Jun 2007

Before commenting on ContentAccessSyntax etc., I'd like to point out that any TOM should lead not only to convenience in writing TWikiApplications but also to better performance, for two reasons: first, building the same application with the means at hand today is too complex (e.g. the various table plugins each parse all of the topic text repeatedly); second, a TOM would allow objects to be stored more efficiently, i.e. indexed.
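The first point (every table plugin re-parsing the same text) can be illustrated with a sketch of a shared, revision-keyed cache. `TableCache` and `parse_tables()` are hypothetical; the parser handles only the simplest `| a | b |` rows, and keying on the revision number assumes a topic's text never changes without its revision changing.

```python
from typing import Callable, Dict, Tuple

Table = Tuple[Tuple[str, ...], ...]


def parse_tables(text: str) -> Tuple[Table, ...]:
    """Very simplified TML table parser: runs of lines starting with '|'."""
    tables, rows = [], []
    for line in text.splitlines() + [""]:  # the sentinel "" flushes a trailing table
        stripped = line.strip()
        if stripped.startswith("|"):
            rows.append(tuple(cell.strip() for cell in stripped.strip("|").split("|")))
        elif rows:
            tables.append(tuple(rows))
            rows = []
    return tuple(tables)


class TableCache:
    """Hypothetical shared cache: each (web, topic, revision) is parsed at most once."""

    def __init__(self, parse: Callable[[str], Tuple[Table, ...]] = parse_tables) -> None:
        self._parse = parse
        self._cache: Dict[Tuple[str, str, int], Tuple[Table, ...]] = {}
        self.parses = 0  # how many times topic text was actually parsed

    def tables(self, web: str, topic: str, revision: int, text: str) -> Tuple[Table, ...]:
        key = (web, topic, revision)
        if key not in self._cache:
            self.parses += 1
            self._cache[key] = self._parse(text)
        return self._cache[key]
```

The parsed tables are immutable tuples, so several plugins can share one result without one of them corrupting it for the others.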

Another part not fully discussed under the TOM heading, as far as I can see, is TWikiForms. Flexible as they are, their capabilities are limited by the way they are currently stored. So refactoring the TOM and its store will most probably remove a couple of these limitations as well (e.g. only one form per topic, forms not being independent objects, formfields not being iterable, etc.). At least it will allow us to overcome them.

Now, let me comment on ContentAccessSyntax and the way the TOM is approached here in general. Sorry for putting the Mr. General hat on again. But having been exposed to standards like XQuery, XPath and XUpdate, I see that these techniques were invented for the same reasons TWiki now wants its own TOM. All I want to say is: we need to look at the standard techniques already available instead of inventing our own.

So by using a backend store like dbxml, we solve most of what is described in ContentAccessSyntax right away. Note that dbxml is not a relational database but a native XML database, which IMHO fits TWiki's structured content very well. On top of a standard content access syntax, XQuery, we get scalability and performance at a level TWiki will never reach on its own.

The question of how to implement the TOM is then reduced to defining an XML schema for TWiki topics and interfacing with dbxml (or any other comparable XML database) in a clean fashion.
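To make the XML route concrete, here is a sketch of one topic expressed as XML and queried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. This is neither dbxml nor XQuery; it only shows that the ContentAccessSyntax example `attachments[name='Joe.gif'].size` maps directly onto an XPath predicate. The element and attribute names are hypothetical, and the schema deliberately allows more than one `<form>` per topic, one of the form limitations mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of one topic; this schema is an illustration, not a proposal.
TOPIC_XML = """
<topic web="Main" name="WebHome">
  <form name="BugForm">
    <field name="Status" value="Open"/>
  </form>
  <form name="ReviewForm">
    <field name="Reviewer" value="JoeBloggs"/>
  </form>
  <attachment name="Joe.gif" size="1234"/>
  <attachment name="Fred.png" size="99"/>
  <text>Topic text goes here.</text>
</topic>
""".strip()

root = ET.fromstring(TOPIC_XML)


def attachment_size(topic: ET.Element, name: str) -> int:
    # Equivalent of attachments[name='Joe.gif'].size, written as an XPath predicate.
    element = topic.find(f"attachment[@name='{name}']")
    if element is None:
        raise KeyError(name)
    return int(element.get("size"))


def field_value(topic: ET.Element, form: str, field: str) -> str:
    element = topic.find(f"form[@name='{form}']/field[@name='{field}']")
    if element is None:
        raise KeyError((form, field))
    return element.get("value")
```

A native XML database would run the same kind of path expressions, plus full XQuery, over indexed collections of such documents instead of a single in-memory tree.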

Besides that, dbxml comes with many more solutions, e.g. FLWOR expressions (for-let-where-order by-return), for problems all discussed independently in ResultSets, SearchResultsPagination and ExtractAndCentralizeFormattingRefactor.

At the very least, you would need very, very good arguments for ruling out something like XQuery.

-- MichaelDaum - 06 Jan 2008

I'm not ruling anything out. I know from my own experiments with XPath that these technologies can be used to address TWiki DBs. I also know - from the feedback received during the design of the query language - that the resistance to such syntax is very high among TWiki traditionalists.

-- CrawfordCurrie - 06 Jan 2008

Kool. So let's experiment.

-- MichaelDaum - 06 Jan 2008

Potentially, if we had better ways of accessing and addressing bits of our topics (i.e. a more advanced TOM), we could come up with some sort of uniform markup to accomplish the following needs:

-- PaulHarvey - 06 Nov 2009
 
Topic revision: r4 - 06 Nov 2009, PaulHarvey
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.