Content Access Syntax


TopicObjectModel, AutomaticLinkLabelBasedOnHeading, SearchingSectionsOfTheTopics, NamedIncludeSections, FormattedTWikiFormDataInTopicText, ReinventingTWiki

The above links contain much of the background that has driven this topic; the contributors to those topics are hereby acknowledged and thanked. Of especial note is RaymondLutz in TopicObjectModel.

All of the above topics discuss or allude to the idea that we should rationalise the way content is stored and accessed, both from a user aspect and from a code aspect.

This topic is focused on the user aspect, and discusses the syntax that might be used to reference meta-data fields. Such a syntax doesn't require big changes to the underlying TWiki engine to be implemented, though implementation is discussed in ContentAccessImplementation.

Meta-data fields are fields within %META tags in the topic, and information from the version control system. So the following needs to be recovered:
  • attachments
    • for each attachment
    • name
    • attr
    • comment
    • path
    • size
    • user
    • version
      • rev
      • date
      • comment (empty at the moment)
      • user
  • parent
  • topicinfo
    • author
    • date
    • format
    • version
  • topicmoved
    • by
    • date
    • from
    • to
  • form
    • for each field in form
    • name
    • value
  • version
    • rev
    • date
    • comment (empty at the moment)
    • user
  • text (raw text of the topic)
    • heading
      • for each heading in text
      • level
      • text
    • table
      • for each table in text
      • row
        • for each row in table
        • cell
    • list
      • for each bullet list in text
      • text
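
As a rough illustration of the field list above, here is a minimal Python sketch of one topic held as nested dictionaries and lists. Only the field names come from the list; every value is invented sample data, and the get_path helper is hypothetical, not part of TWiki. The heading, table and list fields are left out because they would be derived from the raw text.

```python
# Hypothetical in-memory view of a single topic, mirroring the field list above.
# All values are invented sample data; only the field names come from the list.
topic = {
    "attachments": [
        {
            "name": "logo.gif",
            "attr": "",
            "comment": "",
            "path": "logo.gif",
            "size": 7200,
            "user": "SomeUser",
            "version": {"rev": "1.1", "date": 0, "comment": "", "user": "SomeUser"},
        },
    ],
    "parent": "WebHome",
    "topicinfo": {"author": "SomeUser", "date": 0, "format": "1.0", "version": "1.2"},
    "topicmoved": {"by": "", "date": 0, "from": "", "to": ""},
    "form": [{"name": "Status", "value": "Open"}],
    "version": {"rev": "1.2", "date": 0, "comment": "", "user": "SomeUser"},
    "text": "---+ A heading\nBody text\n",
}


def get_path(data, path):
    """Follow a dotted path such as 'topicinfo.author' through nested dicts."""
    node = data
    for key in path.split("."):
        node = node[key]
    return node
```

With this layout, get_path(topic, "version.rev") walks from the topic to the version record and then to its rev field, which is the kind of lookup the proposals below try to give a syntax to.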

PeterThoeny has suggested that the first heading in a topic is also meta-information, in that it is useful to present as the link to a topic (see AutomaticLinkLabelBasedOnHeading). This can be added quite cleanly:

  • firstHeading
Of course, most of these fields can currently be accessed using %TAG style tags, though the syntax and semantics are somewhat variable. There are three good end-user reasons for rationalising these into a single consistent syntax:
  1. Consistency. A single consistent syntax is easier for users to remember.
  2. Clarity. %VARIABLE syntax has often been criticised as hard to read; a single consistent syntax can be much easier on the eye.
  3. Future proofing. As shown by the adding of firstHeading above, new fields can easily be incorporated into a consistent syntax without requiring the constant invention of new syntaxes.
(Another good reason is performance, but this topic is about syntax, so we won't say any more about that.)

Note that none of the following proposes the instant removal of existing syntaxes for field accesses. Instead, their gradual deprecation and phasing out can take place over time.

Proposal 1

Note: Proposal 1 is no longer an option because of the web hierarchy feature.

Note that the field structure described above is the schema for a simple database: the TWiki database. So, a syntax that is reminiscent of standard database access methods is strongly indicated. We already have the roots of this syntax in the Web/Topic relationship; a topic is a field in a web, and Web.Topic is its specifier. The DBCacheContrib already recognises and implements this kind of field access, so the first proposal is that we use the DBCacheContrib C/C++/Perl/Java based syntax:
  • TopicName.version.rev

  • [[TopicName][TopicName.firstHeading]]
Of course this syntax presents a number of problems to the parser, but that's a technical problem that can always be solved.

An issue with any syntax is the representation of arrays. The DBCacheContrib solves this problem using square brackets:
  • Web.TopicName.attachments[0]
DBCacheContrib also offers some powerful extension syntax for extracting subfields, such as:
  • Web.TopicName.attachments[*name]
which would return a space-delimited list of attachment names. To handle arrays successfully you also need to embed searches, such as:
  • Web.TopicName.attachments[?size>5000][*name]
which would return a space-delimited list of the names of attachments of size > 5000. And
  • Web.TopicName.attachments[?size>5000][?name=~'*.gif'][*name]
would return a space-delimited list of gif files larger than 5K.
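
To make the selector chain concrete, here is a minimal Python sketch of how the three bracket forms above might be evaluated against a list of attachment records. It is not the DBCacheContrib implementation; the function name, the sample data and the restriction to numeric comparisons are all assumptions made for illustration, and the =~ match operator is deliberately left unsupported.

```python
import operator
import re

# One bracketed selector: [0], [?size>5000] or [*name].
SELECTOR = re.compile(r"\[([^\]]*)\]")
# Assumption: only simple numeric comparisons are handled in this sketch.
CONDITION = re.compile(r"\?(\w+)(>|<|=)(\d+)$")
COMPARE = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def apply_selectors(items, selectors):
    """Apply a chain such as '[?size>5000][*name]' to a list of dicts."""
    result = items
    for body in SELECTOR.findall(selectors):
        if body.isdigit():
            # [0] picks a single element by index.
            result = result[int(body)]
        elif body.startswith("*"):
            # [*name] extracts one subfield and flattens to a space-delimited string.
            result = " ".join(str(item[body[1:]]) for item in result)
        else:
            # [?size>5000] keeps only the elements that satisfy the condition.
            match = CONDITION.match(body)
            if match is None:
                raise ValueError("unsupported selector: [%s]" % body)
            field, op, number = match.groups()
            result = [item for item in result if COMPARE[op](item[field], int(number))]
    return result


# Invented sample data standing in for Web.TopicName.attachments.
sample = [
    {"name": "logo.gif", "size": 7200},
    {"name": "notes.txt", "size": 900},
    {"name": "photo.gif", "size": 12000},
]
print(apply_selectors(sample, "[?size>5000][*name]"))  # prints: logo.gif photo.gif
```

Each selector consumes the result of the previous one, which is why the filter has to come before the [*name] extraction: once the list has been flattened to a string there is nothing left to filter.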

All bracketing in DBCacheContrib is done using square brackets.

While DBCacheContrib currently flattens out lists using space-separation, it would be interesting to consider how formatting might be introduced to mirror the current "standard" for header= and format= parameters to tags.

Proposal 2

A second proposal makes it easier for the parser and also underlines the difference between a topic and a field within a topic. It also uses the DBCacheContrib syntax for field accesses. This is a simple modification of the previous syntax to use a : to separate topic and fields.
  • TopicName:version.rev
  • [[TopicName][TopicName:firstHeading]]
  • Web.TopicName:attachments[?size>5000][?name=~'*.gif'][*name]
  • this:attachments[0].name
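
The parsing advantage of the ":" separator can be shown with a short Python sketch. The function below is hypothetical and assumes that the first ":" always ends the topic part and that the last "." before it separates web from topic; neither rule is stated in the proposal itself.

```python
def split_reference(ref):
    """Split 'Web.TopicName:field.path' into (web, topic, field_path).

    The web is None when the reference names only a topic (or 'this').
    """
    location, sep, field_path = ref.partition(":")
    if not sep or not field_path:
        raise ValueError("no field part in %r" % ref)
    # Everything before the last "." is treated as the web (assumption).
    web, _dot, topic = location.rpartition(".")
    return (web or None, topic, field_path)
```

Because the split happens on the first ":", any "." in the field path (as in version.rev) can no longer be confused with the "." between web and topic.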

Proposal 3

The third proposal introduces curly braces in place of the ".", making it even easier for the parser, in a syntax highly reminiscent of Perl hashes:
  • Web.TopicName{topicmoved}{by}
  • TopicName{version}{rev}
  • [[TopicName][TopicName{firstHeading}]]
  • Web.TopicName{attachments}[?size>5000][?name=~'*.gif'][*name]
  • this{attachments}[0]{name}
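
As a sketch of why this form is easy on the parser, the following Python fragment breaks a Proposal 3 reference into a web, a topic and a list of access steps using two regular expressions. The function name and the ("key", ...) / ("selector", ...) step labels are invented for this example, and topic and web names are assumed to consist of word characters only.

```python
import re

# Optional "Web." prefix, a topic name, then one or more {key} or [selector] steps.
REFERENCE = re.compile(r"^(?:(\w+)\.)?(\w+)((?:\{\w+\}|\[[^\]]*\])+)$")
STEP = re.compile(r"\{(\w+)\}|\[([^\]]*)\]")


def parse_brace_reference(ref):
    """Return (web, topic, steps) for a reference like 'Web.Topic{a}[0]{b}'."""
    match = REFERENCE.match(ref)
    if match is None:
        raise ValueError("not a field reference: %r" % ref)
    web, topic, tail = match.groups()
    steps = []
    for key, selector in STEP.findall(tail):
        # Exactly one of the two groups is non-empty for a {key} step.
        steps.append(("key", key) if key else ("selector", selector))
    return web, topic, steps
```

For example, this{attachments}[0]{name} yields the topic "this" followed by a key step, a selector step and another key step, with no ambiguity about where the topic name ends.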

This is perhaps the most technically appealing syntax, because it is easy to parse. From a user perspective, it is also easy to type. The main criticism of it might be that it is not a logical progression from the "." syntax used to access topics in webs, though that could easily be remedied by allowing Web{TopicName} as an alternative to Web.TopicName.

Proposal 4

The fourth proposal is to retain the %VARIABLE syntax, probably with some re-use of the "." subfield syntax:
  • %FIELD{web="Web" topic="TopicName" field=""}%
  • %FIELD{topic="TopicName" field="version.rev"}%
  • [[TopicName][%FIELD{topic="TopicName" field="firstHeading"}%]]
  • %FIELD{web="Web" topic="TopicName" field="attachments[?size>5000][?name=~'*.gif'][*name]"}%
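
For comparison, here is a minimal Python sketch of pulling the parameters out of %FIELD tags of the form shown above. It is not how TWiki parses variables; it assumes every parameter is written as name="value" and that values contain no double quotes.

```python
import re

# The tag body runs up to the first "}%" (assumption: "}%" never occurs in a value).
FIELD_TAG = re.compile(r"%FIELD\{(.*?)\}%")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_field_tags(text):
    """Return one dict of parameters for each %FIELD{...}% tag found in text."""
    return [dict(ATTRIBUTE.findall(body)) for body in FIELD_TAG.findall(text)]
```

The quoted field parameter carries the whole field path as an opaque string, so a second parsing pass would still be needed to interpret it.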

While this is the most consistent with the existing syntaxes, IMHO it is the least user-friendly. - CC

Proposal 5

This proposal is similar to proposal 2, with a TOM: (Topic Object Model) prefix. This makes it easier to parse by eye and machine. It is, however, more to type.
  • TOM:TopicName:version.rev
  • [[TopicName][TOM:TopicName:firstHeading]]
  • TOM:this:attachments[0].name

Proposal 6

The {{where:what}} syntax is easy to parse by eye and machine.
  • {{}}
  • {{TopicName:version.rev}}
  • [[TopicName][{{TopicName:firstHeading}}]]
  • {{this:attachments[0].name}}
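
A sketch of how {{where:what}} references might be expanded in topic text, written in Python. The expand function and its lookup callback are hypothetical; the callback stands in for whatever would actually fetch the field value, and the pattern assumes that the "what" part contains no curly braces.

```python
import re

# {{where:what}} with a word-like "where" and a brace-free "what".
REFERENCE = re.compile(r"\{\{(\w+):([^{}]+)\}\}")


def expand(text, lookup):
    """Replace each {{where:what}} in text with lookup(where, what)."""
    def replace(match):
        return str(lookup(match.group(1), match.group(2)))
    return REFERENCE.sub(replace, text)
```

Since the delimiters are distinct from the [[...]] link brackets, a reference nested inside a link label such as [[TopicName][{{TopicName:firstHeading}}]] can be found with a single pattern.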

Questions to be answered

Here are some questions that need answering:
  1. Would it be better to go with a standard syntax, such as CDML?
  2. How do you represent formatting? For example, I want a table of attachment names and sizes. How about this:
    • TopicName{attachments}[*|$name|$size] (there's got to be something better than that)
    • or %FORMATARRAY{"TopicName{attachments}" header="|*Name*|*Size*|" format="|$name|$size|"}%
    • or
  3. How do we recover revisions? Is there an argument for a full object-oriented approach? PaulineCheung suggests:
    • Web.TopicName::parent::rev{1.2}::wikiusername
    • this could be extended by plugins providing new methods...
  4. How long do we have to keep the old syntax around?
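
On the formatting question above, here is a minimal Python sketch of what the header= and format= parameters of the suggested %FORMATARRAY tag might do once the attachment list has been resolved. The function is hypothetical, and it assumes that $name style tokens always refer to fields present in every item.

```python
import re


def format_array(items, header, fmt):
    """Render a header line followed by one formatted line per item."""
    def render(item):
        # Replace each $field token with the matching value from the item.
        return re.sub(r"\$(\w+)", lambda m: str(item[m.group(1)]), fmt)
    return "\n".join([header] + [render(item) for item in items])
```

Called with header "|*Name*|*Size*|" and format "|$name|$size|", this produces one TWiki table row per attachment beneath the heading row.
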
Contributors: CrawfordCurrie, PeterThoeny

Another idea is to consider the sections of the topic text either side of INCLUDE statements as separate. The text then consists of an array of text blocks interleaved with references to other topics. This would require the concept of "parameterised references" to support parameterised includes (currently only supported by MacrosPlugin).

-- CrawfordCurrie - 05 Aug 2004

I would like to get a major feature into DakarRelease that makes an impact. This one is a good candidate.

-- PeterThoeny - 10 Jun 2005

We need a syntax for AttachmentRenderings too.

-- MartinCleaver - 18 Jul 2005

I'd suggest that as DakarRelease's focus is performance, and refactoring to enable future releases, this is highly inappropriate at this late stage in development. We have already locked down development to bug fixes and docco only.

As this is a very critical feature to get right, and Crawford and I (both of us really want it) never quite see eye to eye on some of the details, I'd suggest that it will take quite a number of months of work (and design) to make it releasable.

-- SvenDowideit - 19 Jul 2005

I would like to ImplementIdeaBeforeTheCompetition, but this is now a CompetitiveNecessity (see WikiTalk).

-- PeterThoeny - 07 Nov 2005

See also BringTopicVarsIntoCore.

-- ArthurClemens - 07 Nov 2005

A solid access syntax should be defined and implemented in EdinburghRelease. This helps avoid introducing many VARIABLES (see AttachmentCount as one example.)

-- PeterThoeny - 27 Feb 2006

Is that the TopicObjectModel and/or a DocumentObjectModel (DOM) ?

-- WillNorris - 27 Feb 2006

Ideally the TopicObjectModel and the ContentAccessSyntax should be done at the same time. It can be done in phases as well. A logical approach:
  1. Define and implement an underlying TopicObjectModel
  2. Define and implement the ContentAccessSyntax

Alternatively, we could turn this around:
  1. Define the ContentAccessSyntax and implement a subset of it based on the existing architecture (low hanging fruit)
  2. Define an underlying TopicObjectModel
  3. Finish the ContentAccessSyntax implementation, this time based on the TopicObjectModel

-- PeterThoeny - 27 Feb 2006

Actually, I think it is wise to keep them separate. The TOM model employed by code has to be compromised to some extent for efficiency, but the user should only see the "pure" form of the content access syntax; it should look consistent and simple to the user, irrespective of what horrors lurk beneath. ;-). The TopicObjectModel would actually be easier to define if the requirements imposed by the ContentAccessSyntax were known in advance.

On Will's point; the DOM is indeed extremely similar, though the access syntax to the DOM is somewhat arcane. I would prefer something a lot more user-friendly.

Note that proposal 1 has been blown away by subwebs, which have "stolen" the "." separator.

-- CrawfordCurrie - 27 Feb 2006

Picking up on the idea again. See updated proposals.

-- PeterThoeny - 08 Jul 2006

Here is another idea: ModifyContentAccessSyntax.

-- PeterThoeny - 08 Jul 2006

This is also needed for specifying that templates are held as attachments - see TemplatePathIsCounterintuitive

-- MartinCleaver - 08 Jul 2006

Food for thought on how to approach this: The goal is to strengthen the TWikiApplication platform by creating a powerful and intuitive server-side DOM for content access (this topic) and content change (ModifyContentAccessSyntax). On top of that, add an ApplicationFramework to create applications easily. TWiki a better PHP? Borrow ideas from RubyOnRails?

-- PeterThoeny - 16 Nov 2006

We can, of course, go and define all kinds of fancy extensions.

As far as I am concerned, the DBCacheContrib and FormQueryPlugin (in their "YetAnother..." incarnation) go a long way towards addressing my application needs, except that I would like to have a join query, which I will sooner or later get around to doing.

In certain applications I am accessing all the topics that way only.

-- ThomasWeigert - 11 Jan 2007

I'm opening this up again because of recent discussions relating to REST and indexing.

-- CrawfordCurrie - 09 Mar 2009

Added some discussion in TopicObjectModel.

-- PaulHarvey - 06 Nov 2009

Discarded, as it's well past its sell-by date.

-- Main.CrawfordCurrie - 18 Jan 2016 - 18:27
Topic revision: r6 - 18 Jan 2016, CrawfordCurrie