Item14612: Exclude unpublished topics from URL processing

Priority: Enhancement
Current State: Closed
Released In: n/a
Target Release:
Applies To: Extension
Component: PublishPlugin
Branches: master
Reported By: CrawfordCurrie
Waiting For:
Last Change By: CrawfordCurrie
I need links to unpublished topics to keep pointing back to the originating wiki.

Add an option to leave unprocessed any URLs that refer to topics outside the set being published.

Proposed patch:
589c589
<             $this->_publishTopic( split( /\./, $wt, 2 ) );
---
>             $this->_publishTopic( split( /\./, $wt, 2 ), @topics );
703c703
<     my ( $this, $web, $topic ) = @_;
---
>     my ( $this, $web, $topic, @topics ) = @_;
893c893
<       s/<a [^>]*\bhref=[^>]*>/$this->_rewriteTag($&, 'href', $web, $topic)/geis;
---
>       s/<a [^>]*\bhref=[^>]*>/$this->_rewriteTag($&, 'href', $web, $topic, @topics)/geis;
895c895
<       s/<link [^>]*\bhref=[^>]*>/$this->_rewriteTag($&, 'href', $web, $topic)/geis;
---
>       s/<link [^>]*\bhref=[^>]*>/$this->_rewriteTag($&, 'href', $web, $topic, @topics)/geis;
897c897
<       s/<img [^>]*\bsrc=[^>]*>/$this->_rewriteTag($&, 'src', $web, $topic)/geis;
---
>       s/<img [^>]*\bsrc=[^>]*>/$this->_rewriteTag($&, 'src', $web, $topic, @topics)/geis;
899c899
<       s/<script [^>]*\bsrc=[^>]*>/$this->_rewriteTag($&, 'src', $web, $topic)/geis;
---
>       s/<script [^>]*\bsrc=[^>]*>/$this->_rewriteTag($&, 'src', $web, $topic, @topics)/geis;
901c901
<       s/<blockquote [^>]*\bcite=[^>]*>/$this->_rewriteTag($&, 'cite', $web, $topic)/geis;
---
>       s/<blockquote [^>]*\bcite=[^>]*>/$this->_rewriteTag($&, 'cite', $web, $topic, @topics)/geis;
903c903
<       s/<q [^>]*\bcite=[^>]*>/$this->_rewriteTag($&, 'cite', $web, $topic)/gei;
---
>       s/<q [^>]*\bcite=[^>]*>/$this->_rewriteTag($&, 'cite', $web, $topic, @topics)/gei;
940c940
<     my ( $this, $tag, $key, $web, $topic ) = @_;
---
>     my ( $this, $tag, $key, $web, $topic, @topics ) = @_;
950c950
<     my $new = $this->_processURL( $attrs{$key} );
---
>     my $new = $this->_processURL( $attrs{$key}, @topics );
973c973
<     my ( $this, $url ) = @_;
---
>     my ( $this, $url, @topics ) = @_;
1159,1161c1159,1161
<         # for the template being generated. We do this even if the
<         # topic isn't included in the processed output, so we may
<         # end up with broken links. C'est la guerre.
---
>         # for the template being generated. We do this only if the
>         # topic is included in the export topic list. Otherwise, the
>         # link is left as is so that we don't end up with broken links.
1164c1164,1173
<         $new = $this->{archive}->getTopicPath( $web, $topic );
---
>         print STDERR "web: $web \n" if $noisy;
>         my $webtopic = "$web.$topic";
>         if ("@topics" =~ /$webtopic/) {
>             print STDERR "Part of topic list so rewriting path. \n" if $noisy;
>             $new = $this->{archive}->getTopicPath( $web, $topic );
>         } else {
>             print STDERR "Not in topic list so leaving as is. \n" if $noisy;
>             $new = $url;
>         }
>         print STDERR "new: $new \n" if $noisy;
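A small standalone check of the membership test used in the patch above (`"@topics" =~ /$webtopic/`). The topic list is joined with spaces and `"$web.$topic"` is used as an unanchored, unescaped regex, so a topic that is not in the list can still "match" as a substring of another entry, and the `.` matches any character. The sub names are illustrative only; `in_list_exact` is one possible alternative, not PublishPlugin code.

```perl
use strict;
use warnings;

# Membership test as written in the patch: join the list with spaces and
# use "$web.$topic" as an unanchored, unescaped regex.
sub in_list_regex {
    my ( $webtopic, @topics ) = @_;
    return "@topics" =~ /$webtopic/ ? 1 : 0;
}

# Illustrative exact alternative: a hash lookup only matches whole
# entries, and no regex metacharacters are involved.
sub in_list_exact {
    my ( $webtopic, @topics ) = @_;
    my %in_list = map { $_ => 1 } @topics;
    return exists $in_list{$webtopic} ? 1 : 0;
}

my @topics = ( 'Sandbox.WebHomeArchive', 'Main.WebHome' );

# Sandbox.WebHome is NOT in the list, but the regex finds it as a
# substring of Sandbox.WebHomeArchive.
print in_list_regex( 'Sandbox.WebHome', @topics ), "\n";    # prints 1
print in_list_exact( 'Sandbox.WebHome', @topics ), "\n";    # prints 0
print in_list_exact( 'Main.WebHome',    @topics ), "\n";    # prints 1
```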

-- LynnwoodBrown - 28 Jan 2018

Nice idea. Note that this only works when the topics list contains every topic that exists in the published output. It won't work with incremental publishing, where topics published in earlier runs are implicitly assumed to exist in the output and links to them must be resolved. So the idea is fine, but it has to be switched on and off by an option to maintain compatibility. The patch is also very inefficient, and in one case dangerous, so it cannot be included as-is.
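One way both objections might be addressed, sketched under assumptions: the new behaviour sits behind an option that defaults to off (so incremental publishing keeps rewriting every link), and the topic list is turned into a hash once per run instead of being passed by value through every `_rewriteTag`/`_processURL` call and re-joined for each URL. The package name, the `ignore_unpublished` option and the `link_for` method are hypothetical, not part of PublishPlugin; the `$rewrite` callback stands in for the `getTopicPath` call in the patch.

```perl
use strict;
use warnings;

package HypotheticalPublisher;

# Hypothetical constructor: 'topics' is this run's publish list and
# 'ignore_unpublished' is an assumed option name (default off).
sub new {
    my ( $class, %opts ) = @_;
    return bless {
        ignore_unpublished => $opts{ignore_unpublished} // 0,

        # Build the lookup table once per run instead of passing @topics
        # down every call and re-joining it for each URL.
        published => { map { $_ => 1 } @{ $opts{topics} // [] } },
    }, $class;
}

# Decide what a link to $web.$topic (found as $url) should become.
# $rewrite stands in for the real path rewriting (getTopicPath in the patch).
sub link_for {
    my ( $this, $url, $web, $topic, $rewrite ) = @_;

    # Option off (the default): rewrite every internal link as before,
    # so incremental publishing still resolves previously published topics.
    return $rewrite->( $web, $topic ) unless $this->{ignore_unpublished};

    # Option on: rewrite only topics in this run's list (exact key match).
    return $rewrite->( $web, $topic ) if $this->{published}{"$web.$topic"};

    # Otherwise leave the original URL pointing back at the wiki.
    return $url;
}

package main;

my $to_file = sub { my ( $web, $topic ) = @_; return "$web/$topic.html" };
my $pub = HypotheticalPublisher->new(
    topics             => ['Sandbox.WebHome'],
    ignore_unpublished => 1,
);
print( $pub->link_for( 'http://wiki/Main/Other', 'Main', 'Other', $to_file ), "\n" );
# prints http://wiki/Main/Other
```

Keeping the option off by default preserves the current rewrite-everything behaviour, which is what incremental publishing relies on.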

-- Main.CrawfordCurrie - 28 Jan 2018 - 09:25

To expand on this a bit.

Firstly, depending on how Foswiki (FW) is configured, it may generate relative links to topics. If a link to an unpublished topic is left unprocessed (which is what your patch does), you end up with a link in the output that is relative to the publishing root.

Say you were using URL rewriting, and a topic contains a link to %SCRIPTURLPATH%/Sandbox/WebHome. FW would generate <a href='/Sandbox/WebHome'>, which might just work with the file generator, but would break with any other generator. Without URL rewriting, it would generate /bin/view/Sandbox/WebHome, which would almost certainly never work (unless you were really unlucky).

Secondly, the plugin supports incremental publishing. For that to work, it has to rewrite links to topics that are not being published in this run but may have been published in a previous run (or may be published in a later run) with a different topic set and the same target directory. For example, two groups might have different publishing schedules, but their work is merged into a single published manual.

Thinking about this further, there are several actions that could be taken when a broken internal link (i.e. one that matches the criteria for a FW link) is encountered:
  • follow - the link would be rewritten and the topic referred to in the link could be added to the end of the publish list if it's not already there
  • 404 - the link would be broken in such a way that an accidental hit could never succeed
  • ignore - the link would be left as is instead of being rewritten like other internal links (this is what you want)
  • rewrite - the link would be rewritten, like any other internal link (this would be the default)
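The four policies could be dispatched roughly as follows. This is a hypothetical sketch: `handle_unpublished_link`, the policy strings and the `$rewrite` callback are illustrative names, and returning `undef` for `404` (leaving it to the caller to emit a link that can never resolve) is an assumption, since the discussion does not say what the broken link should look like.

```perl
use strict;
use warnings;

# Hypothetical handler for a link to $webtopic, a topic that is not in the
# current publish list. $publish_list is an array ref; $rewrite turns a
# "Web.Topic" name into a published path.
sub handle_unpublished_link {
    my ( $policy, $url, $webtopic, $publish_list, $rewrite ) = @_;
    $policy //= 'rewrite';    # proposed default

    if ( $policy eq 'follow' ) {
        # Append the topic to the publish list unless it is already queued,
        # then rewrite the link as if it were being published.
        push @$publish_list, $webtopic
          unless grep { $_ eq $webtopic } @$publish_list;
        return $rewrite->($webtopic);
    }

    # Assumed convention: undef tells the caller to emit a link that can
    # never resolve; the discussion does not specify its exact form.
    return undef if $policy eq '404';

    # Leave the original URL untouched.
    return $url if $policy eq 'ignore';

    # Rewrite like any other internal link.
    return $rewrite->($webtopic) if $policy eq 'rewrite';

    die "Unknown link policy '$policy'\n";
}

my @publish = ('Sandbox.WebHome');
my $to_file = sub { ( my $path = shift ) =~ s{\.}{/}g; return "$path.html" };

print handle_unpublished_link( 'follow', 'http://wiki/Main/Other',
    'Main.Other', \@publish, $to_file ), "\n";    # prints Main/Other.html
print "@publish\n";    # prints Sandbox.WebHome Main.Other
```

Note that "follow" mutates the publish list while it is being processed, so a real implementation would need to iterate in a way that picks up topics appended at the end.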

-- Main.CrawfordCurrie - 30 Jan 2018 - 08:33

 
Topic revision: r4 - 30 Jan 2018, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement.