EugenMayerTalk

Regular expressions

I'm not quite sure what you are looking for, but I believe there are two general approaches that can be used:
  • If it's just one character you need to check for, as I think you are trying to do in this case (check for /), then you can explicitly check for a leading character that is not / or any other special character. (Note that this cannot match an empty value, because the check consumes one character. In this case I believe that is acceptable, as there is no need to insert $docroot into an empty src attribute.)
  • You can use negative lookahead: (?!pattern_to_not_match)
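
To make the difference between the two approaches concrete, here is a minimal sketch on bare strings rather than HTML. The sample values and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample values for a src attribute.
my @values = ('somefile.png', '/somefile.png', '');

my %result;
for my $value (@values) {
    # Approach 1: require a leading character that is not "/".
    # This consumes one character, so it cannot match an empty value.
    my $byCharClass = $value =~ m{^[^/]} ? 1 : 0;

    # Approach 2: negative lookahead. It consumes nothing, so it also
    # succeeds on an empty value.
    my $byLookahead = $value =~ m{^(?!/)} ? 1 : 0;

    $result{$value} = [$byCharClass, $byLookahead];
    printf "%-16s char class: %d  lookahead: %d\n",
        "'$value'", $byCharClass, $byLookahead;
}
```

The two checks agree on every value except the empty string, which only the lookahead accepts.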

So taking your simple example of trying to replace <img src="somefile.png"> with <img src="$docroot/somefile.png"> but not touch <img src="/somefile.png">, you could use a regexp like either of the following:
  $newString =~ s{
     (<[iI][mM][gG]
        \s+
        (?:[^>]+\s+)*
        [sS][rR][cC] \s*=\s*)   # $1
        (['"])                  # $2
        ([^/'"]
         [^'"]*)                # $3
        (\2
          [^>]*>
        )                       # $4
  }{$1$2$docroot$3$4}gx;

  $newString =~ s{
     (<[iI][mM][gG]
        \s+
        (?:[^>]+\s+)*
        [sS][rR][cC] \s*=\s*)   # $1
        (['"])                  # $2
        ((?!/)
         [^'"]+)                # $3
        (\2
          [^>]*>
        )                       # $4
  }{$1$2$docroot$3$4}gx;
-- IsaacLin - 03 Dec 2008 - 22:54

I used the regexps above as examples in a tutorial I am creating, and realized there are some cases they won't cover, such as src= appearing within the value of another attribute. Here is a more complete solution:
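
A sketch of that failure case, for anyone who wants to reproduce it. The pattern is a condensed form of the lookahead regexp (it uses the /i modifier instead of spelled-out character classes and matches the value with [^'"]+). The sample tag and the $docroot value are invented, and $docroot is assumed to end with a slash.

```perl
use strict;
use warnings;

my $docroot = '/docs/';   # assumption: ends with a slash

# Condensed form of the lookahead regexp: it skips "anything up to src="
# without checking whether that src= sits inside another attribute's value.
my $reNaive = qr{
    (<img \s+ (?:[^>]+\s+)* src \s*=\s*)   # group 1
    (['"])                                 # group 2
    ((?!/) [^'"]+)                         # group 3
    (\2 [^>]*>)                            # group 4
}xi;

# Invented sample: the real src is absolute and should be left alone,
# but the alt text happens to contain src='x.png'.
my $html = q{<img alt="see src='x.png' here" src="/real.png">};

(my $changed = $html) =~ s{$reNaive}{$1$2$docroot$3$4}g;

print "$changed\n";
# prints: <img alt="see src='/docs/x.png' here" src="/real.png">
```

The real src attribute is left alone, as intended, but the alt text is rewritten.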

Edited on 20 Feb 2009: changed how attribute values are matched, because backreferences do not work within a character class (Perl treats \2 there as an octal character escape, not as a reference to the second group). The example below only works with src attributes that have a double-quoted value; either repeat the regexp for single-quoted values (and possibly unquoted values), or add the /e modifier and use a small fragment of code to extract the exact value and then prepend $docroot.
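
The backreference problem can be seen in isolation with a short sketch. The sample text is invented, and the lookahead form shown here is only one possible workaround, not the one used in the example below.

```perl
use strict;
use warnings;

my $text = q{'one' and 'two'};

# Inside a character class, \2 is an octal escape for the character with
# code 2, not a backreference. [^\2]* therefore matches almost anything,
# including quote characters, and runs on to the last matching quote.
my ($greedy) = $text =~ m{ ((['"]) [^\2]* \2) }x;

# One workaround: a negative lookahead before each character, so that the
# backreference is used outside any character class.
my ($tempered) = $text =~ m{ ((['"]) (?: (?!\2) . )* \2) }x;

print "character class: $greedy\n";     # 'one' and 'two'
print "lookahead:       $tempered\n";   # 'one'
```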

my $reSqString = qr{
  \'
  [^\']*
  \'
}x;

my $reDqString = qr{
  \"
  [^\"]*
  \"
}x;

my $reAttrValue = qr{
  (?: $reSqString | $reDqString | [^\'\"\s]+ )
}x;

$newString =~ s{
    (<[iI][mM][gG]             # $1 start
       \s+
       (?: \w+ \s*=\s* $reAttrValue \s+ )*
       [sS][rR][cC] \s*=\s*)   # $1 end
      (?:\"((?!/)[^\"]+)\")  # $2, skipping values that start with /
      ( (?: \s+ \w+ \s*=\s* $reAttrValue )* #$3
        \s*/?>
      )                       # $3 end
}{$1"$docroot$2"$3}gx;

It assumes that attribute names always match \w+ (so a hyphenated name such as data-id will prevent a match), that every attribute has a value, and that quoted values are properly closed.
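
The /e variant mentioned above could look roughly like the following sketch. The names addDocroot and rewriteImgSrc are hypothetical, $reAttrValue is a condensed version of the one above (it also excludes > from unquoted values), and $docroot is assumed to end with a slash. Values that are empty, unquoted, or already start with / are left untouched, which is an assumption carried over from the original example.

```perl
use strict;
use warnings;

my $docroot = '/docs/';   # assumption: ends with a slash

my $reAttrValue = qr{ (?: '[^']*' | "[^"]*" | [^'"\s>]+ ) }x;

# Hypothetical helper: prepend $docroot unless the value is empty or
# already starts with a slash. The surrounding quotes are kept as-is.
sub addDocroot {
    my ($quoted) = @_;
    my ($quote, $value) = $quoted =~ m{^(['"])(.*)\1$}s
        or return $quoted;              # unquoted value: leave it alone
    return $quoted if $value eq '' || $value =~ m{^/};
    return $quote . $docroot . $value . $quote;
}

# Hypothetical wrapper around the substitution, using /e so that the
# replacement is evaluated as code.
sub rewriteImgSrc {
    my ($html) = @_;
    $html =~ s{
        (<img \s+ (?: \w+ \s*=\s* $reAttrValue \s+ )* src \s*=\s*)  # group 1
        ($reAttrValue)                                              # group 2
        ((?: \s+ \w+ \s*=\s* $reAttrValue )* \s* /? >)              # group 3
    }{ $1 . addDocroot($2) . $3 }gexi;
    return $html;
}

print rewriteImgSrc(q{<img src="somefile.png">}), "\n";
# prints: <img src="/docs/somefile.png">
print rewriteImgSrc(q{<IMG alt='x' SRC='a.png' />}), "\n";
# prints: <IMG alt='x' SRC='/docs/a.png' />
```

Because every preceding attribute must match name=value as a whole, a src= that appears inside another attribute's value is no longer picked up.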

-- IsaacLin - 04 Feb 2009
Topic revision: r3 - 20 Feb 2009, IsaacLin