Regular expressions
I'm not quite sure what you are looking for, but I believe there are two general approaches that can be used:
- If you only need to exclude a single character, as I think is the case here (a leading /), you can explicitly require a leading character that is not / or any other special character. (Note that this cannot match an empty value, but in this case I believe it is acceptable not to insert $docroot into an empty src attribute.)
- You can use negative lookahead:
(?!pattern_to_not_match)
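To make the difference between the two approaches concrete, here is a minimal sketch that applies each one to a bare attribute value rather than a whole tag. The function names and sample values are made up for illustration.

```perl
use strict;
use warnings;

# Approach 1: require one leading character that is not "/".
sub starts_ok_by_class {
    my ($value) = @_;
    return $value =~ m{^[^/]} ? 1 : 0;
}

# Approach 2: assert, without consuming anything, that "/" does not come next.
sub starts_ok_by_lookahead {
    my ($value) = @_;
    return $value =~ m{^(?!/)} ? 1 : 0;
}

for my $value ('somefile.png', '/somefile.png', '') {
    printf "%-16s class=%d lookahead=%d\n",
        "'$value'", starts_ok_by_class($value), starts_ok_by_lookahead($value);
}
```

The two checks agree on the first two values and differ only on the empty string: the character class needs a character to consume, while the lookahead succeeds when nothing follows.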
So taking your simple example of trying to replace <img src="somefile.png"> with <img src="$docroot/somefile.png"> but not touch <img src="/somefile.png">, you could use a regexp like either of the following:
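# Variant 1: explicit leading character that is not /.
# Note: \2 is not a backreference inside [...], so the value is matched by
# excluding both quote characters; a value containing the other kind of
# quote is left untouched.
$newString =~ s{
    (<[iI][mM][gG]
     \s+
     (?:[^>]+\s+)*
     [sS][rR][cC] \s*=\s*)   # $1
    (['"])                   # $2
    ([^/'"]
     [^'"]*)                 # $3
    (\2
     [^>]*>
    )                        # $4
}{$1$2$docroot/$3$4}gx;      # assumes $docroot has no trailing slash

# Variant 2: negative lookahead for the leading /.
$newString =~ s{
    (<[iI][mM][gG]
     \s+
     (?:[^>]+\s+)*
     [sS][rR][cC] \s*=\s*)   # $1
    (['"])                   # $2
    ((?!/)
     (?:(?!\2).)+)           # $3
    (\2
     [^>]*>
    )                        # $4
}{$1$2$docroot/$3$4}gx;      # assumes $docroot has no trailing slash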
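As a quick check, the sketch below wraps a lookahead-based substitution in a helper and runs it over a few sample tags. The helper name add_docroot, the sample docroot value, and the inputs are assumptions for illustration. The value is matched with (?:(?!\2).)+ because \2 inside a character class is not treated as a backreference, and a / is inserted between the prefix and the value to produce the $docroot/somefile.png form described above.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend "$docroot/" to relative src values of <img> tags.
sub add_docroot {
    my ($html, $docroot) = @_;
    $html =~ s{
        (<[iI][mM][gG] \s+
         (?:[^>]+\s+)*
         [sS][rR][cC] \s*=\s*)   # group 1: tag start through the equals sign
        (['"])                   # group 2: opening quote
        ((?!/)                   # the value must not start with a slash
         (?:(?!\2).)+)           # group 3: everything up to the same quote
        (\2 [^>]*>)              # group 4: closing quote and rest of the tag
    }{$1$2$docroot/$3$4}gx;
    return $html;
}

my $docroot = '/var/www';        # sample value, no trailing slash
for my $tag ('<img src="somefile.png">',
             "<img src='somefile.png' alt='x'>",
             '<img src="/somefile.png">') {
    print add_docroot($tag, $docroot), "\n";
}
```

With these inputs the first two tags gain the /var/www/ prefix, in double and single quotes respectively, and the third is printed unchanged.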
-- IsaacLin - 03 Dec 2008 - 22:54
I used the regexps above as examples in a tutorial I am creating, and realized there are some cases they won't cover, such as src= appearing within the value of another attribute. Here is a more complete solution:
Edited on 20 Feb 2009: changed how attribute values are matched, because backreferences do not work inside a character class (there, \2 is read as an octal character escape rather than a reference to group 2). The example below only handles src attributes with a double-quoted value; either repeat the regexp for single-quoted values (and possibly unquoted values), or add the /e modifier and use a small fragment of code to extract the exact value and then prepend $docroot.
my $reSqString = qr{
    \'
    [^\']*
    \'
}x;
my $reDqString = qr{
    \"
    [^\"]*
    \"
}x;
my $reAttrValue = qr{
    (?: $reSqString | $reDqString | [^\'\"\s]+ )
}x;
$newString =~ s{
    (<[iI][mM][gG]                          # $1 start
     \s+
     (?: \w+ \s*=\s* $reAttrValue \s+ )*
     [sS][rR][cC] \s*=\s*)                  # $1 end
    (?:\"((?!/)[^\"]+)\")                   # $2: value that does not start with /
    ( (?: \s+ \w+ \s*=\s* $reAttrValue )*   # $3 start
      \s*/?>
    )                                       # $3 end
}{$1"$docroot/$2"$3}gx;                     # assumes $docroot has no trailing slash
It assumes that attribute names always match \w+, and that the values are properly enclosed in quotes.
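The /e alternative mentioned in the edit note could look roughly like the following sketch: one regexp accepts either quote style, and a small fragment of code in the replacement decides whether to prepend the prefix. The function name prefix_src, the sample inputs, and the choice to leave empty values and values starting with / untouched are assumptions for illustration; unquoted src values are not handled, and the rest of the tag is not validated.

```perl
use strict;
use warnings;

# Same idea as $reAttrValue above: a quoted string or a bare token.
my $reAttrValue = qr{ (?: '[^']*' | "[^"]*" | [^'"\s]+ ) }x;

# Hypothetical helper: prepend "$docroot/" to relative src values of <img> tags.
sub prefix_src {
    my ($html, $docroot) = @_;
    $html =~ s{
        (<[iI][mM][gG] \s+
         (?: \w+ \s*=\s* $reAttrValue \s+ )*
         [sS][rR][cC] \s*=\s*)         # group 1: tag start through the equals sign
        (?: "([^"]*)" | '([^']*)' )    # group 2 or group 3: the value, by quote style
    }{
        my ($head, $dq, $sq) = ($1, $2, $3);
        my $quote = defined $dq ? '"' : "'";
        my $value = defined $dq ? $dq : $sq;
        # Only non-empty values that do not start with a slash get the prefix.
        $value = "$docroot/$value"
            if length($value) && substr($value, 0, 1) ne '/';
        $head . $quote . $value . $quote;
    }gex;
    return $html;
}

print prefix_src(q{<img alt="src=x" src='pic.png'>}, '/var/www'), "\n";
```

Because the code fragment rebuilds the attribute with whichever quote character it found, the regexp does not need to be repeated per quote style.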
-- IsaacLin - 04 Feb 2009