
A simple string substitution does not work

Perl, String, Substitution

Below is my code:

my $string1 = '<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>';
my $string2 = 'http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3';


print "Before string substitution:\n$string1\n";
$string1 =~ s/$string2//;
print "After string substitution:\n$string1\n"; 

And the actual output:

Before string substitution:
<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>
After string substitution:
<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td> 

What I expect:

Before string substitution:
<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>
After string substitution:
<td><a href="">abcdefg</a><br />(123456)</td> 

Could someone please tell me what is wrong with my code?

Thanks.

Charlie Yen

Answers (2)

That problem can be fixed by adding two characters to your script. You need to escape the metacharacters in $string2:

$string1 =~ s/\Q$string2//;

The character that causes the match to fail is the question mark ?, which, unescaped in ...aspx?..., means "match 0 or 1 of the character 'x'". Each . is a wildcard that matches any character except a newline, which can cause false positive matches. The slashes /, although they are the delimiter of the substitution operator s///, do not need to be escaped here, because they come from the interpolated variable $string2 rather than appearing literally in the pattern.
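A small self-contained sketch of the three cases side by side: the unescaped pattern, the \Q-escaped pattern, and a pattern where only the ? was escaped by hand. The HTML string is a shortened version of the one in the question, and the variable names are illustrative only; the www_aaa lookalike URL is the one Brad Gilbert mentions in the comments.

```perl
use strict;
use warnings;

# Shortened, illustrative version of the question's HTML.
my $html = '<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a></td>';
my $url  = 'http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3';

# Unescaped: "aspx?" means "asp" plus an optional "x", so the pattern
# expects "aspFamilyID" or "aspxFamilyID" and never the literal "?".
my $raw_match = ($html =~ /$url/) ? 1 : 0;

# Escaped with \Q: every metacharacter in $url is matched literally.
my $quoted_match = ($html =~ /\Q$url/) ? 1 : 0;

# Only the "?" escaped by hand: the dots still match any character,
# so a different URL is accepted as well.
my $half_fixed = 'http://www.aaa.com/downloads/details.aspx\?FamilyID=a1b2c3';
my $lookalike  = 'http://www_aaa.com/downloads/details.aspx?FamilyID=a1b2c3';
my $false_positive = ($lookalike =~ /$half_fixed/) ? 1 : 0;

print "raw: $raw_match, quoted: $quoted_match, false positive: $false_positive\n";
# prints "raw: 0, quoted: 1, false positive: 1"
```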

Escaping metacharacters is most easily done with the \Q ... \E escape sequence inside a regex, or with the quotemeta function.
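As a rough illustration that the two forms do the same job, this sketch compares quotemeta's result with "\Q...\E" applied to the same string, and shows that text after \E is treated as regex syntax again (here an optional trailing slash, chosen only for illustration).

```perl
use strict;
use warnings;

my $url = 'http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3';

# quotemeta backslashes every ASCII non-word character.
my $escaped = quotemeta($url);

# \Q...\E applies the same escaping to interpolated text.
my $via_q = "\Q$url\E";
my $same  = ($escaped eq $via_q) ? 1 : 0;

# After \E, "/?" and \z are ordinary regex syntax again.
my $matches = ("$url/" =~ m{^\Q$url\E/?\z}) ? 1 : 0;

print "$escaped\n";
# prints "http\:\/\/www\.aaa\.com\/downloads\/details\.aspx\?FamilyID\=a1b2c3"
```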

It is not a good idea to escape these kinds of strings manually, especially when a literal match is all that is required.

TLP

Comments:

Brad Gilbert said:
? is the only character that prevents it from matching. The two . could also become troublesome.
TLP said:
@BradGilbert I am not sure what your message is with this comment.
Brad Gilbert said:
If you only fixed the ?, it could still match http://www_aaa.com/downloads/details.aspx?FamilyID=a1b2c3 (swap the first . for _), which would make the first . a problem. (The comment was mostly for future viewers of this answer.)
TLP said:
@brad The solution I presented does not present a problem with meta characters.
Brad Gilbert said:
I know it doesn't. If someone with a similar problem just backslashed the ?, they could still have a problem, one that is far harder to find. I figured there should be something pointing out that using quotemeta or \Q ... \E would also fix other problems that weren't yet apparent.

Since your pattern contains characters that Perl regexes treat as special, you must escape them, like this:

my $string2 = 'http:\/\/www\.aaa\.com\/downloads\/details\.aspx\?FamilyID=a1b2c3';

Then the expected output will show up when you run your program:

Before string substitution:
<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>
After string substitution:
<td><a href="">abcdefg</a><br />(123456)</td>

Rather than escaping these characters by hand, it is best to use Perl's quotemeta function:

my $string2 = quotemeta('http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3');

This will escape the special characters for you and then your regex replace will work fine.
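To check that both approaches in this answer give the expected output, here is a sketch that runs the substitution twice on the question's string, once with the hand-escaped pattern and once with the quotemeta result. Note that quotemeta also escapes ":" and "=", which is unnecessary but harmless in a pattern.

```perl
use strict;
use warnings;

my $original = '<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>';
my $expected = '<td><a href="">abcdefg</a><br />(123456)</td>';

# Hand-escaped pattern, as written in the answer.
my $manual = 'http:\/\/www\.aaa\.com\/downloads\/details\.aspx\?FamilyID=a1b2c3';

# Pattern produced by quotemeta.
my $auto = quotemeta('http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3');

# Copy, then substitute on the copy, leaving $original unchanged.
(my $with_manual = $original) =~ s/$manual//;
(my $with_auto   = $original) =~ s/$auto//;

print "manual: ",    ($with_manual eq $expected ? "ok" : "mismatch"), "\n";
print "quotemeta: ", ($with_auto   eq $expected ? "ok" : "mismatch"), "\n";
```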

EDIT

Since your issue comes from unescaped regex metacharacters, this solution might be simpler: it does not use a regex at all, so nothing needs escaping:

substr($string1, index($string1,$string2), length($string2)) = '';

This is based on the following example from the perldoc for substr:

my $name = 'fred';
substr($name, 4) = 'dy'; # $name is now 'freddy'
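One caveat with the one-liner: index returns -1 when the substring is not found, and substr treats an offset of -1 as the last character. In that case the unguarded assignment replaces the last character with the empty string, deleting it. A guarded sketch, using a hypothetical helper named remove_first_literal and the 4-argument form of substr:

```perl
use strict;
use warnings;

# Hypothetical helper: remove the first literal occurrence of $needle
# from the string referenced by $haystack_ref. Returns 1 if removed, else 0.
sub remove_first_literal {
    my ($haystack_ref, $needle) = @_;
    my $pos = index($$haystack_ref, $needle);
    return 0 if $pos < 0;    # index returns -1 when $needle is absent
    substr($$haystack_ref, $pos, length($needle), '');    # replace in place
    return 1;
}

my $string1 = '<td><a href="http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3">abcdefg</a><br />(123456)</td>';
my $string2 = 'http://www.aaa.com/downloads/details.aspx?FamilyID=a1b2c3';

my $removed = remove_first_literal(\$string1, $string2);
print "$string1\n";    # <td><a href="">abcdefg</a><br />(123456)</td>

# A second call finds nothing and leaves the string untouched.
my $again = remove_first_literal(\$string1, $string2);

# For comparison, the unguarded one-liner on a string without the URL:
my $plain = 'abc</td>';
substr($plain, index($plain, $string2), length($string2)) = '';
# $plain is now 'abc</td': the final '>' was removed.
```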

srchulo

Comments:

Jonathan Leffler said:
You're definitely on the right track; the primary trouble-maker is the ?; the slashes actually aren't a problem in this context (put a backslash in front of the ? only, and try it). The . characters will match the . quite happily (as well as anything else). De facto, it is unlikely that a string will cause problems because of the dots.
Brad Gilbert said:
Your substr example should almost be {my $index=index($string1,$string2);if($index >= $[){substr($string1,$index,length($string2),'')}}. What you have now appends the empty string to the end of $string1 if it doesn't match. Right now it isn't much of a problem, but in the future it could cause an unnecessary copy when COW strings become the default.
