Escaping regular expressions in PHP

April 10, 2010 by aaron

Escaping dynamic regex strings automatically in PHP is a lot harder than you would think. You can’t just use a string like "\$myregex", because inside double quotes PHP treats \$ as an escape for a literal dollar sign, so the backslash never reaches the regex engine. You can’t even double slash it like "\\$myregex", because PHP collapses \\ to a single backslash and then interpolates the variable, which is not what the regex engine needs either. To get both PHP and the regular expression to work correctly together, this post combines quadruple slashes with stripslashes.
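To make the two failed attempts concrete, here is a small sketch of what PHP actually produces for each literal. The value assigned to $myregex is a made-up sample, not something from the original script.

```php
<?php
// Hypothetical pattern fragment, used only for illustration.
$myregex = 'a.b';

// "\$myregex": \$ is an escape for a literal dollar sign, so nothing is interpolated.
$escapedDollar = "\$myregex";

// "\\$myregex": \\ collapses to one backslash, then the variable is interpolated.
$doubleSlash = "\\$myregex";

echo $escapedDollar, "\n"; // prints $myregex
echo $doubleSlash, "\n";   // prints \a.b
```

In neither case does each special character inside the value end up with its own backslash, which is what the regex engine needs.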

The Code

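// Escape backslashes first so the ones added below are not escaped again.
$myregexbase = str_replace('\\', '\\\\', $myregexbase);
$escapes = array('.','$','^','[',']','?','+','(',')','*','|');
foreach ($escapes as $s) {
    // "\\\\$s" is two backslashes plus $s; stripslashes() leaves one, e.g. \.
    $r = "\\\\$s";
    $myregexbase = str_replace($s, stripslashes($r), $myregexbase);
}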

Its usage

So when would you need this script? Suppose, for example, you have a dynamic list of words that need to be replaced in a block of text. You don’t want to replace partial matches (“cat” should not be replaced in the word “catch”), so you need to merge the list of $words into the complete regex. That means every word has to be escaped before it is wrapped in word boundaries.
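The word-list case can be sketched as follows. The helper name escape_regex_chars, the word list, and the sample text are all hypothetical. The helper wraps the same kind of loop, with the backslash handled first, and it does not escape the / delimiter, so it assumes the words contain no slashes.

```php
<?php
// Hypothetical helper wrapping the escaping loop; the name is illustrative.
function escape_regex_chars($text) {
    // Backslashes first, so the ones added in the loop are not escaped again.
    $text = str_replace('\\', '\\\\', $text);
    $escapes = array('.','$','^','[',']','?','+','(',')','*','|');
    foreach ($escapes as $s) {
        $r = "\\\\$s";
        $text = str_replace($s, stripslashes($r), $text);
    }
    return $text;
}

// Hypothetical word list and text.
$words = array('cat', 'node.js');
$text  = 'The cat will catch node.js but not nodexjs.';

// Escape each word, join with |, and wrap in \b so partial matches are skipped.
$pattern = '/\b(' . implode('|', array_map('escape_regex_chars', $words)) . ')\b/';

$result = preg_replace($pattern, '***', $text);
echo $result, "\n"; // prints The *** will catch *** but not nodexjs.
```

Without the escaping step, the dot in node.js would match any character and nodexjs would be replaced as well.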

Why can’t you use a single slash?

The problem is with the line $r = "\\\\$s". If you use a single slash on its own, it escapes whatever follows it, including the quote the string is wrapped in. If you use single quotes around the string, $s is not interpolated. If you use $r = '\\\\'.$s; it works fine, but there is no real benefit.

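Why the stripslashes? Why not just use $r = '\\'.$s;? The worry is a collision between the generated regular expression and PHP’s own string escaping: the backslash has to survive PHP’s parsing as a literal \ so that the regex engine sees it in front of the character and escapes that character, instead of the backslash being consumed as an escape along the way. In practice, '\\' in a single-quoted string is already exactly one backslash, and str_replace() does not interpret backslashes in the replacement, so for a character such as . the shorter form yields the same \. that stripslashes("\\\\$s") does. The stripslashes version just makes the removal of the extra slashes explicit.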
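A quick way to check what each of these literals really contains is to build them side by side. This is only a sketch, and the sample character in $s is an arbitrary choice.

```php
<?php
$s = '.'; // sample character to escape

// Double quotes: four backslashes in source become two, then $s is interpolated.
// stripslashes() then reduces the pair to a single backslash.
$viaStripslashes = stripslashes("\\\\$s");

// Single quotes: '\\' is one literal backslash; concatenation replaces interpolation.
$viaSingleQuotes = '\\' . $s;

// Single quotes do not interpolate, so $s stays as the two characters "$s".
$notInterpolated = '\\\\$s';

echo $viaStripslashes, "\n"; // prints \.
echo $viaSingleQuotes, "\n"; // prints \.
echo $notInterpolated, "\n"; // prints \\$s
```

For an ordinary metacharacter the first two forms are identical strings, so the choice between them comes down to readability.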

Why not use a different hack?

Sure, you could use $r = trim('\ ').$s;, but it is still just a hack, and it is even less understandable. The one benefit of using \\\\ with stripslashes is that it is easy to understand what is happening: extra backslashes have been added, and they are going to be stripped away. While the actual output might not be entirely clear, the solution is understandable in that the extra slashes are being removed. Overall, the problem is kinda crazy, and the solution is even crazier, but it works.
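For comparison, PHP also ships a built-in function, preg_quote(), that escapes regex metacharacters in a string and, when given a delimiter as its second argument, escapes that delimiter too. The sketch below is not the approach used in this post; it is only shown as a point of reference, and the sample strings are made up.

```php
<?php
// Hypothetical input containing several metacharacters and the / delimiter.
$raw = 'price: $5.00 (approx) a/b';

// The second argument names the pattern delimiter so it gets escaped too.
$quoted = preg_quote($raw, '/');

// Used as a pattern, the quoted string matches the raw text literally.
$matchesItself = preg_match('/^' . $quoted . '$/', $raw);

// An escaped dot no longer matches an arbitrary character.
$matchesOther = preg_match('/^' . preg_quote('node.js', '/') . '$/', 'nodexjs');

var_dump($matchesItself); // int(1)
var_dump($matchesOther);  // int(0)
```

It covers more characters than the list in the loop, so the exact escaped output may differ even though both are meant to match the same literal text.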
