Regular Expression Fun
Sep 18, 2009 Posted in PHP, Programming

There are already loads of resources on the interwebs explaining the purpose of regular expressions, so there's no need to explain further here. (Basically, they let you match and extract patterns of text from strings.) I just want to document a few quick examples that I use a lot.

Finding Links

The Regular Expression:

preg_match_all("/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU", $html, $matches)
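
To make the capture groups concrete, here is a small sketch that runs the expression over some made-up markup (the sample HTML is my assumption, not something from a real page). Note that the U modifier flips greediness, which is why the (.*) stops at the first closing tag instead of swallowing everything up to the last one.

```php
<?php
// Hypothetical sample markup: one quoted href, one unquoted href.
$html = '<p><a href="http://example.com/a">First <b>link</b></a> and '
      . '<a class="x" href=http://example.org/b>second</a></p>';

$count = preg_match_all(
    "/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU",
    $html,
    $matches
);

// $matches[0] = the full <a>...</a> tags
// $matches[1] = the opening quote, or "" when the href is unquoted
// $matches[2] = the link addresses
// $matches[3] = the link text, including any nested HTML
print_r($matches[2]);
print_r($matches[3]);
```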

How I Use It:

Say we allow people to post links in a comment form. We can use the regular expression to find any links that aren't on our safe list and make sure they always open in another tab instead of taking visitors away from our site.

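 $safe_urls = array("ignitedusa.com", "ignitedla.com", "yetiusa.com");
 $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
 // $matches[2] = array of link addresses, $matches[3] = array of link text (including HTML code)
 if (preg_match_all("/$regexp/siU", $html, $matches)) {
     // array_unique keeps the keys, and stops a repeated link from getting the attributes added twice
     foreach (array_unique($matches[2]) as $k => $link) {
         // skip anything on the safe list
         if (str_replace($safe_urls, "", $link) == $link) {
             // skip mailto links
             if (str_replace("mailto", "", $link) == $link) {
                 // the link text can contain markup, so strip and escape it before using it as a title
                 $title = htmlspecialchars(strip_tags($matches[3][$k]));
                 // note: this only catches tags written exactly as <a href="..."
                 $html = str_replace('<a href="' . $link . '"', '<a href="' . $link . '" rel="shadowbox[]" title="' . $title . '" target="_blank"', $html);
             }
         }
     }
 }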
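
The str_replace step above only touches tags where href is the first attribute and is double quoted. If that turns out to be too strict, one possible alternative is to let preg_replace_callback rewrite the opening tag it just matched. This is only a sketch: the function name openExternalLinksInNewTab and the sample HTML are made up, it needs PHP 5.3+ for the closure, and it leaves out the shadowbox rel and title attributes to stay short.

```php
<?php
// Sketch only: the function name and the sample data are hypothetical.
function openExternalLinksInNewTab($html, array $safe_urls)
{
    // Same idea as the link pattern above, but it stops at the end of the opening tag
    $pattern = "/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>/siU";

    return preg_replace_callback($pattern, function ($m) use ($safe_urls) {
        $tag  = $m[0];
        $link = $m[2];

        // Leave safe domains and mailto: links alone
        if (str_replace($safe_urls, "", $link) != $link || stripos($link, "mailto:") === 0) {
            return $tag;
        }
        // Leave tags that already set a target alone
        if (stripos($tag, "target=") !== false) {
            return $tag;
        }
        // Put the attribute just before the closing ">"
        return substr($tag, 0, -1) . ' target="_blank">';
    }, $html);
}

$sample = '<a href="http://ignitedusa.com/about">us</a> '
        . '<a class="ext" href="http://example.com/">them</a> '
        . '<a href="mailto:me@example.com">mail</a>';

$result = openExternalLinksInNewTab($sample, array("ignitedusa.com", "ignitedla.com", "yetiusa.com"));
echo $result;
```

Only the middle link should pick up target="_blank" here, since the first is on the safe list and the third is a mailto link.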


Finding Images

The Regular Expression:

preg_match_all('/<img[^>]*>/im', $html, $imgTags);
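
As a quick illustration of what lands in $imgTags, here is the expression run over some made-up markup (the sample HTML is an assumption). The i modifier is what lets it catch an uppercase IMG tag. The m modifier only changes how ^ and $ behave, so it has no effect on this particular pattern, and [^>]* happily spans line breaks on its own.

```php
<?php
// Hypothetical sample markup: one uppercase tag, one tag split across lines.
$html = "<p><IMG src=\"a.jpg\" width=\"800\" height=\"600\">\n"
      . "<img\n  src=\"b.png\" alt=\"multi-line tag\"></p>";

preg_match_all('/<img[^>]*>/im', $html, $imgTags);

// $imgTags[0] holds each complete <img ...> tag as a string
print_r($imgTags[0]);
```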

How I Use It:

CMS users can't really be trusted to size their images sensibly for the web. They have in fact been known to upload large files and just resize them in the rich text editor, meaning site viewers have to sit through longer download times than necessary. Instead of banging one's head against the wall trying to teach them how to size things in an image editor and save for web, you can use the magic of regular expressions to pull out the images and replace each one with a link to a thumbnailer (I use phpThumb) that will size the image properly and cache it.


function useCachedImages($html){

	  global $CFG;
	  preg_match_all('/<img[^>]*>/im', $html, $imgTags); // get all images within the page
	  $img_tags = $imgTags[0];
	  $new_tags = array();

	  //loop over the matches
	  foreach($img_tags as $img_tag)
	  {
		 //get the src, width and height out of the img tag
		 if (!preg_match('/src="([^"]*)"/', $img_tag, $src_match)
		  || !preg_match('/width="([0-9]+)"/', $img_tag, $w_match)
		  || !preg_match('/height="([0-9]+)"/', $img_tag, $h_match))
		 {
			$new_tags[] = $img_tag;  //something is missing, so leave this tag as it was
			continue;
		 }
		 $src = $src_match[1];
		 $width = $w_match[1];
		 $height = $h_match[1];

		 $new_tag = "<img src=\"" . $CFG->blog_url . "/libs/phpThumb/phpThumb.php?src=$src&w=$width&h=$height\">";
		//  echo $new_tag; exit;
		 $new_tags[] = $new_tag;
	  }
	  $ret = str_replace($img_tags,$new_tags,$html);
	  $ret = str_replace("<p class=\"MsoNormal\"><o:p> </o:p></p>","",$ret);  //Get rid of some annoying Microsoft crap too, for when they forget to use "Paste from Word"
	  return $ret;
}
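
The attribute patterns in the function assume double quotes, which is what the rich text editor gives me. If single-quoted attributes might turn up as well, a small helper along these lines could pull a value out with either quote style. The name imgAttr is made up for this sketch and isn't part of the function above.

```php
<?php
// Hypothetical helper: returns the value of one attribute from a tag, or null if it isn't there.
function imgAttr($tag, $name)
{
    // (["\']) captures whichever quote opens the value, and \1 requires the same quote to close it
    $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/is';
    return preg_match($pattern, $tag, $m) ? $m[2] : null;
}

$tag = "<img src='a.jpg' width=\"640\">";
var_dump(imgAttr($tag, 'src'));
var_dump(imgAttr($tag, 'width'));
var_dump(imgAttr($tag, 'height'));
```

Returning null for a missing attribute makes it easy to skip images that don't have both a width and a height.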