<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chuck&#039;s Musings &#187; regular expressions</title>
	<atom:link href="http://blog.chuckcerrillo.com/tag/regular-expressions/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.chuckcerrillo.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 21 Feb 2010 18:43:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Regular expressions in MySQL</title>
		<link>http://blog.chuckcerrillo.com/2009/09/regular-expressions-in-mysql/</link>
		<comments>http://blog.chuckcerrillo.com/2009/09/regular-expressions-in-mysql/#comments</comments>
		<pubDate>Tue, 01 Sep 2009 05:19:32 +0000</pubDate>
		<dc:creator>Chuck</dc:creator>
				<category><![CDATA[PHP and MySQL]]></category>
		<category><![CDATA[chuck cerrillo]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[query]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[regexp]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.chuckcerrillo.com/?p=92</guid>
		<description><![CDATA[Using regular expressions in your SQL statements helps keep them simple.]]></description>
			<content:encoded><![CDATA[<p><a href="http://mysql.com"><img style="float: left" src="http://dev.mysql.com/common/logos/logo_mysql_sun_a.gif" alt="MySQL" /></a> I&#8217;ve been using MySQL for the better part of the past 6 years, so it came as a pleasant surprise when I found out last month that it can do regular expressions within its SQL!</p>
<p>I&#8217;m not sure about the other databases. The last time I used Oracle was around 2004-2005, during my college days, so I can&#8217;t really say whether this is a MySQL-exclusive feature or something along those lines. I&#8217;m fairly sure this syntax isn&#8217;t standard SQL though, since we never had such lessons back in college.</p>
<p>Anyway, if you&#8217;re familiar with SQL, you know that string matching makes use of the <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'[string]'</span></span></code> clause, where [string] is the string to be searched for. Optionally, you can use the wildcards _ (underscore, any single character) and % (percent, any run of characters) to match string fragments. Some other dialects, such as SQL Server, also accept [charlist] (character list) and [^charlist] (negated character list), but MySQL&#8217;s LIKE does not. Unfortunately, queries built by pasting user input into strings like this have also been one of the major points of entry for hacking/hijacking a database-driven website, via <a href="http://en.wikipedia.org/wiki/SQL_injection">SQL injection</a>. Due to this vulnerability, I&#8217;ve either been using heavy data validation, storing encoded data, or at times avoiding this altogether&#8230; but I digress.</p>
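<p>As an aside on the injection point, here is a minimal sketch of the usual defence: passing user input as a bound parameter instead of pasting it into the SQL string. It assumes Python&#8217;s built-in sqlite3 module with an in-memory database standing in for a MySQL server; the animals table and the find_by_prefix helper are made up for illustration.</p>

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT)")
conn.executemany(
    "INSERT INTO animals (name) VALUES (?)",
    [("ant",), ("anteater",), ("cat",)],
)


def find_by_prefix(conn, prefix):
    """Return names starting with `prefix` (hypothetical helper)."""
    # Escape the LIKE wildcards so user input is matched literally.
    escaped = (
        prefix.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    # The value travels as a bound parameter, never as part of the SQL text.
    rows = conn.execute(
        "SELECT name FROM animals WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (escaped + "%",),
    )
    return [name for (name,) in rows]


print(find_by_prefix(conn, "ant"))
print(find_by_prefix(conn, "' OR '1'='1"))
```

<p>The second call is a classic injection attempt; because it is bound as data, it is simply searched for as a (non-matching) prefix.</p>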
<p>Using regex in my SQL queries is a godsend. It helps reduce my data processing and validation overhead.</p>
<p>MySQL supports the use of almost all POSIX regex metacharacters via the <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">REGEXP</span></span></code> operator or its synonym <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">RLIKE</span></span></code> (or, for negation, <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">REGEXP</span></span></code> or <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">RLIKE</span></span></code>).</p>
<p><strong>The following list describes some characteristics of extended regular expressions:</strong></p>
<ul>
<li>“.” matches any single character.</li>
<li>A character class “[...]” matches any character within the brackets. For example, “[abc]” matches “a”, “b”, or “c”. To name a range of characters, use a dash. “[a-z]” matches any letter, whereas “[0-9]” matches any digit.</li>
<li>“*” matches zero or more instances of the thing preceding it. For example, “x*” matches any number of “x” characters, “[0-9]*” matches any number of digits, and “.*” matches any number of anything.</li>
<li>A REGEXP pattern match succeeds if the pattern matches anywhere in the value being tested. (This differs from a LIKE pattern match, which succeeds only if the pattern matches the entire value.)</li>
<li>To anchor a pattern so that it must match the beginning or end of the value being tested, use “^” at the beginning or “$” at the end of the pattern.</li>
</ul>
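<p>The characteristics listed above can be tried out quickly outside the database. This is only a sketch: it assumes Python&#8217;s re module as a stand-in for MySQL&#8217;s regex engine, and regexp_match is a made-up helper name. The basic constructs shown behave the same way in both, but note that Python matches case-sensitively by default, while MySQL&#8217;s REGEXP ignores case for non-binary strings.</p>

```python
import re


def regexp_match(pattern, value):
    """Mimic REGEXP: succeed if the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None


# "." matches any single character.
print(regexp_match("n.e", "anteater"))
# A character class matches one character from the set or range.
print(regexp_match("[0-9]", "ant"))
# "*" allows zero or more of the preceding thing, so this always succeeds.
print(regexp_match("x*", "ant"))
# Unanchored patterns may match in the middle of the value.
print(regexp_match("eat", "anteater"))
# "^" and "$" pin the match to the start or end of the value.
print(regexp_match("^eat", "anteater"))
print(regexp_match("ter$", "anteater"))
```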
<p>Let&#8217;s start with the traditional <code class="codecolorer sql default"><span class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">...</span>  <span style="color: #993333; font-weight: bold;">LIKE</span></span></code> query. Here&#8217;s a query that selects all animals whose name begins with &#8220;ant&#8221;:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> name<br />
<br />
<span style="color: #993333; font-weight: bold;">FROM</span> animals<br />
<br />
<span style="color: #993333; font-weight: bold;">WHERE</span> name <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'ant%'</span></div></div>
<p>The results could be something like this:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">name<br />
<br />
<span style="color: #808080; font-style: italic;">---------</span><br />
<br />
ant<br />
<br />
anteater<br />
<br />
antelope<br />
<br />
<span style="color: #66cc66;">...</span></div></div>
<p>A regex version of that would be:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> name<br />
<br />
<span style="color: #993333; font-weight: bold;">FROM</span> animals<br />
<br />
<span style="color: #993333; font-weight: bold;">WHERE</span> name <span style="color: #993333; font-weight: bold;">REGEXP</span> <span style="color: #ff0000;">'^ant'</span></div></div>
<p>So far they look similar&#8230; but what if you wanted to use a complex rule like &#8220;select all animals whose names start with either &#8216;a&#8217; or &#8216;c&#8217; and end with either &#8216;t&#8217; or &#8216;p&#8217;&#8221;? It looks messy if you do it this way:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> name<br />
<br />
<span style="color: #993333; font-weight: bold;">FROM</span> animals<br />
<br />
<span style="color: #993333; font-weight: bold;">WHERE</span> name <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'a%t'</span><br />
<br />
<span style="color: #993333; font-weight: bold;">OR</span> name <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'a%p'</span><br />
<br />
<span style="color: #993333; font-weight: bold;">OR</span> name <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'c%t'</span><br />
<br />
<span style="color: #993333; font-weight: bold;">OR</span> name <span style="color: #993333; font-weight: bold;">LIKE</span> <span style="color: #ff0000;">'c%p'</span></div></div>
<p>However, with regex it is as simple as this:</p>
<div class="codecolorer-container sql default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="sql codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #993333; font-weight: bold;">SELECT</span> name<br />
<br />
<span style="color: #993333; font-weight: bold;">FROM</span> animals<br />
<br />
<span style="color: #993333; font-weight: bold;">WHERE</span> name <span style="color: #993333; font-weight: bold;">REGEXP</span> <span style="color: #ff0000;">'^[ac].*[tp]$'</span></div></div>
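<p>To check that the two queries above really select the same rows, here is a small runnable sketch. It rests on several assumptions: an in-memory SQLite database stands in for MySQL, the sample rows are made up, and SQLite&#8217;s REGEXP operator is backed by Python&#8217;s re module (SQLite ships no regex engine of its own), so the engine is not the one MySQL uses.</p>

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    # IGNORECASE approximates MySQL's case-insensitive default.
    if value is None:
        return False
    return re.search(pattern, value, re.IGNORECASE) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE animals (name TEXT)")
sample = ["ant", "anteater", "antelope", "cat", "carp", "asp", "cow", "bat"]
conn.executemany("INSERT INTO animals (name) VALUES (?)", [(n,) for n in sample])

like_sql = """
    SELECT name FROM animals
    WHERE name LIKE 'a%t' OR name LIKE 'a%p'
       OR name LIKE 'c%t' OR name LIKE 'c%p'
    ORDER BY name
"""
regexp_sql = """
    SELECT name FROM animals
    WHERE name REGEXP '^[ac].*[tp]$'
    ORDER BY name
"""

like_names = [name for (name,) in conn.execute(like_sql)]
regexp_names = [name for (name,) in conn.execute(regexp_sql)]
print(like_names)
print(regexp_names)
```

<p>Both lists come out identical for this sample data; the four LIKE conditions collapse into a single pattern.</p>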
<p>Imagine if your filtering conditions were much more complex. It&#8217;s not hard to see how regexp can help with that!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chuckcerrillo.com/2009/09/regular-expressions-in-mysql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Regex saves the day, again</title>
		<link>http://blog.chuckcerrillo.com/2009/08/regex-saves-the-day-again/</link>
		<comments>http://blog.chuckcerrillo.com/2009/08/regex-saves-the-day-again/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 22:21:08 +0000</pubDate>
		<dc:creator>Chuck</dc:creator>
				<category><![CDATA[PHP and MySQL]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[regular expressions]]></category>
		<category><![CDATA[search tool]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://blog.chuckcerrillo.com/?p=44</guid>
		<description><![CDATA[Searching is easier with regex... Good thing MySQL supports that in SQL queries too!]]></description>
			<content:encoded><![CDATA[<p>I was supposed to do a &#8220;search&#8221; feature and a &#8220;related items&#8221; feature for the module I was assigned to. This was where my regex skills came in handy yet again.</p>
<p>I made a very bare-bones prototype of it, which takes some input text as the search query and runs it against some stored content, in this case a full-text article (although the final version should run against a database).</p>
<p>The prototype I made is still very rough around the edges. It extracts all keywords from the search query using this expression:</p>
<div class="codecolorer-container php default" style="overflow:auto;white-space:nowrap;border: 1px solid #9F9F9F;"><div class="php codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #000088;">$regex</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">'/\b([\p{L}_]+)\b/u'</span><span style="color: #339933;">;</span></div></div>
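<p>For readers who want to experiment with the keyword extraction step, here is a rough Python equivalent. It is a sketch under one notable assumption: Python&#8217;s built-in re module has no \p{L}, so the class [^\W\d] (a word character that is not a digit) is swapped in to approximate &#8220;letters plus underscore&#8221;. The extract_keywords name is made up for illustration.</p>

```python
import re

# Assumption: [^\W\d] approximates PCRE's [\p{L}_], since the built-in `re`
# module does not support \p{L}. Python str patterns are Unicode-aware.
KEYWORD_RE = re.compile(r"\b[^\W\d]+\b")


def extract_keywords(query):
    """Return every run of letters/underscores bounded by word boundaries."""
    return KEYWORD_RE.findall(query)


print(extract_keywords("MySQL regex_tips für 2009!"))
```

<p>Digits and punctuation are skipped, while accented letters and underscores are kept as part of a keyword.</p>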
<p>That rule can be translated as follows:</p>
<p style="padding-left: 30px;"><em><span style="color: #993366;">match any run of Unicode letters and underscores that is enclosed in word boundaries</span></em></p>
<p>Having done that, I place all matched data into an array and call that my &#8220;keywords&#8221; array, to be used later.</p>
<p>The next thing I did was to chop the article down into smaller pieces, currently by sentence (I only used periods as a delimiter; I should probably include other punctuation marks too). Then I ran <em>preg_match()</em> on each piece to quickly check for matches for any of the keywords. These results are then compiled.</p>
<p>I assign each compiled piece a corresponding weight. This weight is my arbitrary way of picking the best match. My current implementation <em>sets the weight equal to the number of characters in the piece that are matched by keywords</em>. I still have to refine these rules later on.</p>
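<p>The chopping and quick-check step might look roughly like this in Python. This is a sketch, not the prototype&#8217;s actual code: the function names are made up, re.search plays the role of preg_match(), and keywords are matched as case-insensitive substrings, which is an assumption about how the prototype behaves.</p>

```python
import re


def split_sentences(article):
    """Split on periods only, dropping empty pieces."""
    return [piece.strip() for piece in article.split(".") if piece.strip()]


def pieces_with_matches(article, keywords):
    """Keep the pieces that contain at least one keyword."""
    if not keywords:
        return []
    # One alternation pattern checks all keywords in a single pass per piece.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [piece for piece in split_sentences(article) if pattern.search(piece)]


article = "MySQL supports regex. PHP has preg_match. Cats sleep a lot."
print(pieces_with_matches(article, ["regex", "php"]))
```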
<p>When all pieces have their weight assigned, I sort them according to weight in descending order (highest to lowest weight). Then I display the results and highlight the matched characters.</p>
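<p>The weighting, sorting and highlighting steps can be sketched as follows. Again this is illustrative Python rather than the prototype&#8217;s code: rank_pieces is a made-up name, matching is assumed to be a case-insensitive substring match, and square brackets stand in for whatever markup the real tool uses for highlighting.</p>

```python
import re


def rank_pieces(pieces, keywords):
    """Weight each piece by its count of keyword-matched characters, then sort."""
    if not keywords:
        return []
    # Longer keywords go first so the alternation prefers the longest match.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)

    ranked = []
    for piece in pieces:
        # Weight = total number of characters covered by keyword matches.
        weight = sum(len(m.group()) for m in pattern.finditer(piece))
        if weight:
            highlighted = pattern.sub(lambda m: "[" + m.group() + "]", piece)
            ranked.append((weight, highlighted))

    # Highest weight first.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


pieces = ["PHP has preg_match", "MySQL supports regex in SQL", "Cats sleep a lot"]
for weight, text in rank_pieces(pieces, ["regex", "sql", "php"]):
    print(weight, text)
```

<p>Note that with plain substring matching, the &#8220;SQL&#8221; inside &#8220;MySQL&#8221; also counts toward the weight; whether that is desirable is one of the rules that would need refining.</p>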
<p>Here&#8217;s the prototype I made: <a href="http://thedirtlab.chuckcerrillo.com/php-and-mysql/search.php">search tool</a>.</p>
<p>I&#8217;m still thinking of the refinements I could make so I&#8217;ll just post them here as I go along.</p>
<p>Also, it&#8217;s good to know that MySQL supports regex in SQL queries; this should save me a lot of time!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.chuckcerrillo.com/2009/08/regex-saves-the-day-again/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
