I’ve been using MySQL for the better part of the past 6 years so it comes as a pleasant surprise when I found out last month that it can do regular expressions within its SQL!
I’m not sure about the other databases, the last time I’ve used Oracle was around 2004-2005 during my college days, so I can’t really say if this is a MySQL-exclusive feature or something along those lines. I’m pretty positive that this isn’t standard SQL though, since we never had such lessons back in college.
Anyway, if you’re familiar with SQL and string matching within SQL, it makes use of the LIKE = '[string]' clause. Where [string] is the string to be searched for. Optionally, you could also use wildcards like _ (underscore), % (percent), [charlist] (character list) or [^charlist] (negated character list) or a combination of those to match string fragments. Unfortunately this has also been one of the major points of entry for hacking/hijacking a database-driven website, via SQL injection. Due to this vulnerability, I’ve either been using heavy data validation, or store encoded data, or at times, avoiding this altogether… but I digress.
Using regex in my SQL queries is a godsend. It helps reduce my data processing and validation overhead.
MySQL supports the use of almost all POSIX regex metacharacters via the REGEXP or RLIKE clause (or a negation using NOT REGEXP or NOT RLIKE).
The following list describes some characteristics of extended regular expressions:
- “.” matches any single character.
- A character class “[...]” matches any character within the brackets. For example, “[abc]” matches “a”, “b”, or “c”. To name a range of characters, use a dash. “[a-z]” matches any letter, whereas “[0-9]” matches any digit.
- “*” matches zero or more instances of the thing preceding it. For example, “x*” matches any number of “x” characters, “[0-9]*” matches any number of digits, and “.*” matches any number of anything.
- A REGEXP pattern match succeeds if the pattern matches anywhere in the value being tested. (This differs from a LIKE pattern match, which succeeds only if the pattern matches the entire value.)
- To anchor a pattern so that it must match the beginning or end of the value being tested, use “^” at the beginning or “$” at the end of the pattern.
Let’s start with the traditional SELECT ... LIKE query. Here’s a query that selects all animals whose name begins with “ant”
SELECT name
FROM animals
WHERE name LIKE 'ant%'
The results could be something like this:
name
---------
ant
anteater
antelope
...
A regex version of that would be:
SELECT name
FROM animals
WHERE name REGEXP '^ant'
So far they look similar… but, say, what if you wanted to use a complex rule like: “select all animals whose names starts with either ‘a’ or ‘c’ and ends with either ‘t’ or ‘p’. It would look messy if you do it this way:
SELECT name
FROM animals
WHERE name LIKE 'a%t'
OR name LIKE 'a%p'
OR name LIKE 'c%t'
OR name LIKE 'c%p'
However with regex, it is as simple as this:
SELECT name
FROM animals
WHERE name REGEXP '^[ac].*[tp]$'
Imagine if your filtering conditions were much more complex. It’s not hard to see how regexp can help with that!