Archive

Posts Tagged ‘MySQL’

Juggling time zones

September 3rd, 2009 Chuck 11 comments

Handling times for different timezones is one of the usual things I had to face with every project that has in one way or the other involves the element of time.

The usual projects I work on store dates into a datetime field in the database. This means that the date must use the YYYY-MM-DD HH:MM:SS format (e.g. 2009-09-03 07:45:00).

One of the first things you’d want to do is to let the user specify a timezone setting for that user account (assuming this is in a multi-user environment, as most web applications are). You could view all the timezones using this code:

echo '<ul>';
foreach(DateTimeZone::listAbbreviations() AS $timezone_abbreviation=>$timezones){
     foreach($timezones AS $timezone)
          if(isset($timezone['timezone_id']))
          echo '<li>'.$timezone_abbreviation.' - ' . $timezone['timezone_id'] . '
          <ul>
               <li>Uses DST: '
. ($timezone['dst']===TRUE?'Yes':'No') . '</li>
               <li>Offset: '
. $timezone['offset'] . '</li>
          </ul></li>'
;
}
echo '</ul>';

You could modify the code above to create a drop-down list of the different timezones to make it easier for the user to pick a timezone (because simply writing GMT+8 or UTC+8 might be confusing for some people).

Next, we will have to convert all of our stored dates into a single timezone, for simplicity, I use GMT/UTC as a reference timezone (since it is easy to add timezone offsets to it). If you store your dates in a datetime field, you can perform the following conversion after retrieving the timestamp from the database:

// let's assume $parsedtime is the entry we have in the database
// and $offset is the timezone offset we got based on the user's settings
$parsedtime = '2009-09-15 12:00:00';
$offset = 8*60*60; // this is GMT +8
// get the GMT counterpart of the parsedtime
$unix_time = strtotime($parsedtime . ' GMT');
echo gmdate('m d Y h:iA',$unix_time-$offset); // displays: 09 15 2009 04:00AM

So that means, 12nn of September 15, 2009 is 4am of September 15, 2009 in GMT/UTC.

Let’s say we had a Calendar application, and you wanted to set an alarm for 6pm on that same day. You could do the something like the this (assuming you wrote some sort of class for this thing):

$reminder = new Reminder('2009-09-15 18:00:00','Dinner with the family');
echo $reminder->timeLeft();

Since the current date and time was 12nn of September 15, 2009, this means that the above reminder will display “6 hours left”, depending on how the timeLeft() method is formatted.

If we didn’t convert the date into a standard timezone, this will force our functions to use the server’s timezone setting and that might not be what you want.

Regular expressions in MySQL

September 1st, 2009 Chuck No comments

MySQL I’ve been using MySQL for the better part of the past 6 years so it comes as a pleasant surprise when I found out last month that it can do regular expressions within its SQL!

I’m not sure about the other databases, the last time I’ve used Oracle was around 2004-2005 during my college days, so I can’t really say if this is a MySQL-exclusive feature or something along those lines. I’m pretty positive that this isn’t standard SQL though, since we never had such lessons back in college.

Anyway, if you’re familiar with SQL and string matching within SQL, it makes use of the LIKE = '[string]' clause. Where [string] is the string to be searched for. Optionally, you could also use wildcards like _ (underscore), % (percent), [charlist] (character list) or [^charlist] (negated character list) or a combination of those to match string fragments. Unfortunately this has also been one of the major points of entry for hacking/hijacking a database-driven website, via SQL injection. Due to this vulnerability, I’ve either been using heavy data validation, or store encoded data, or at times, avoiding this altogether… but I digress.

Using regex in my SQL queries is a godsend. It helps reduce my data processing and validation overhead.

MySQL supports the use of almost all POSIX regex  metacharacters via the REGEXP or RLIKE clause (or a negation using NOT REGEXP or NOT RLIKE).

The following list describes some characteristics of extended regular expressions:

  • “.” matches any single character.
  • A character class “[...]” matches any character within the brackets. For example, “[abc]” matches “a”, “b”, or “c”. To name a range of characters, use a dash. “[a-z]” matches any letter, whereas “[0-9]” matches any digit.
  • “*” matches zero or more instances of the thing preceding it. For example, “x*” matches any number of “x” characters, “[0-9]*” matches any number of digits, and “.*” matches any number of anything.
  • A REGEXP pattern match succeeds if the pattern matches anywhere in the value being tested. (This differs from a LIKE pattern match, which succeeds only if the pattern matches the entire value.)
  • To anchor a pattern so that it must match the beginning or end of the value being tested, use “^” at the beginning or “$” at the end of the pattern.

Let’s start with the traditional SELECT ...  LIKE query. Here’s a query that selects all animals whose name begins with “ant”

SELECT name

FROM animals

WHERE name LIKE 'ant%'

The results could be something like this:

name

---------

ant

anteater

antelope

...

A regex version of that would be:

SELECT name

FROM animals

WHERE name REGEXP '^ant'

So far they look similar… but, say, what if you wanted to use a complex rule like: “select all animals whose names starts with either ‘a’ or ‘c’ and ends with either ‘t’ or ‘p’.  It would look messy if you do it this way:

SELECT name

FROM animals

WHERE name LIKE 'a%t'

OR name LIKE 'a%p'

OR name LIKE 'c%t'

OR name LIKE 'c%p'

However with regex, it is as simple as this:

SELECT name

FROM animals

WHERE name REGEXP '^[ac].*[tp]$'

Imagine if your filtering conditions were much more complex. It’s not hard to see how regexp can help with that!

Regex saves the day, again

August 26th, 2009 Chuck No comments

I was supposed to do a “search” feature and a “related items” feature for the module I was assigned to. This was were my regex skills came in handy yet again.

I made a very barebone prototype of it, which takes some input text for the search query against some stored content, in this case, a full-text article (although the final version should be against a database).

The prototype I made is still very rough around the edges. What it does is that it extracts all keywords from the search query using this expression:

$regex = '/\b([\p{L}|\p{Ll}|\p{Lu}|_]+?)\b/i';

That rule can be translated as follows:

take any combination of unicode characters and underscores that are enclosed in word boundaries

Having done that, I place all matched data into an array and call that my “keywords” array, to be used later.

The next thing I did was to chop the article down into smaller pieces, currently by sentence (I only used periods as a delimiter, I should probably include other punctuation marks I guess). Then I ran preg_match() through each piece to quickly check for matches for any of the keywords. These results are then compiled.

In each compiled piece, I assign a corresponding weight. This weight is my arbitrary way of picking the best match. My currently implementation sets weight to be equal to the number of characters in the piece that are matched with keywords. I still have to refine these rules later on.

When all pieces have their weight assigned, I sort them according to weight in descending order (highest to lowest weight). Then I display the results and highlight the matched characters.

Here’s the prototype I made: search tool.

I’m still thinking of the refinements I could make so I’ll just post them here as I go along.

Also, it’s good to know that MySQL supports regex in your SQL queries, this should save me a lot of time!