PHP Serialize vs Database normalization

I’ve recently started developing plugins for WordPress in PHP. Being an old school Perl programmer PHP comes very easy and MySQL is still same old MySQL. PHP don’t have many advantages over Perl in general except one very good one: simplicity. I have always tried to write simple code, not simple in the sense that it doesn’t accomplish complex tasks rather in the sense that while being a huge and complex system it is still built with easy to understand blocks of code. With that being said, there are a few shortcuts I rather not take.

The reason I write this is that in all the PHP applications and PHP documentation I’ve come across regarding serialize() nobody ever mentions database normalization.

PHP Serialize

I found the serialize() function in PHP quite useful, it takes a datastructure and creates a string representation of that structure. This string can later be use with unserialize() to return it to the old structure. An example:

$fruits = array (
"fruits"  => array("a" => "orange", "b" => "banana"),
"numbers" => array(1, 2, 3),
);

echo print_r($fruits);

The above code creates an array and prints the result. The output of the above will be:

Array
(
    [fruits] => Array
        (
            [a] => orange
            [b] => banana
        )

    [numbers] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
)

Now if you use serialize on this object the following would happen:

$fruits = serialize($fruits);

echo print_r($fruits);

Output:

1a:2:{s:6:"fruits";a:2:{s:1:"a";s:6:"orange";s:1:"b";s:6:"banana";}s:7:"numbers";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array

A long line of strange numbers, just what the programmer wanted! This data is perfect for transfering or saving the state of a data structure for later use. Calling unserialize() on the above string would return it to the same array that we first had.

Database Design

Most applications use a relational database for storing information. A relational database stores all data in tables of rows and columns (or relations of tuples and attributes if you use the original non-SQL names). To make a database work efficiently the design of those tables, rows and columns are pivotal. Any student of database design have probably been forced to read all of the different levels of database normalization. The normalization process, invented by Edgar F. Codd, involves searching for inefficient database design and correcting it.

The very first rule of database normalization called the first normal form (1NF) stipulates that “the table is a faithful representation of a relation and that it is free of repeating groups.” [wikipedia]. This means that there should be no duplicate rows and no column should contain multiple values.

Serialization meets 1NF

What happens if you insert the above serialized data into a column of a row in a database? Well put shortly you get a stored datastructure that can be easily accessed by your application by calling it with the keys for that particular row. The table would probably look something like this:

ArrayTable
key value
1 1a:2:{s:6:”fruits”;a:2:{s:1:”a”;s:6:”orange”;s:1:”b”;s:6:”banana”;}s:7:”numbers”;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array
2 1a:2:{s:6:”fruits”;a:2:{s:1:”a”;s:6:”apples”;s:1:”b”;s:6:”banana”;}s:7:”numbers”;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array

As long as you will never ever search for anything inside the value field this is all good and well (but still goes against my better teachings of database normalization). Take for example the problem of locating all stuctures containing apples or even worse something as simple as ordering the rows by fruit! The structure makes such “simple” tasks very hard.

The use of serialization to encode values into the database might be very tempting. It makes saving complex structures easy without having to worry about database design. Saving data in serialized form is however very inefficient from a database design standpoint, the data should have been stored in separate tables reflecting their internal structure.

As I said in the beginning simplicity is the highest virtue of programming for me, serialize is a simple neat solution for a small problem. What should be remembered though is that serialize is not a swizz army knife that should be used for all the database storage. If you ever think that you will need to search through or handle the information stored, do youself a favour and make it a proper table from the start. In the long run making those tables will be easier than later having to convert all those structures and complex code handling them.

Having fun with the WordPress Database

Today I’ve improved the starting page of this domain, www.fireflake.com. It looked very booring and I thought I’d have to do something with it but at the same time I knew I was to lazy to acctually maintain some interesting subjects there as well. The solution was simple! Use my blogs to feed the information to the starting page! Here are some of my php and sql code used, in case you want to try something similar yourself.

First of all, if you’ve never done any PHP before, do not fear, it’s super easy! Without explaining it, here are my open and close database files (I keep them seperate since they will most likely be included in many pages):

open_db.php:

<?php
$dbhost = ‘localhost’;
$dbuser = ‘username’;
$dbpass = ‘password’;

$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die                      (‘Error connecting to mysql’);

$dbname = ‘dbname’;
mysql_select_db($dbname);
?>

close_db.php:

<?php
mysql_close($conn);
?>

Just modify these files with the values for your database and you are ready to do some mysql-powered-php-scripting! Next simply include these files in the file where you want to use the database, in my example index.php:

<!DOCTYPE html PUBLIC “-//W3C//DTD HTML 4.01 Strict//EN” “http://www.w3.org/TR/html4/strict.dtd”>
<?php include ‘open_db.php’; ?>
<html>

</html>
<?php include ‘close_db.php’; ?>

Now lets make the MySQL database print something to the index-page! Here is an example:

<?php
$query  = “select now() as mysqltime”;
$result = mysql_query($query);
while($row = mysql_fetch_array($result, MYSQL_ASSOC))
{
echo $row[‘mysqltime’];
}
?>

This will print a timestring from your MySQL. I used this as an example just because it should work everywhere. Now simply substitute the query with something you want from your database and the printout inside the loop with what (and how) you want it printed!

To take a WordPress example, I want to use the tags/categories as keywords for the META-tag of my start page. I use a cleaverly written SQL query that will give me all the unique keywords for all my three blogs sorted in order of usage:

<?php
$query  = “SELECT name,`count` FROM tech_terms t join tech_term_taxonomy tt on (t.term_id = tt.term_id)
union distinct
SELECT name,`count` FROM game_terms t join game_term_taxonomy tt on (t.term_id = tt.term_id)
union distinct
SELECT name,`count` FROM 3wp_terms t join 3wp_term_taxonomy tt on (t.term_id = tt.term_id)
order by `count` desc limit 0,50;”;
$result = mysql_query($query);
$keywords = array();
while($row = mysql_fetch_array($result, MYSQL_ASSOC)){
array_push($keywords,$row[‘name’]);
}
$keywords = array_unique($keywords);
foreach ($keywords as $key){
$meta_key_string .= $key.”,”;
}
?>
<meta name=”keywords” content=”<?php echo $meta_key_string ?>fireflake”>

This will make my home meta tag always up to date with what I write about! The keywords will also be listed in order of relevance since they are ordered by “count” from the database.

Another great little script takes out the latest posts in a blog and prints them (and links it to the main article):

<?php
$query  = “select * from tech_posts where post_status = ‘publish’ order by post_date desc limit 0,2”;
$result = mysql_query($query);
while($row = mysql_fetch_array($result, MYSQL_ASSOC))
{
echo “<h3><a href=”” . $row[‘guid’] . “”>” . $row[‘post_title’] .”</a></h3>”;
echo “<p>” . str_replace(“n”,”<br />n”,$row[‘post_content’]) .”</p><br clear=”all”>”;
}
?>

There are probably lots of fun you can have with the WordPress database, it’s very simple and easy to learn so it’s very easy to start writing code like this!

Hope you found something useful here!