PHP Serialize vs Database normalization

I’ve recently started developing plugins for WordPress in PHP. Being an old school Perl programmer PHP comes very easy and MySQL is still same old MySQL. PHP don’t have many advantages over Perl in general except one very good one: simplicity. I have always tried to write simple code, not simple in the sense that it doesn’t accomplish complex tasks rather in the sense that while being a huge and complex system it is still built with easy to understand blocks of code. With that being said, there are a few shortcuts I rather not take.

The reason I write this is that in all the PHP applications and PHP documentation I’ve come across regarding serialize() nobody ever mentions database normalization.

PHP Serialize

I found the serialize() function in PHP quite useful, it takes a datastructure and creates a string representation of that structure. This string can later be use with unserialize() to return it to the old structure. An example:

$fruits = array (
"fruits"  => array("a" => "orange", "b" => "banana"),
"numbers" => array(1, 2, 3),
);

echo print_r($fruits);

The above code creates an array and prints the result. The output of the above will be:

Array
(
    [fruits] => Array
        (
            [a] => orange
            [b] => banana
        )

    [numbers] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
)

Now if you use serialize on this object the following would happen:

$fruits = serialize($fruits);

echo print_r($fruits);

Output:

1a:2:{s:6:"fruits";a:2:{s:1:"a";s:6:"orange";s:1:"b";s:6:"banana";}s:7:"numbers";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array

A long line of strange numbers, just what the programmer wanted! This data is perfect for transfering or saving the state of a data structure for later use. Calling unserialize() on the above string would return it to the same array that we first had.

Database Design

Most applications use a relational database for storing information. A relational database stores all data in tables of rows and columns (or relations of tuples and attributes if you use the original non-SQL names). To make a database work efficiently the design of those tables, rows and columns are pivotal. Any student of database design have probably been forced to read all of the different levels of database normalization. The normalization process, invented by Edgar F. Codd, involves searching for inefficient database design and correcting it.

The very first rule of database normalization called the first normal form (1NF) stipulates that “the table is a faithful representation of a relation and that it is free of repeating groups.” [wikipedia]. This means that there should be no duplicate rows and no column should contain multiple values.

Serialization meets 1NF

What happens if you insert the above serialized data into a column of a row in a database? Well put shortly you get a stored datastructure that can be easily accessed by your application by calling it with the keys for that particular row. The table would probably look something like this:

ArrayTable
key value
1 1a:2:{s:6:”fruits”;a:2:{s:1:”a”;s:6:”orange”;s:1:”b”;s:6:”banana”;}s:7:”numbers”;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array
2 1a:2:{s:6:”fruits”;a:2:{s:1:”a”;s:6:”apples”;s:1:”b”;s:6:”banana”;}s:7:”numbers”;a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}1Array

As long as you will never ever search for anything inside the value field this is all good and well (but still goes against my better teachings of database normalization). Take for example the problem of locating all stuctures containing apples or even worse something as simple as ordering the rows by fruit! The structure makes such “simple” tasks very hard.

The use of serialization to encode values into the database might be very tempting. It makes saving complex structures easy without having to worry about database design. Saving data in serialized form is however very inefficient from a database design standpoint, the data should have been stored in separate tables reflecting their internal structure.

As I said in the beginning simplicity is the highest virtue of programming for me, serialize is a simple neat solution for a small problem. What should be remembered though is that serialize is not a swizz army knife that should be used for all the database storage. If you ever think that you will need to search through or handle the information stored, do youself a favour and make it a proper table from the start. In the long run making those tables will be easier than later having to convert all those structures and complex code handling them.

WordPress as a simple CRM

I recently started a new business where I really want to focus on taking care of the customer needs, being proactive rather than reactive to them. As such I need a simple Customer Relations Management (CRM) tool to keep track of my promises and contacts. There are probably many simple CRM tools available but I decided to try out WordPress as a CRM tool.

First I installed WordPress on an internal server with no external connection. I set the firewall to block that server from traffic with the outside network and then I started to do my internal “company blog”. To structure things I decided to follow some simple rules:

  • I make one post for every type of contact (e-mail / phone / order etc) I do every day, if several contacts to the same company / person occurs the same day they still only get one.
  • Several types of contact to the same company / person will get multiple posts the same day
  • Categories are other authorities, companies and/or persons
  • Tags are techniques, events, frameworks etc.

On average I get three to four posts every day, usually covering a broad area. Some days there are big events which often are reflected in the blog/CRM by only having one post for that day. Amounts of posts per day is therefore irrelevant. I keep the posts very short, they are mainly thought of as references to other information like an e-mail or something else. If it was a phone conversation I usually take down a few simple sentances of what the discussion was about.

Three months later I now use this internal blog alot! It helps me keep track of events that I might have forgotten about. When I had a tax issue recently I could quickly click the “tax authority” category and see which days I had communicated with them and leave as a reference in my future communication.

One thing that also helps me is the simplicity of clicking a category to bring up all the communication with that customer. When someone calls I quickly click their category and all my previous conversations with them are recorded. It helps me quickly remember what we where talking about, just like a CRM should.

There are of course limitations, WordPress was never intended to be used this way. There are no way to search for inactive customers for example, should the need for this arise a plugin for WordPress could most certainly easily be developed. Furthermore you need to be very careful about where you install the software so you do not publish all your information on the Internet. I run my business alone but this setup would work very nice also with a few employees I would imagine. Everyone could be an author in the same blog and you can access what the other persons are working on should a customer call when they are out.

The simplicity of WordPress makes this a great choice for me!

VISA PIN numbers stolen

In one of the worst hacker attacks against on-line resources a hacker managed to get a full listing of all the PIN-numbers associated with VISA credit cards. Apparently also MasterCard and American Express have been compromised. This undermines the whole system of PIN code and credit card to ensure the safety of you money. Since the list allready have been published on the Internet I link it here as a confirmation that the full list is out!

VISA PIN codes

Follow the link and use CTRL + F to search the file for your own PIN number, it’s there!

A new system for handling transaction verification will be needed, meanwhile we need to keep our eyes open watch out bank accounts!

EDIT: April Fools! This was of course an April Fools joke. The file contains all numbers from 0000 to 9999 so obviously all PIN numbers are included.