How to avoid a page being cached

All web programmers have probably had trouble with browsers caching pages they ought not to. So what can we do about it? Well, in good old HTTP 1.0 we had a nice header that simply said:

Pragma: no-cache

Easy, huh? Yes. Probably too easy. If not the browsers, then surely some proxy server will disobey that simple command and require that we explain ourselves more thoroughly. This brings us to the next HTTP header:

Expires: -1

Actually, any invalid date format will do; the meaning should be interpreted as “this page has ceased to be” [mental image of John Cleese banging a parrot on the desk]. The only problem is that some misbehaving browsers and proxies interpret this as “well, you might have written an erroneous date, so we'll play nice and cache the page for you anyway”. Cue HTTP 1.1 and we have another header:

Cache-control: no-cache

Oh, remember this directive? Easy, huh? Heard it before. Yes, it's too easy to be true as well. The problem with this one is that some misbehaving reverse proxies apparently fail to deliver these pages at all, in what seems to be an inability to forward a page they are not allowed to save. At least in my case it was a reverse proxy that seemed to think very little of pages it wasn't allowed to keep. We had to give it “Cache-control: private” for it to actually pass the page on. The obvious problem with this is that it no longer prevents the end user agent (as opposed to an in-the-middle proxy) from caching the page.

Now all available headers have failed in some way. Add to this that someone using HTTP 1.0 might send a Cache-Control header, which can fail because it is not part of 1.0, or conversely someone using 1.1 might send a Pragma header, which can be ignored because it was replaced by Cache-Control in 1.1.
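If the page comes from a server-side script, the pragmatic move is to send the whole vocabulary at once and hope at least one directive sticks. Here is a minimal PHP sketch of that idea (my own; untested against any particular proxy):

<?php
// Belt and braces: send both the HTTP 1.0 and the HTTP 1.1 directives,
// so whichever protocol the client or proxy speaks, something applies.
header('Pragma: no-cache');        // HTTP 1.0
header('Expires: -1');             // an invalid date = "already stale"
header('Cache-Control: no-cache'); // HTTP 1.1
// If a reverse proxy refuses to forward no-cache pages (see above),
// 'Cache-Control: private' may be needed instead.
?>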

What is a programmer to do? Well, since proxies have taught me not to rely on normal HTTP headers, the next step is into HTML and the http-equiv META tags. Let's blast the browser with everything we have:

<meta http-equiv="Expires" content="-1">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Cache-Control" content="no-cache">

Now no proxy should ever interfere with our headers. The problem with Cache-Control and Pragma remains: in HTTP 1.0 the former is ignored, and in 1.1 the latter. If we include both we are safe, at least until they decide to change the whole thing in a future 1.2 version. We also send the Expires tag, which should make its way all the way to the browser without being cached. Hopefully at least one of these will be treated with respect by the browser; this is even partly recommended in an old KB article from Microsoft. Still, http-equiv is not as safe as real HTTP headers, since it requires the browsers to support the tags. Some support them better than others (the article is old but still sends my head spinning in disbelief).

Disillusioned by the current state of cache control (not the header, the subject), I ended up doing what probably most people are doing already: appending a random 10-character string to every call I ever make, effectively fooling the browser into thinking the information might be important and making it update the page properly. Just append it to the back of every GET and include a random field in every POST.

http://www.fireflake.com/?cache=ds4R3HYh4

http://www.fireflake.com/?cache=BawqEw42cf

Not the same page. Obviously. Please don't tell any browser developer this or they might include a “random cache of everything in the known universe” feature in their next build.
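If you build your URLs in PHP, a cache buster like this is only a few lines. A sketch (the helper name is my own invention):

<?php
// Append a random token so every request looks unique to any cache.
function nocache_url($url) {
    $token = substr(md5(mt_rand()), 0, 10); // 10 random characters
    $sep   = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $sep . 'cache=' . $token;
}

echo nocache_url('http://www.fireflake.com/');
// e.g. http://www.fireflake.com/?cache=4f7a2c91d0
?>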

PHP Serialize vs Database normalization

I've recently started developing plugins for WordPress in PHP. Being an old school Perl programmer, I find PHP comes very easy, and MySQL is still the same old MySQL. PHP doesn't have many advantages over Perl in general except one very good one: simplicity. I have always tried to write simple code; not simple in the sense that it doesn't accomplish complex tasks, but rather in the sense that while being a huge and complex system it is still built from easy-to-understand blocks of code. That said, there are a few shortcuts I would rather not take.

The reason I write this is that in all the PHP applications and PHP documentation I’ve come across regarding serialize() nobody ever mentions database normalization.

PHP Serialize

I found the serialize() function in PHP quite useful: it takes a data structure and creates a string representation of that structure. This string can later be used with unserialize() to return it to the old structure. An example:

$fruits = array (
"fruits"  => array("a" => "orange", "b" => "banana"),
"numbers" => array(1, 2, 3),
);

print_r($fruits);

The above code creates an array and prints it. The output will be:

Array
(
    [fruits] => Array
        (
            [a] => orange
            [b] => banana
        )

    [numbers] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
)

Now if you use serialize() on this array, the following happens:

$fruits = serialize($fruits);

echo $fruits;

Output:

a:2:{s:6:"fruits";a:2:{s:1:"a";s:6:"orange";s:1:"b";s:6:"banana";}s:7:"numbers";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}

A long line of strange numbers, just what the programmer wanted! This data is perfect for transferring, or for saving the state of a data structure for later use. Calling unserialize() on the above string returns it to the same array that we started with.
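To make the round trip concrete, here is a short sketch:

<?php
// Round trip: serialize the array, unserialize the string,
// and confirm that the copy is identical to the original.
$original = array(
    "fruits"  => array("a" => "orange", "b" => "banana"),
    "numbers" => array(1, 2, 3),
);

$string = serialize($original);   // flatten to a string
$copy   = unserialize($string);   // ...and back again

var_dump($copy === $original);    // bool(true)
?>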

Database Design

Most applications use a relational database for storing information. A relational database stores all data in tables of rows and columns (or relations of tuples and attributes, if you use the original non-SQL names). To make a database work efficiently, the design of those tables, rows and columns is pivotal. Any student of database design has probably been forced to read about all the different levels of database normalization. The normalization process, invented by Edgar F. Codd, involves searching for inefficient database design and correcting it.

The very first rule of database normalization, called the first normal form (1NF), stipulates that “the table is a faithful representation of a relation and that it is free of repeating groups” [Wikipedia]. This means that there should be no duplicate rows and that no column should contain multiple values.

Serialization meets 1NF

What happens if you insert the above serialized data into a column of a row in a database? Put shortly, you get a stored data structure that can easily be accessed by your application by calling for the key of that particular row. The table would probably look something like this:

ArrayTable

key | value
1   | a:2:{s:6:"fruits";a:2:{s:1:"a";s:6:"orange";s:1:"b";s:6:"banana";}s:7:"numbers";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
2   | a:2:{s:6:"fruits";a:2:{s:1:"a";s:6:"apples";s:1:"b";s:6:"banana";}s:7:"numbers";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}

As long as you will never, ever search for anything inside the value field this is all good and well (but it still goes against my better teachings of database normalization). Take for example the problem of locating all structures containing apples, or even worse, something as simple as ordering the rows by fruit! The structure makes such “simple” tasks very hard.

Using serialization to encode values into the database can be very tempting. It makes saving complex structures easy without having to worry about database design. Saving data in serialized form is, however, very inefficient from a database design standpoint; the data should have been stored in separate tables reflecting its internal structure.
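For comparison, a normalized version of the fruit example might look something like this (table and column names are purely my own illustration):

<?php
// One row per value instead of one serialized blob per structure.
mysql_query("CREATE TABLE fruit (
    structure_id INT,         -- which data structure the row belongs to
    label        CHAR(1),     -- 'a', 'b', ...
    name         VARCHAR(32)  -- 'orange', 'banana', ...
)");
mysql_query("CREATE TABLE number (
    structure_id INT,
    position     INT,
    value        INT
)");

// Now "find every structure containing apples" is a plain query:
$result = mysql_query("SELECT structure_id FROM fruit WHERE name = 'apples'");
?>

With this layout, ordering by fruit or filtering on a name is a one-line WHERE clause instead of string surgery on a blob.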

As I said in the beginning, simplicity is the highest virtue of programming for me, and serialize() is a simple, neat solution for a small problem. What should be remembered, though, is that serialize() is not a Swiss army knife to be used for all database storage. If you ever think that you will need to search through or handle the information stored, do yourself a favour and make it a proper table from the start. In the long run, making those tables will be easier than later having to convert all those structures and the complex code handling them.

2009 – the year of the browsers

In 1989 we had zero web browsers as we know them today, although the invention was just around the corner. In 1999 we had two web browsers fighting a death match, Internet Explorer and Netscape Navigator; a fight Netscape cleverly lost by dying and coming back as several open source reincarnations, of which Firefox is of course the most well known today. 2009 is turning out to be yet another battle year for browsers, this time with many more of them! We have (in no special order) the newcomer Google Chrome fighting Firefox and Internet Explorer (mainly on the PC side). We have Opera, which has cut out a piece of the action on several systems but shines mostly on portable devices. Safari rules the Macintosh but is starting to get some interference from Firefox.

Well, that is now; what is next? I read a post about the current state of browser development, and many of the major browsers have a beta out that may go live sometime during the next year. While this might be very good news for home users, I am sure it will mean a lot of work for someone like myself who creates on-line applications. There used to be a lot of tuning involved in making web pages and applications look and work the same on the old “two major browsers”; now we have at least five! Unless the browser developers make a great effort to follow the standards, each web page has to compensate for how a particular browser parses the data.

In the past, Internet Explorer has seemingly ignored several standards intentionally, forcing programmers like myself to make pages look good in their browser; Internet Explorer is after all the dominating browser and pages have to work there. The question is whether this strategy will be allowed to continue. For the sake of us programmers, I really hope that of the five new browser versions about to be released, several will render basic pages using the same ruleset.

Having fun with the WordPress Database

Today I've improved the starting page of this domain, www.fireflake.com. It looked very boring and I thought I'd have to do something about it, but at the same time I knew I was too lazy to actually maintain interesting content there as well. The solution was simple! Use my blogs to feed information to the starting page! Here is some of the PHP and SQL code I used, in case you want to try something similar yourself.

First of all, if you've never done any PHP before, do not fear, it's super easy! Without explaining them, here are my open and close database files (I keep them separate since they will most likely be included in many pages):

open_db.php:

<?php
$dbhost = 'localhost';
$dbuser = 'username';
$dbpass = 'password';

$conn = mysql_connect($dbhost, $dbuser, $dbpass) or die('Error connecting to mysql');

$dbname = 'dbname';
mysql_select_db($dbname);
?>

close_db.php:

<?php
mysql_close($conn);
?>

Just modify these files with the values for your database and you are ready to do some mysql-powered-php-scripting! Next simply include these files in the file where you want to use the database, in my example index.php:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN" "http://www.w3.org/TR/html4/strict.dtd">
<?php include 'open_db.php'; ?>
<html>

</html>
<?php include 'close_db.php'; ?>

Now let's make the MySQL database print something to the index page! Here is an example:

<?php
$query  = "select now() as mysqltime";
$result = mysql_query($query);
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo $row['mysqltime'];
}
?>

This will print a time string from your MySQL server. I used this as an example just because it should work everywhere. Now simply substitute the query with something you want from your database, and the printout inside the loop with what (and how) you want it printed!

To take a WordPress example, I want to use the tags/categories as keywords for the META tag of my start page. I use a cleverly written SQL query that gives me all the unique keywords for all my three blogs, sorted in order of usage:

<?php
$query  = "SELECT name,`count` FROM tech_terms t join tech_term_taxonomy tt on (t.term_id = tt.term_id)
union distinct
SELECT name,`count` FROM game_terms t join game_term_taxonomy tt on (t.term_id = tt.term_id)
union distinct
SELECT name,`count` FROM 3wp_terms t join 3wp_term_taxonomy tt on (t.term_id = tt.term_id)
order by `count` desc limit 0,50;";
$result = mysql_query($query);
$keywords = array();
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    array_push($keywords, $row['name']);
}
$keywords = array_unique($keywords);
$meta_key_string = ''; // initialize to avoid an undefined variable notice
foreach ($keywords as $key) {
    $meta_key_string .= $key . ",";
}
?>
<meta name="keywords" content="<?php echo $meta_key_string ?>fireflake">

This keeps the meta tag of my home page up to date with whatever I write about! The keywords are also listed in order of relevance, since they are ordered by `count` from the database.

Another great little script takes the latest posts from a blog and prints them (linking each to the main article):

<?php
$query  = "select * from tech_posts where post_status = 'publish' order by post_date desc limit 0,2";
$result = mysql_query($query);
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo "<h3><a href=\"" . $row['guid'] . "\">" . $row['post_title'] . "</a></h3>";
    echo "<p>" . str_replace("\n", "<br />\n", $row['post_content']) . "</p><br clear=\"all\">";
}
}
?>

There is probably lots of fun to be had with the WordPress database; it's very simple and easy to learn, so it's very easy to start writing code like this!

Hope you found something useful here!

Using PHP to check response time of HTTP-server

I must start off by admitting that my PHP skills are very limited; however, being a very experienced Perl hacker, I find it all very familiar. Edit 2013-01-30: This is no longer true 🙂

I needed a script that checked for a normal HTTP response from another server; a status script to see if the other server(s) were behaving as they should.

The resources on-line for PHP are great and I quickly found the code needed to retrieve a remote page and read the contents. I also found some tutorials describing how to use this code. My version ended up looking like this (thanks phptoys.com for the tutorial!):

<?php
// check the response time of a web server
function pingDomain($domain){
    $starttime = microtime(true);
    // suppress error messages with @
    $file      = @fsockopen($domain, 80, $errno, $errstr, 10);
    $stoptime  = microtime(true);
    $status    = 0;

    if (!$file){
        $status = -1;  // Site is down
    }
    else{
        fclose($file);
        $status = ($stoptime - $starttime) * 1000;
        $status = floor($status);
    }
    return $status;
}
?>

What this code does is measure, using the microtime() function, the time difference between initiating a connection with fsockopen() and that function completing. If a connection was established, the time difference is returned. If fsockopen() failed to open a connection, -1 is returned.

The time difference is multiplied by 1000 to get the number of milliseconds it took; floor() is then used to round down to the nearest integer value.

To call this function simply add the domain or IP you want to check the response time of:

Fireflake: <?php echo pingDomain('tech.fireflake.com'); ?> ms<br>
Example: <?php echo pingDomain('www.example.com'); ?> ms<br>
Internal IP: <?php echo pingDomain('192.168.0.2'); ?> ms<br>
Fail: <?php echo pingDomain('fail.fireflake.com'); ?> ms<br>

Sample output from the above statements are:

Fireflake: 111 ms
Example: 139 ms
Internal IP: 0 ms
Fail: -1 ms

Also, sometimes DNS servers return a “search engine” response if the domain is unknown or unreachable. To be sure you reach the server you want, try calling it by IP number instead, so your DNS isn't fooling you.
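One way to spot such a catch-all (a sketch of my own) is to print what the name resolves to before pinging it:

<?php
// gethostbyname() returns its argument unchanged if resolution fails,
// so a quick comparison tells us whether we got a real address.
$domain = 'tech.fireflake.com';
$ip = gethostbyname($domain);
if ($ip === $domain) {
    echo "Could not resolve $domain<br>";
} else {
    echo "$domain resolves to $ip: " . pingDomain($ip) . " ms<br>";
}
?>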

EDIT: Thanks for the tip about using @ to suppress error messages. Just use @fsockopen to suppress the inevitable error message.

EDIT 2013-01-30: Fixed the old code and added some more examples.

EDIT 2013-03-07: Added clarification about unit used in $status.

Using AJAX to asynchronously load slow XML files

More and more I've come across situations where I want to use AJAX to download an XML file for use in the interface, knowing beforehand that the file will take a long time to load. With asynchronous download of XML files by JavaScript, which is kind of what the buzzword AJAX is all about, you must be careful not to leave the client in limbo between a usable interface and a locked-up screen.

Unfortunately this script only works in Internet Explorer; useful tips on how to port it properly (with the asynchronous property intact) would be highly appreciated.

Here is a simple description of the basic functions needed to perform an asynchronous download where the user has the option to abort.

First we need a simple function that downloads the XML. This is pretty standard and the code is lovingly ripped off from w3schools.com.

function loadXml(sUrl){
	try{
		//Internet Explorer
		xmlDoc=new ActiveXObject("Microsoft.XMLDOM");
	}
	catch(e){
		try{
			//Firefox, Mozilla, Opera, etc.
			xmlDoc = document.implementation.createDocument("","",null);
		}
		catch(e) {
			alert(e.message)
		}
	}
	try{
		xmlDoc.async = 'true';
		xmlDoc.load(sUrl);
	}
	catch(e) {
		alert(e.message)
	}
}

This code is pretty straightforward and I assume you already know it; if not, read the guide over at W3Schools. The only difference in the above code compared to the tutorial at W3Schools is the flag “xmlDoc.async = 'true'”. This means that the code continues executing after load() is called, without waiting for the load to finish. This places the xmlDoc variable in a limbo state which can be checked through the “readyState” flag.

To check if our file is ready to use we have a test-loop that will check if readyState changes:

function testReadyLoop(){
	i++;
	if (xmlDoc.readyState == 4){
		// the file has completed the download
		alert('xmlDoc ready to use! Contents:\n' + xmlDoc.xml);
		// TODO: add code here of what to do with the file
	}
	else{
		if (!abortXmlLoad){
			// try again in 1 second
			setTimeout("testReadyLoop();",1000);
		}
		else{
			// stop loading the xml file
			xmlDoc.abort();
			alert('Loading of the XML file aborted!');
		}
	}
}

The incrementing of the variable “i” is just a counter that will be used later, and “abortXmlLoad” is a boolean deciding whether the loop should continue; both are explained below. What happens in this function is that it first tests whether readyState is 4, which indicates that the file is ready to be used. If so, we simply show an alert with the contents of the file; this is where more intelligent code would be placed. If the file is not ready, the function checks whether it should continue waiting: if so, it calls itself again in 1 second (1000 ms), otherwise it aborts the loading and simply stops.

To abort a download we need to set the “abortXmlLoad” flag to true, a short function is needed for this:

function abortAsyncXML(){
	// set the abort flag to true
	abortXmlLoad = true;
}

Now we have all the functions needed for the asynchronous download; a last function is added to tie them all together:

function loadAsyncXML(sUrl){
	// set abort to false and start download
	abortXmlLoad = false;
	i = 0;
	loadXml(sUrl);
	// start loop to check when ready
	testReadyLoop();
}

This function first resets the values of “i” and “abortXmlLoad”, then starts the download, and after that starts the loop that tests whether the download is ready. The file will now download silently in the background and pop an alert when ready, unless someone calls “abortAsyncXML” before that happens.

As you may have noticed there are a few global variables I use across the functions that also need to be added to the top of the script:

var xmlDoc;
var abortXmlLoad;
var i;

To use this script a small form needs to be added to the page:

<form>
<input type="button" value="load" onclick="loadAsyncXML('sample.xml');">
<input type="button" value="abort" onclick="abortAsyncXML();">
</form>

This will load the file “sample.xml”, aborting if the abort button is pushed. In order to test that the abort button works, you need a slow-loading page that simulates a long load time.
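A quick way to fake a slow server, if you have PHP available, is a small wrapper that sleeps before answering (a sketch; the file name slow.xml.php is my own):

<?php
// slow.xml.php -- stands in for sample.xml but takes ten seconds
// to respond, leaving plenty of time to press the abort button.
sleep(10);
header('Content-Type: text/xml');
?>
<note>
  <to>World</to>
  <body>Sorry I took so long!</body>
</note>

Point the load button at slow.xml.php instead of sample.xml and you have ten seconds to test the abort.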

I will post a link to the full code and sample later. Hope you found this helpful.

CSS layout made easy

While it hurts my nitpicking hand-coding image to admit it, some frameworks are too good to ignore. One of them I found recently in Google BlueprintCSS. It's a very flexible framework with a license “you cannot refuse”.

I've always coded all my CSS and HTML by hand (still do!) and it is getting pretty tiresome to stumble over the same defects in every design I make. Too pressed for time, I've never developed a framework of my own; rather, I've just copied and pasted bits and pieces from my old code that I knew were working.

There are many other CSS frameworks out there, but somehow I fell for Google Blueprint, though I am far from having tested them all extensively. If anyone has found another framework that is very good and flexible, please post a comment; I'd love to hear about it.

Small update on Google Analytics

I've used the code from my previous post on almost all my sites for a couple of days now, and all the statistics are still working while I no longer experience any slow loading times using Firefox with NoScript. Until I see either a change in Google's code (so that they use a single domain for JavaScript) or a new version of NoScript (one that makes an exception for Google-related domains if you allow google-analytics.com), I will keep this code, as it greatly improves the performance of my website.

Slow loading with Google Analytics

I experienced my pages slowing down when using Google Analytics together with the Firefox add-on NoScript. Since I'm far from the only one using NoScript, I found this unacceptable and worked out a possible workaround.

The reason for the slowdown is likely a timeout in the connection between my domain (where I've allowed scripts to run) and Google's domains (where I might or might not have allowed scripts).

Google's code looks like this (where “UA-1111111-1” is your tracking ID):

<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-1111111-1");
pageTracker._initData();
pageTracker._trackPageview();
</script>

The first part of the code creates an obfuscated loading of the script located at google-analytics.com/ga.js. It picks a prefix of www for a standard connection and ssl for an encrypted connection. The problem is that NoScript does not recognize this code and ends up in a deadlock over whether or not to allow the script to run. My guess is that another script is loaded from a domain called googlesyndication.com, which fails the NoScript test and blocks the loading of the page.

A possible fix, which I'm still evaluating but which should do the trick, is the following (the added code is the first script tag):

<script src="http://www.google-analytics.com/ga.js" type="text/javascript"></script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-1111111-1");
pageTracker._initData();
pageTracker._trackPageview();
</script>

As you can see, I've turned the obfuscated code into clear-text code and chosen the http://www prefix (since I'm not using an encrypted connection on my server). Should you use encryption on your site, simply switch http://www to https://ssl instead (this is the choice the JavaScript used to make). If you have a page that might be loaded either encrypted or unencrypted, you have to make this choice earlier, in a server-side script for example.
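In PHP, that server-side choice could look something like this (a sketch mirroring what the original JavaScript does):

<?php
// Pick the same prefix the Google snippet would have picked,
// but on the server instead of in the browser.
$gaJsHost = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off')
    ? 'https://ssl.' : 'http://www.';
echo '<script src="' . $gaJsHost .
     'google-analytics.com/ga.js" type="text/javascript"></script>';
?>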

After this fix, Google Analytics works like a charm together with NoScript, on any script-level setting, for me.

Try this at your own risk; it is still experimental for me as well!