Friendly URLs in PHP: why do you care?

Nice URLs, readable URLs, search-engine-friendly URLs. Different names, same deal.

Few would disagree that something like example.com/index.php?page=article&id=409 is not nearly as nice as example.com/article/409.

It turns out this isn’t all that hard with PHP. In fact, it can turn into something that’s useful for more than just readability.

The Plan

What we’re going to do is capture everything past a particular point in our URL and pipe it to PHP, letting PHP do most of the legwork.

* Using mod_rewrite we get everything past a particular point.
* We will then pass that into a PHP GET variable. We’ll use “p”, but you can use anything.
* In our PHP we will define “rules” for our URLs with very simple regular expressions.
* When something matches, we’ll get the appropriate page – otherwise we’ll give back a 404.

mod_rewrite

Yep, this article is written purely from an Apache perspective, so I assume that you not only have Apache on your web server but have also enabled mod_rewrite.

If you haven’t enabled mod_rewrite, it’s usually a matter of having a line like this in your httpd.conf:

LoadModule rewrite_module modules/mod_rewrite.so

Remember to restart Apache afterwards.

.htaccess

Let’s get .htaccess out of the way now. These are special files that you may or may not have come across. Basically, Apache reads the rules defined in them to determine what to serve when you visit a page.

In this case we’re going to use mod_rewrite rules. So, first create a file called .htaccess and open it up. (To Windows folk: the file has only an extension and no “filename” as such, which might confuse some of the poorer text editors.)

To start with, we’re going to check whether mod_rewrite is enabled. Without this check, Apache will return an internal server error (500) if it meets rewrite directives while the module is missing.

All our rules will go inside this block.
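
The listing for this step is not shown here, so the following is only a sketch of what such a wrapper usually looks like, assuming the common <IfModule> test against mod_rewrite.c:

```apache
<IfModule mod_rewrite.c>
    # All rewrite directives go here. If mod_rewrite is not loaded,
    # Apache skips this whole block instead of failing.
</IfModule>
```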

First we need to enable the rewrite engine and set our base.

The RewriteBase will tell Apache exactly where our capturing needs to come from, what the relative path is.
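
As a sketch, assuming the site lives at the root of the domain, these two directives would be:

```apache
# Switch the rewrite engine on for this directory
RewriteEngine On
# Paths in the rules below are relative to the site root
RewriteBase /
```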

In our case we’re assuming we’re rewriting the root of a URL. If your script resided in a sub-directory like http://example.com/test, RewriteBase would have to be written like so:
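
For the sub-directory case just described, a likely form (assuming the directory is called test, as in the example URL) is:

```apache
# The script lives under http://example.com/test
RewriteBase /test/
```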

Our first rule will tell Apache where to go when we haven’t specified anything, in this case we simply want to go to index.php:
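
A sketch of that first rule, built from the pattern, target and flag that the following paragraphs describe:

```apache
# An empty path (nothing after the base) is served by index.php
RewriteRule ^$ index.php [L]
```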

RewriteRule is how we define our actual rules.

Rewrite rules use regular expressions (regex for short), a kind of special text processing language, to define the rules. They can become very useful if you learn them properly, and we shall be using them later when we write the PHP portion of this article. PHP comes with reasonably good regex support built in.

In this case our regular expression is ^$. The caret symbol, ^, means “at the start”. The dollar sign, $, means “at the end”. So this pattern effectively matches only an empty path, since there is nothing between the two symbols.

Don’t worry if regular expressions are confusing at first; there are many tutorials on them around.

The second part of RewriteRule defines where our rule goes to. In this case we always want an empty path to go to index.php.

The last part in square brackets defines a special flag, here we use the L flag. L defines this rule as the “last one” – if this rule matches then Apache will disregard any further ones.

Our next rule is the one that we came here for:
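
Going by the description that follows (a captured “everything” pattern, passed on as the “p” parameter), the rule would look roughly like this. Treat it as a sketch of the rule in isolation; the conditions that make it safe to use come a little later.

```apache
# Capture the whole requested path and hand it to index.php as "p"
RewriteRule ^(.*)$ index.php?p=$1 [L]
```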

In regex a full stop means “any single character” and an asterisk means “zero or more of the previous character”. So our pattern is effectively any amount of any character, or simply, everything.

The parentheses around the pattern indicate we want to group that match and use it later. We use it when defining where our pattern goes to. You refer to groups with the dollar sign and the number of the group. (Groups start at 1.) Again we will define this as the last rule with [L].

So with these two rules http://example.com will be invisibly rewritten to index.php and http://example.com/page/409 will be rewritten to index.php?p=page/409.

Unfortunately, if you try to go directly to any other file or directory, it will not show up. Instead the path will get sent to index.php! We can use RewriteCond to add additional conditions to rectify this situation.

In a little bit of slightly backward behaviour, when Apache finds a rewrite rule it will go back to see if there were any conditions and check those before finally matching. So these conditions must appear before the rule.

The first parameter is what we test against. It refers to a special server variable, in this case REQUEST_FILENAME. The second parameter is our actual condition: -f asks “is this an existing file?” and -d asks the same about directories.

Since we only want to continue if the match is not an existing file or directory, we can negate each test with !, so our final patterns are !-f and !-d.
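
Put together with the catch-all rule, the conditions would sit directly above it. This is a sketch using the standard %{REQUEST_FILENAME} variable syntax:

```apache
# Only rewrite when the request is not an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?p=$1 [L]
```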

Our rewrite ruleset is nearly complete, but there’s one tiny thing. Sometimes you may want to pass GET parameters around in the URL the same way you normally would. As it stands, these will be ignored by Apache when rewriting. We can use a flag, QSA, to append all extra request parameters to the rewritten URL.

Our final .htaccess file looks like this:
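
The original listing is not reproduced here, so this is a reconstruction from the steps above and not necessarily the exact file. It assumes the site sits at the root and uses “p” as the parameter name:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    # Empty path goes straight to index.php
    RewriteRule ^$ index.php [L]

    # Anything that is not a real file or directory is passed to
    # index.php as "p"; QSA keeps any extra GET parameters
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?p=$1 [QSA,L]
</IfModule>
```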

If anything about mod_rewrite is confusing it’s worth knowing that the documentation on it is actually not all that terrifying and can help you understand it much better.

Using PHP to get the requested URL

Our rewrite rule will pipe everything into index.php as a GET parameter called “p”. So everything in our script will need to be included through this.

To start our index.php file we need to define a list of matches.

$page_rules is an associative array pointing matched rules to special pages. You can keep your eventual pages wherever you like; for our example we’ll simply have a “pages/” subdirectory where these will go.
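
A minimal sketch of such a list. The rule names and file names here are made up for illustration; only the $page_rules name and the pages/ directory come from the text.

```php
<?php
// Hypothetical rules: URL pattern => page file inside pages/
$page_rules = array(
    'index' => 'pages/index.php',
    'about' => 'pages/about.php',
);
```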

We set $page_val to our “p” parameter if it was set, otherwise defaulting to index.
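
In code this is probably a one-liner along these lines (a sketch, assuming the parameter is called “p” as in the rewrite rule):

```php
<?php
// Use the path passed in by the rewrite rule, or fall back to "index"
$page_val = isset($_GET['p']) ? $_GET['p'] : 'index';
```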

Next we go through all our rules and find if we’ve matched one with our parameter.

We now use preg_match to match against each of our rules.

For our index rule, the regular expression looks like /^index$/i. The forward slashes mark the start and end delimiters of the regex. After the closing delimiter we can add extra flags; in this case we use the i flag, which makes the match case insensitive.

When we match a rule, we save the page name, then serve it later, or give a 404 message if nothing matched. (Your 404 message should be a lot better – this is a simple example after all.)
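
The steps above can be sketched as follows. The find_page() helper, the is_file() guard and the rule list are assumptions added for illustration; the original most likely used a plain foreach loop in the script body.

```php
<?php
// Hypothetical helper: returns the page file for the first rule that
// matches $page_val, or null when no rule matches.
function find_page(array $page_rules, $page_val)
{
    foreach ($page_rules as $rule => $page) {
        // 'index' becomes the case-insensitive regex /^index$/i
        if (preg_match('/^' . $rule . '$/i', $page_val)) {
            return $page;
        }
    }
    return null;
}

// Hypothetical rules, as before
$page_rules = array(
    'index' => 'pages/index.php',
    'about' => 'pages/about.php',
);
$page_val = isset($_GET['p']) ? $_GET['p'] : 'index';

$page = find_page($page_rules, $page_val);
if ($page !== null && is_file($page)) {
    include $page;
} else {
    // No rule matched (or the page file is missing): answer with a 404
    http_response_code(404);
    echo 'Page not found';
}
```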

At this point our rewriting will work perfectly fine. There are a few issues to work out, though.

Small issues

First off, if you go to http://example.com/about the rule works, but if you go to http://example.com/about/ then nothing matches. Our p variable has changed with that one character!

Luckily we can expand our regular expression to account for this. Question mark, ?, means “the previous character zero or one time”. So /? would mean “forward slash or no forward slash”, except that, as we know, the forward slash is our regex delimiter. To escape characters in regex simply use the backslash character. The correct match is \/?.

Our new rule check looks like this:
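
A sketch of the updated check, wrapped in a hypothetical rule_matches() helper so that it can be tried on its own:

```php
<?php
// Hypothetical helper: true when $page_val matches $rule,
// with or without a single trailing slash.
function rule_matches($rule, $page_val)
{
    // \/? allows one optional forward slash before the end
    return preg_match('/^' . $rule . '\/?$/i', $page_val) === 1;
}
```

With this in place both “about” and “about/” match the about rule.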

You may be starting to think that regular expressions can start to look messy the more complicated they get. You’d be right, but that doesn’t discount how useful they can be. A regular expression can be broken down into smaller parts if necessary.

This leads on to our second problem. If any of our rules have a forward slash in them, which they most likely will, you will get an error when running the regex because the slash would be unescaped. A quick str_replace will sort that.
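
One way to do that, sketched with a hypothetical rule_to_regex() helper that builds the full pattern from a rule:

```php
<?php
// Hypothetical helper: turns a rule such as "article/list" into a
// complete regex, escaping forward slashes along the way.
function rule_to_regex($rule)
{
    // Escape forward slashes so they do not end the regex early
    $escaped = str_replace('/', '\/', $rule);
    return '/^' . $escaped . '\/?$/i';
}
```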

Grouping

This is all well and good, but if you have a URL like our initial example of http://example.com/article/409 surely you’d have to make a new rule for each ID? Luckily we can leverage the power of regex to do that for us.

First we need to amend our rules to match /article/ and any number. This is done like so:
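
A sketch of the amended rule list. The file names are again hypothetical; the pattern itself is explained in the next paragraph.

```php
<?php
$page_rules = array(
    'index'          => 'pages/index.php',
    'about'          => 'pages/about.php',
    // "article/" followed by one or more digits
    'article/[0-9]+' => 'pages/article.php',
);
```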

[0-9] is a range, a single digit from 0 to 9. And whereas ? means “zero or one time”, in this case we need to use plus, +, which means “one or more times”. So we’re now matching any whole number.

Great, our rule works! But it’s useless to us if we can’t get that parameter out. Well, first we need to define this as a group in our regex by using parentheses, the same way we did in our .htaccess rules before.

To get our groups out of our regular expression we need to use the third parameter of preg_match, to which we pass an array that will then be filled with our groups.

Our foreach loop needs to look something like this:
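
A sketch of that loop, using the grouped rule article/([0-9]+). The match_rule() wrapper and its return shape are assumptions made so the example is self-contained.

```php
<?php
// Hypothetical wrapper around the loop: returns the matched page and
// the groups captured by preg_match, or null when nothing matches.
function match_rule(array $page_rules, $page_val)
{
    foreach ($page_rules as $rule => $page) {
        $regex = '/^' . str_replace('/', '\/', $rule) . '\/?$/i';
        $matches = array();
        // The third argument is filled with the full match and groups
        if (preg_match($regex, $page_val, $matches)) {
            return array('page' => $page, 'matches' => $matches);
        }
    }
    return null;
}

$page_rules = array('article/([0-9]+)' => 'pages/article.php');
$result = match_rule($page_rules, 'article/409');
print_r($result['matches']);
```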

Great, now if you tried to go to /article/43/ and did a print_r of $matches, you’d get something like:

Array
(
    [0] => article/43/
    [1] => 43
)

The first entry is filled with the entire matched string, the second one is our requested ID, the group we defined in our match.

Notice that because we’ve defined any number, if you try to enter anything other than a number into the URL as an ID you will get a 404. This can act as a kind of first-defense validation for input variables.

This is almost the end of the article, but we can make this a little nicer to work with.

Named groups

Being able to refer to something like $_GET['id'] is a convenience because there’s a name. $matches[1] is not so easy to decipher, or even keep track of: if we add another group that appears before it, like a category name, the group number may shift.

Luckily with regex we can name our groups using the simple syntax ?<name> at the start of the group. Our new rules can now look like this:
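
A sketch of a rule with a named group, assuming the name id (which matches the print_r output shown next):

```php
<?php
$page_rules = array(
    // (?<id>...) captures the digits under the name "id"
    'article/(?<id>[0-9]+)' => 'pages/article.php',
);

// Build the regex the same way as before and try it on a sample path
$regex = '/^' . str_replace('/', '\/', 'article/(?<id>[0-9]+)') . '\/?$/i';
preg_match($regex, 'article/43/', $matches);
```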

Now when you print_r $matches you get:

Array
(
    [0] => article/43/
    [id] => 43
    [1] => 43
)

You can still refer to the group as $matches[1], but we’re also given the option of $matches['id'].

Conclusion

Our final PHP script looks like this:
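
The original listing is not reproduced here, so what follows is a reconstruction from the steps in this article and not the author’s exact script. The rule list, file names and the is_file() guard are assumptions.

```php
<?php
// Hypothetical rules: URL pattern => page file inside pages/
$page_rules = array(
    'index'                 => 'pages/index.php',
    'about'                 => 'pages/about.php',
    'article/(?<id>[0-9]+)' => 'pages/article.php',
);

// The rewrite rule passes the requested path in as "p"
$page_val = isset($_GET['p']) ? $_GET['p'] : 'index';

$page = null;
$matches = array();
foreach ($page_rules as $rule => $file) {
    // Escape slashes, allow an optional trailing slash, ignore case
    $regex = '/^' . str_replace('/', '\/', $rule) . '\/?$/i';
    if (preg_match($regex, $page_val, $matches)) {
        $page = $file;
        break;
    }
}

if ($page !== null && is_file($page)) {
    // The included page can read named groups such as $matches['id']
    include $page;
} else {
    http_response_code(404);
    echo 'Page not found';
}
```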

We’ve implemented friendly URLs in a simple way that is also very easy to work with in your script. We even have some basic validation that we wouldn’t get by relying on $_GET alone.
