.htaccess: A FED Guide to Redirects

I was involved in launching a new site recently. Whoop! It’s something we do fairly often, but not so often that it’s an everyday, run-of-the-mill occurrence. It went smoothly (of course!) but in the run-up I found myself preparing redirects for some weird-looking URLs that I didn’t fully understand. Not the URLs, they’re pretty easy, but the redirects and, for that matter, the whole .htaccess thing. It’s been a bit of a murky zone for me up until now, so this article will be my attempt at (mini) redemption!

Some Context

So, what is an .htaccess file? An .htaccess file is a configuration file. That’s it! Tada! Time to pack it in, I’m done here. OK, maybe not. To fully explain the file, I guess we need to know what it’s configuring. From the top, it’s used by something called “httpd”, which incidentally has just turned 20 years old (happy belated birthday httpd)! Httpd is the software that serves your site/its pages. It’s a daemon process that’s set off by apachectl. Apachectl basically acts as the front-end for this process, an interface, if you will. Think of httpd as either very shy or very obnoxious – it won’t talk to you, but it will listen to its most esteemed friend apachectl who, fortunately for us, is more reasonable and actually will listen to us. So, we have to ask apachectl to tell httpd to do something we want it to do, although in my case I believe XAMPP does the talking. Whether XAMPP talks through apachectl or directly with httpd, I’m not sure – perhaps you could let me know in the comments if you know?

So, this httpd daemon process is the server that spins up when I open XAMPP (or WAMP or MAMP) and click “Start”. How does that relate to this .htaccess thing? Have faith, we’ll get there. It’s to do with the architecture of this server: it’s described as a “modular server”. By itself httpd is pretty basic, but we can add functionality (superpowers!) by giving it extra modules. If you’ve ever opened up an httpd.conf file (for me it’s in C:/xampp/apache/conf/), you will have seen a long list of lines that start with LoadModule. Each one of those is called a directive and each points to a module that httpd will load when it runs. A pretty simple idea, really! This httpd.conf file is (usually) where the main configuration settings live for your server – those LoadModule bits are only the beginning. The name and location of this file can be changed, if you want, but I’ve never found the need.
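For illustration, that part of httpd.conf looks roughly like the sketch below. The three modules named here are real ones that ship with Apache, but which lines appear in your file (and which are commented out) depends on your install, so treat this as an assumption rather than a copy of any particular setup.

```apache
# Each LoadModule line takes a module identifier and the path to its file.
# Lines starting with # are comments, so a commented-out module is not loaded.
LoadModule alias_module modules/mod_alias.so
LoadModule dir_module modules/mod_dir.so
#LoadModule rewrite_module modules/mod_rewrite.so
```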

OK, we’re at configuration files now – getting closer to .htaccess. Tiny recap: “httpd” is a server daemon that’s spun up by “apachectl” and uses the settings we write in the main “httpd.conf” file to configure itself. Now although that httpd.conf file holds the main configuration settings, it’s not the only file that’s involved in configuring your server. If you scroll further down that file you’ll find a list of Include directives – they do exactly what you (probably) think they do… they include other configuration files. This collection of configuration files helps to get your server set up to do what it’s meant to do while keeping everything nicely organised. If you poke around the directories and files that surround your httpd.conf file you’ll see some of those included files. I probably spend the most time in httpd-vhosts.conf but we’ll get to that later.
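As a rough sketch, the Include lines look something like this. The first path matches where my XAMPP install keeps the vhosts file; the second line and its "sites" folder are purely hypothetical, there only to show the wildcard form.

```apache
# Relative paths are resolved against ServerRoot.
Include conf/extra/httpd-vhosts.conf

# IncludeOptional is silently ignored if a wildcard pattern matches no files.
# The "sites" folder is a made-up example.
IncludeOptional conf/extra/sites/*.conf
```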

So, .htaccess? Almost! We’re pretty much at the point where we need them, but first I’ll explain the problem they solve. If you have the various things I’ve described so far set up, you have an Apache server running (don’t worry, you don’t actually need to set it all up yourself, it should already be there). That’s one server running on your one computer. Now although we only have one server running, I’d bet you would probably like the ability to run more than one website. I’d also bet that as you build more sites, at least one of them is going to need the server to do things a little differently, be it redirects or something more interesting. Now we have a need to set up some site-specific configuration settings and that is what .htaccess gives us.

.htaccess

At last we have arrived! Here’s how the Apache documentation introduces this infamous file: “.htaccess files provide a way to make configuration changes on a per-directory basis.” When I started out, not knowing what any of this was, that description went in one ear and out the other. Thanks Apache, the fog has lifted. But after figuring out its role and why it exists, this description makes complete sense! There’s some psychology in there that’s probably worth a look. So .htaccess lets us define different settings for each site we run on one computer. I’ll call them context specific settings and I apologise in advance for how often I’m probably going to write those three words.

Let’s look at setting some sites up. Imagine you have the files for two sites, both sitting within whichever directory you have set up as your server’s root. Personally I have a “projects” folder within my user profile, it sits beside my documents / downloads / desktop, that’s the root for my localhost. Within “projects” I have a folder for each site (actually, I have a “localWorking” folder in there for PSDs and the like and a “repo” folder for the site, which itself usually has a “www” folder for the site so all the dev tools don’t get packaged up but that’s all beside the point!). Now we can pop into each site (or “www”) folder an .htaccess with some configuration that will only apply to that directory and its children. It would be prudent to note here that the context is not specific to each website but rather to each directory. There are pros and cons to this – let’s start with a pro!

Say one of your sites has an uploads directory that needs its very own server configuration that doesn’t apply anywhere else within that site. Into the uploads directory goes another .htaccess file – we can have context specific configurations within sites, pretty handy!
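Here’s a minimal sketch of what that uploads .htaccess might hold. The two rules are my own guess at a sensible lockdown, not something lifted from a real project, and they only take effect if the main config allows those overrides for the directory.

```apache
# uploads/.htaccess (hypothetical): applies to uploads/ and everything below it.

# Don't show an auto-generated file listing for this directory.
# Needs AllowOverride Options (or All) in the main config.
Options -Indexes

# Refuse requests for any uploaded .php files.
# Needs AllowOverride AuthConfig (or All) in the main config.
<FilesMatch "\.php$">
    Require all denied
</FilesMatch>
```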

While we’re here, let’s look at another pro. Imagine you’re hosting a bunch of sites for other people on your own server. It’s probably not a good idea to have all those people battling over a single configuration file. An .htaccess file in each site directory solves that nicely. Toes will no longer be stepped upon. From what I can tell, that’s pretty much exactly how our staging server is set up. Each site sits in its own directory with its own .htaccess file, assuming they don’t use something like web.config files which, for now, remain a mystery to me.

For our staging environment and my local development set up this is a great system! However, when you start looking towards deploying your sites to a live production environment a few issues start popping up. It’s time for the cons.

Throughout the Apache documentation there are constant references to the idea that using .htaccess files should be avoided if you’re at all worried about performance. Something I didn’t realize: if the server configuration is set to allow them, it’ll look for one in every directory on the path to the requested file and load each one it finds every time a resource is requested. To compound this issue, if you have any rules within those .htaccess files it turns out they all have to be compiled every time a request hits them. Granted the process of finding and loading them is probably pretty quick, but on the larger projects with a lot of rules this could take a serious bite out of your loading time. Ouch.
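To picture that lookup, here’s a small sketch. The /www/htdocs paths are hypothetical (borrowed from the style of the Apache docs), and the setting doing the “allowing” is the AllowOverride directive.

```apache
# Allow .htaccess overrides for everything under this directory.
<Directory "/www/htdocs">
    AllowOverride All
</Directory>

# With that in place, a request for /www/htdocs/example/index.html makes
# httpd look for both of these files, whether or not they exist:
#   /www/htdocs/.htaccess
#   /www/htdocs/example/.htaccess
```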

In some cases this is unavoidable, e.g. if you’re on a shared environment. But if you have your own server or are hosted on a private server there is a preferable way: scoped directives.


Scoped Directives: a faster alternative to .htaccess

These do what .htaccess files do: define context specific configurations, but from within the main configuration files.
The nice thing about using this method is that the context rules here are compiled once and then cached. Plus, once .htaccess lookups are switched off (AllowOverride None), the server doesn’t need to do any hunting around for those pesky files! That’ll save you some load time and make your server’s life a little easier.
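As a sketch of the move, here’s a redirect lifted out of an .htaccess file and into the main config. The /www/site-one path and the page names are made up, and I’m assuming that directory is the site’s document root.

```apache
# Before: /www/site-one/.htaccess contained
#     Redirect 301 "/old-page.html" "http://site-one.local/new-page.html"

# After: the same rule lives in the main config and is read at startup.
<Directory "/www/site-one">
    # Ignore .htaccess files here so httpd stops looking for them.
    AllowOverride None
    Redirect 301 "/old-page.html" "http://site-one.local/new-page.html"
</Directory>
```

The trade-off is that a change now needs a server restart (or reload) before it takes effect, whereas an .htaccess edit is picked up on the next request.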

To define the context we have to wrap the relevant rules within context containers. There are a few directives we can use to do this and they are split between two main types: Filesystem Containers and Webspace Containers. The Filesystem is the view you may be more used to from your own file explorer: it’s how your OS sees the disk. This is the same way .htaccess defines context: on a per-directory basis. The Webspace view is the one presented to the client (as in the browser that’s sending requests to your server). With this second type, we are no longer constrained to matching our site structure to our directory structure – freedom! Let’s dig in a bit:


There are two pairs of Filesystem Container directives: Directory & DirectoryMatch, Files & FilesMatch:

  • <Directory "/"> Applies to all: Require all denied would be good here, then allow only where we need to. </Directory>
  • <Directory "/www/uploads/2016"> is more specific than “/” so this overrides the above </Directory>
  • <Directory ~ "^/www/uploads/[0-9]{4}"> (perl like) Regular expressions are evaluated next, so this overrides the above </Directory>
  • <DirectoryMatch "^/www/uploads/[0-9]{4}"> Is another way of handling regular expressions. What’s the difference? Take a guess and I’ll let you know further on. </DirectoryMatch>
  • <Files "index.html"></Files>
  • <Files ~ "\.jpe?g$"></Files>
  • <FilesMatch ".+\.(gif|jpe?g|png)$"></FilesMatch>

In a nice and intuitive touch, File containers can be placed within Directory containers to further scope their effects:

<Directory "/uploads">
    <Files ~ "\.jpe?g$">
        # Rules here will apply to all jpg/jpeg files in the uploads directory
        # unless we define more specific contexts elsewhere
    </Files>
</Directory>

Webspace context configuration can be set with the Location and LocationMatch directives by passing them the relevant URLs for your rules.

<Location "/urlsection">
    # Rules here apply to these URLs:
    #     /urlsection
    #     /urlsection/
    #     /urlsection/anything.html
</Location>

<Location "/urlsection/">
    # Rules here apply to
    #     /urlsection/
    #     /urlsection/anything.html
    # but not
    #     /urlsection
</Location>

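<LocationMatch "/urlsection">
    # Rules here apply to any URL containing the pattern, e.g.
    #     /urlsection
    #     /other/urlsection
    # Prefix the pattern with ^ to match only URLs that start with /urlsection
</LocationMatch>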

Like the “Directory” directive, “Location” allows Regex with the addition of ~. So now we get to the difference between the two… nothing! Well, almost nothing, here’s a note describing the difference I found in the docs:

The directive <LocationMatch> behaves identical to the regex version of <Location>, and is preferred, for the simple reason that ~ is hard to distinguish from - in many fonts.

So for anyone looking up <LocationMatch> vs <Location ~>, <DirectoryMatch> vs <Directory ~>, or <FilesMatch> vs <Files ~>… they’re functionally identical!


The last context container I’ll look at in this article is the VirtualHost directive. This is the guy that lets us run more than one site locally, each with its own local URL. This directive gives us two ways to differentiate between sites: IP-based and name-based. The latter is easier and it’s the one that (I believe) pretty much everyone here uses to set up sites locally. An example:

<VirtualHost *>
    DocumentRoot "C:/path/to/site one/root"
    ServerName site-one.local
</VirtualHost>

<VirtualHost *>
    DocumentRoot "C:/path/to/site two/root"
    ServerName site-two.local
</VirtualHost>

The * within the VirtualHost directive is where you would put the IP if you were going down that route. But for name-based Virtual Hosts the * is a catch all for any IP, which we need as the IP is resolved first then the “ServerName” is checked only for the VirtualHosts that match. Finally the DocumentRoot gives us the minimal requirement for this directive… almost. The last thing to set up is the mapping from your local name to an IP. For me this is in C:/Windows/System32/drivers/etc/hosts. Within that file I’d put this (the spaces don’t matter, it’s just how I organize things in there):

127.0.0.1          site-one.local
127.0.0.1          site-two.local

That’s the set up – the final step is to restart your Apache server. If you remember from before, these rules are compiled once and cached so they won’t do anything until your server restarts. With those two now running, we can place any of the previously mentioned context containers inside the VirtualHost directive. Voila! Scoped Directives! Now before we start actually doing things other than organization it would probably be a good idea to look at which of these directives take precedence when we start mixing them all together, otherwise we’d be heading into the land of unintended consequences…
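To make that nesting concrete, here’s a rough sketch of the first site with a couple of containers tucked inside its VirtualHost. The uploads folder and the rules inside the containers are hypothetical, just there to show the shape.

```apache
<VirtualHost *>
    DocumentRoot "C:/path/to/site one/root"
    ServerName site-one.local

    # Filesystem container scoped to this site only.
    <Directory "C:/path/to/site one/root">
        AllowOverride None
        Require all granted
    </Directory>

    # Hypothetical uploads folder: refuse requests for PHP files inside it.
    <Directory "C:/path/to/site one/root/uploads">
        <FilesMatch "\.php$">
            Require all denied
        </FilesMatch>
    </Directory>
</VirtualHost>
```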


Scoped Rule Cascade

Each directive has something like a context specificity. Forgive me if those aren’t the correct words but I’m currently a Front End guy so I’ve got CSS on the brain! Thinking of the rules as applying like the cascade just makes sense to me. Here’s a list in order from least specific to most specific. As you move down, rules from preceding directives are overruled.

  1. <Directory "one">
  2. <Directory "one/two">
  3. .htaccess
  4. <DirectoryMatch> and <Directory ~>
  5. <Files> and <FilesMatch>
  6. <Location> and <LocationMatch>
  7. <VirtualHost> 1 to 6 repeat inside this.

For multiple directives with matching contexts, e.g. <Directory "one"> followed by another <Directory "one">, they are evaluated in the order they appear within the config files. The second occurrence will take precedence when its contents are merged in with the first. Check out the docs for some examples of this rule cascade. It’s not a crazily complex thing but I could see things getting confusing – especially when you consider how far up that list .htaccess is.
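Here’s the kind of unintended consequence I mean, sketched along the lines of an example in the Apache docs. The /www/site-one path is hypothetical and I’m assuming it’s the site’s document root.

```apache
# Intention: lock down the private directory.
<Directory "/www/site-one/private">
    Require all denied
</Directory>

# Whoops: <Location> sections are merged after <Directory> sections,
# so this broader rule wins and /private gets served after all.
<Location "/">
    Require all granted
</Location>
```

Just like in CSS, the rule that looks the most specific isn’t necessarily the one that applies; it’s the position in the merge order that counts.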


So that article took a direction I wasn’t quite anticipating. The contents of these things are still a little murky but the role of .htaccess files is clear and the performance stuff – I had no idea! If only for that knowledge, I’m glad I took a dive into this area. The next step in furthering my personal understanding would be to get another article together on the modules we use within the various Scoped Directives. Make that just the most common modules – I just had a quick peek at this: http://httpd.apache.org/docs/2.4/mod/ That’s a pretty chunky list! As much as I enjoy putting my own spin on dry documentation, I think going through that lot might be a bit of a stretch. If you know of any resources out there that make this stuff even a little more colorful, please post them in the comments or on Twitter!
