Understanding .htaccess I

Introduction to htaccess rules and conditions


This is the first in a series of articles about .htaccess files. In this article, we'll look at how the rules in your .htaccess file are processed. We'll also discuss a great online tool that not only lets you test your .htaccess file but also gives details about how each line is processed.



Orientation

This series of articles is by no means an exhaustive explanation of how .htaccess works. It's intended to get you off the ground in understanding what the (sometimes confusing) lines in .htaccess mean, and how they are processed. It also introduces a great online tool that will let you test your .htaccess conditions and rules and can help you learn how the process works. For a long time, I wrote .htaccess conditions and rules without really understanding them. They worked (often after a lot of trial and error), but not because I understood them.

I've been frustrated for a long time with the pages that Google serves up when you search for information on .htaccess and mod_rewrite. They all claim to explain .htaccess files, but they don't really. For the most part, they don't explain the structure of rules and conditions, and they assume that you already understand regular expressions. Most of them consist of examples of how to do things you might want to do in .htaccess, but don't explain how the examples actually work, and often don't contain before and after URL examples. I have yet to see a page that even explains that a space character serves as a delimiter in both conditions and rules. I'm hoping that I can make things a little clearer for beginners in this series of articles.


First, the Tool

The Online htaccess Tester, offered by MadeWithLove in Belgium, is really handy for testing all or part of your .htaccess file to make sure it's doing what you want it to do. Writing rewrite rules for your .htaccess file can be a challenge. It's easy to add a new rule that makes the rules below it fail even though they worked before you added the new section. The tester can help you make sure you haven't broken anything with your new .htaccess code.

I encourage you to use the htaccess Tester to play with the code in this article. Just paste the .htaccess code in, enter a URL, and press the "Test" button. You'll see the URL that comes out the other side of the process (in the "Output url" box). Better yet, playing with the Tester can help you understand .htaccess concepts because the Tester displays diagnostic lines explaining which conditions were met, or not met, and which rules were actually applied.

The Tester has some limitations, but it's still really helpful because of the line-by-line analysis of the processing. At this writing, it doesn't handle some important .htaccess variables, such as %{THE_REQUEST} and %{REQUEST_FILENAME}. Most important, while the actual "mod_rewrite" makes multiple passes through the .htaccess file, the Tester (at this writing) only performs the first pass. In spite of that, it's still really helpful for testing conditions and rules and understanding how they work.


What Does the .htaccess File Do?

As you probably know, when a request comes in to your site for a page, the server takes the submitted URL and applies the transformation rules in .htaccess to it before trying to serve up the page. The MODX friendly URL system, for example, works because of one section in the default MODX .htaccess file (more on this in a bit). The .htaccess process depends on the Apache "mod_rewrite" module. (If you're using a non-Apache server, such as nginx, this article will be less helpful. The principles are the same, but the structure and syntax of the rewrite file are different.)

At the top of the .htaccess file, you'll typically see some comments, which always begin with the "#" symbol. All lines starting with "#" are ignored.

While writing this series of articles, I used an .htaccess file to test things. I'd put in a rewrite condition and a rule, test it, then comment it out. After a while, everything stopped working. The resulting URLs were very weird and I couldn't make many of them work at all, even though they were quite simple. Once I removed the commented-out conditions and rules, things started working again.

I still can't believe this, because any decent parser will strip out all comments before doing any parsing. Still, I've seen suggestions on the web to use only alphanumeric strings plus hyphens and commas in .htaccess comments, and I'll definitely follow that advice from now on. I advise you to do the same. It can't hurt.
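
As an illustration of that advice, a comment that sticks to alphanumerics, hyphens, and commas might look like the first line below. The rule and its file names are hypothetical, made up for this sketch. Apache's own documentation also says that a comment can't share a line with a directive, so the comment sits on a line by itself:

# Test rule only, sends old-page to new-page
RewriteRule ^old-page\.html$ new-page.html [L]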

Below the comments at the top of an .htaccess file, you'll almost always see these two lines:

RewriteEngine On
RewriteBase /

The first one turns on the mod_rewrite rewriting engine, which may or may not be on already. The engine is off by default in Apache, but it's often turned on in the Apache configuration file or the Apache virtual hosts file. The second one tells Apache where the root of the site is, expressed as a URL path. When a rule produces a relative result (such as "index.php"), mod_rewrite puts the RewriteBase in front of it. On a typical live server, the RewriteBase doesn't change if the site is in a subdirectory because you usually point the domain at that subdirectory. On localhost installs, if you don't create a virtual host pointed at the subdirectory, you'll have to add the subdirectory after the slash, like this (assuming that the local site is in the /modx directory under the server root, e.g., htdocs):

RewriteBase /modx/

Make sure your RewriteBase line ends in a slash.
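
Here's a rough sketch of how that plays out for the local /modx install. It assumes the .htaccess file sits in the /modx directory and contains the Friendly URLs section from the default MODX .htaccess file; the localhost URL below is an assumption for illustration only:

# Local install in the modx subdirectory
RewriteEngine On
RewriteBase /modx/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

With that in place, a request for http://localhost/modx/home.html (where no real file or directory by that name exists) should end up at /modx/index.php?q=home.html. The rule by itself only produces index.php?q=home.html. Because that result is a relative path, mod_rewrite adds the RewriteBase to the front of it.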


Conditions and Rules

Below the two lines above, you'll usually see repeated sections where each section has one or more conditions followed by a rewrite rule. The conditions all start with "RewriteCond" and the rules all start with "RewriteRule". The .htaccess file may also have other sections with things like PHP directives, permissions, and caching directives. We'll look at some of those later in this series of articles. For now, we're only concerned with the conditions and rules.

We'll discuss the conditions and rules in upcoming articles in this series, but for now try this in the htaccess Tester:

Paste this URL into the top input of the form:

http://bobsguides.com/home.html

Paste this .htaccess code from the Friendly URLs section of the MODX .htaccess file into the big text box just below the URL:

# The Friendly URLs part
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

The usual first two lines ("RewriteEngine On" and "RewriteBase /") are unnecessary here because the Tester sets them by default.

Now click on the "Test" button.

Scroll down to the "Output url" section. You should see the URL transformed into this:

http://bobsguides.com/index.php?q=home.html

We'll discuss how this happens in future articles. This code is the heart of the MODX Friendly URLs system. That second URL directs the server to the "index.php" file in the MODX root directory, with the page's alias as the "q" parameter. The "index.php" file reads the alias, looks for a resource with that alias, and serves it up, or redirects to the error (page-not-found) page if it can't find the resource.

Below the final URL in the htaccess Tester, you can see the debugging information telling you that both of the two rewrite conditions were met, and the rewrite rule was applied to produce the resulting URL.
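
One variation worth trying is a URL that already has a query string. The "page" parameter here is hypothetical, chosen only for the example. Keep the same code in the big text box:

# The Friendly URLs part
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

Then paste this URL into the top input:

http://bobsguides.com/home.html?page=2

The output should be:

http://bobsguides.com/index.php?q=home.html&page=2

The QSA (Query String Append) flag is what carries "page=2" along. Without it, the new query string in the rule would replace the original one.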


The Parsing Process

When the request comes in to the server, Apache uses the mod_rewrite module to load the .htaccess file into memory (where it stays until the request has been fully handled). The mod_rewrite module goes through the file, line by line, and checks to see if any rewrite conditions are matched in the URL. If it finds a matching condition, it applies the rule immediately below it to transform the URL. If there are multiple conditions (on multiple lines), all of them must be met unless they contain the [OR] flag at the end of the condition. The sections in .htaccess should be thought of as one or more rewrite conditions, followed by a single rewrite rule. (You can have multiple rules apply by adding a flag, but this is rarely done, and it can make things difficult to debug.)
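
As a minimal sketch of the [OR] flag, here's a hypothetical section (the "old-news", "old-blog", and "archive.html" names are made up for illustration). Because of the [OR] flag on the first condition, the rule applies if either condition is met. Without the flag, both would have to be met, which could never happen here:

# Either condition is enough because of the OR flag
RewriteCond %{REQUEST_URI} ^/old-news [OR]
RewriteCond %{REQUEST_URI} ^/old-blog
RewriteRule ^(.*)$ archive.html [L]

A request for http://bobsguides.com/old-news/2015.html or http://bobsguides.com/old-blog/intro.html should be rewritten to http://bobsguides.com/archive.html, while http://bobsguides.com/home.html should be left alone.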

This is an important point that many people miss. The mod_rewrite parser makes *multiple passes* through .htaccess. If, at the end of a pass, any rewrite rule has changed the URL, the parser starts again at the top with the new URL. This continues until a complete pass is made without any rules being applied.

If a rule is followed by the [L] (Last) flag, and the rule is applied, the parser goes immediately to the top, skipping any conditions and rules below that point *on that pass*. The conditions and rules below the line with the [L] flag will be checked on future passes (assuming that the conditions for the rule with the [L] flag are no longer met). The looping also stops if, at the end of a pass, the rules have changed the URL to point to another domain.
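
Here's a small sketch of that behavior, using hypothetical file names and assuming a RewriteBase of "/":

# First rule stops the current pass when it applies
RewriteRule ^old-page\.html$ new-page.html [L]
# Second rule is skipped on that pass, then checked on the next one
RewriteRule ^(.*)\.html$ pages/$1.php [L]

For a request for "old-page.html", the first pass should apply the first rule, giving "new-page.html", and stop there because of the [L] flag. On the second pass, the first rule no longer matches, so the second rule gets its turn and produces "pages/new-page.php". The third pass changes nothing, so processing ends. Since the Tester only performs the first pass (at this writing), it would stop at "new-page.html".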

Because of the multiple passes, it's important to make sure that rules won't apply again after they've been applied. If, for example, you write a rule that changes "test" to "testing" in the URL, after a few passes, you'll have "testinginginging" as the rule continues to find the word "test" and tacks "ing" onto the end of it. Eventually, Apache gives up and returns a 500 error for too many internal redirects. Your rewrite condition would have to check for the word "test" *not* followed by "ing".
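
A rough sketch of that fix follows, assuming the word "test" appears only once in the URL. The "(?!ing)" part is a regular-expression negative lookahead, meaning "not followed by ing". The regular expressions in mod_rewrite are Perl-compatible, so it should be accepted, but treat this as an illustration rather than production code:

# Apply the rule only when test is not already followed by ing
RewriteCond %{REQUEST_URI} test(?!ing)
RewriteRule ^(.*)test(.*)$ $1testing$2 [L]

On the first pass, a URL like http://bobsguides.com/test.html meets the condition and becomes http://bobsguides.com/testing.html. On the second pass, the condition fails, no rule is applied, and the looping stops. Remove the RewriteCond line and you get the runaway "testinging" behavior described above.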


Coming Up

In the next article, we'll look at "regular expressions," which are used extensively in both .htaccess conditions and rules. In the articles after that, we'll look at rewrite conditions, rewrite rules, and some other things that can go in .htaccess. Finally, we'll see some practical .htaccess examples.





