Understanding .htaccess III

Understanding .htaccess rewrite conditions


In the last article we got an introduction to regular expressions and the Online htaccess Tester. In this one, we'll see how rewrite conditions work.



Rewrite Conditions

As we said in the first article in this series, the conditions and rules in an .htaccess file sit in sections. Each section has one or more rewrite conditions (all starting with RewriteCond) followed by a rewrite rule (starting with RewriteRule). If the rewrite conditions are met, the rewrite rule below them is applied. (It's possible to apply more than one rule after a set of conditions is met by using the [C] (chain) flag, but that's very seldom done.)

Here's a really simple rewrite condition we saw in the first article. It's used in the code to rewrite non-www URLs to have the www prefix:

RewriteCond %{HTTP_HOST} !^www\.bobsguides\.com [NC]

There are four parts to a rewrite condition. They are separated by spaces, which serve as delimiters in both conditions and rules. Here's the technical specification:

RewriteCond TestString CondPattern [flags]

The first one is literal. "RewriteCond" just identifies this line as a rewrite condition.

The second part is the test string (%{HTTP_HOST}). It tells Apache where to look for the pattern. In this case we're telling it to look at the server variable HTTP_HOST. All server variables in conditions take this form: %{variable name}.

The third part is what to actually look for. It's essentially a search string that's applied to the previous part. It's a regular expression (regex for short), but can be plain text, though that's fairly uncommon.

The fourth part ([NC]) holds the flags and is optional. It's a comma-separated list of uppercase flags enclosed in square brackets (with no spaces inside the brackets!). The [NC] flag tells Apache that we want a case-insensitive match. So the pattern would also match if the URL contains BobsGuides.com.

So, what, exactly, does our pattern (!^www\.bobsguides\.com) specify? The ! at the beginning negates the pattern. The ^ following it matches the beginning of a line. So any host name that does *not* begin with www.bobsguides.com will satisfy this rewrite condition and the following rule (that adds www. to the beginning) will apply.
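
To see the condition in context, here is a minimal sketch of a complete section. The condition is the one discussed above; the rewrite rule is an assumption about what a typical "add www" rule looks like (the https target and the R=301 redirect are illustrative, not taken from this article):

```apache
RewriteEngine On

# Condition: the host name does not begin with www.bobsguides.com
# (case-insensitive because of [NC])
RewriteCond %{HTTP_HOST} !^www\.bobsguides\.com [NC]

# Assumed rule: redirect the same path to the www host
RewriteRule ^(.*)$ https://www.bobsguides.com/$1 [R=301,L]
```

In this sketch, $1 is whatever the rule's own pattern captured, so the requested path is preserved in the redirect.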

If the URL is already www.bobsguides.com the condition will not be met (you don't want www.www.bobsguides.com). In this particular case, any URL other than bobsguides.com or www.bobsguides.com won't reach your site, but remember that you might have a rule above this one. If that rule converts bobsguides.com to bobsnewsite.com, the condition will be met and the typical rewrite rule for our example that converts the URL to www.bobsguides.com will apply. No one will ever get to bobsnewsite.com.

The order of your rewrite rules is important. If, for example, we had a rule that has ^www.bobsguides.com in the pattern, it should come after the rule that adds the "www." to the URL.

The only two flags you're likely to use in a rewrite condition are NC and OR. The second one means that the condition is combined with the *next* condition using OR: the pair is satisfied if either one is met. You can chain these by adding [OR] to a number of sequential conditions. In that case the following rule will be applied if any of the conditions are met.

Here's an example of the [OR] flag from an earlier article in this series. It's a set of conditions used to prevent visits from potential hackers using specific user agents:

RewriteCond %{HTTP_USER_AGENT} JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} Sogou [OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker

Notice that there's no [OR] on the final condition.

If you wanted to use both NC and OR (and you might in the example above), the flags would be set to [NC,OR]. Those two flags can also be written as nocase and ornext, but I've never seen it done. Putting a space after the comma is a common mistake. It leads to a 500 error as the rewrite engine tries to parse a condition with five parts since it interprets the space as a delimiter.
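
As a sketch of how the combined flags look in practice, here is a shortened version of the list above with [NC,OR] on every condition but the last. The rule at the end is an assumption for illustration (it simply refuses the request); the rule used earlier in this series may differ:

```apache
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sogou [NC,OR]
# Last condition: NC only, no OR
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC]
# Assumed rule: leave the URL alone (-) and return 403 Forbidden
RewriteRule ^ - [F]
```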

You might be wondering if there is an [AND] flag. There isn't, and none is needed, because AND is the default: consecutive conditions without [OR] must all be met.
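
A small sketch of the default AND behavior, using made-up values (the admin/ path and the IP address are hypothetical). With no [OR] flag, the rule applies only when both conditions are met:

```apache
# Both conditions must be met (implicit AND, no flag needed)
RewriteCond %{REQUEST_METHOD} ^POST$
RewriteCond %{REMOTE_ADDR} !^127\.0\.0\.1$
# Assumed rule: refuse POST requests under admin/ unless they come from 127.0.0.1
RewriteRule ^admin/ - [F]
```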

Note that if the URL begins with www.bobsguides.com, the "www." is part of the %{HTTP_HOST} server variable. Once a rule has added www. to the beginning of the URL, any rewrite condition after that rule that tests %{HTTP_HOST} will see the www. prefix. Since there can be multiple passes through .htaccess, the same can be true of a rewrite condition *above* the rule that added the www., unless another rewrite condition makes sure it won't apply.


Where to Look for a Match

You don't always want to use the host name for your pattern match. Here are the most common server variables used in rewrite conditions. The part after each server variable shows or describes what it contains based on the example URL:

https://bobsguides.com/assets/replace.php?searchstring=MODx&replacement=MODX


%{HTTP_HOST} bobsguides.com

%{REMOTE_ADDR} The IP address of the user (e.g., 127.0.0.1)

%{REQUEST_FILENAME} (on localhost) C:/xampp/htdocs/yoursite.com/assets/replace.php (the full filesystem path; does not include the query string)

%{HTTPS} Will contain on or off (often not set on localhost)

%{REQUEST_SCHEME} Usually http or https

%{QUERY_STRING} searchstring=MODx&replacement=MODX (does not include the ?)

%{REQUEST_URI} /assets/replace.php (does not include query string)

%{SERVER_PROTOCOL} HTTP/1.1

%{REQUEST_METHOD} GET

%{SERVER_NAME} bobsguides.com

%{SERVER_PORT} 443 for the https example URL (80 for plain http)

%{HTTP_USER_AGENT} Mozilla/5.0 ...
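
Here is a rough sketch of two of these variables in use. Both sections are illustrative assumptions rather than code from this series: the first redirects plain-http requests to https, and the second refuses requests to the example script when the query string contains searchstring=MODx.

```apache
# %{HTTPS} is "on" for https requests; != is a plain string comparison
RewriteCond %{HTTPS} !=on
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# Test the query string (no leading ?) separately from the path
RewriteCond %{QUERY_STRING} (^|&)searchstring=MODx(&|$)
RewriteRule ^assets/replace\.php$ - [F]
```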


Friendly URLs Yet Again

In the previous articles, we saw this simple section for Friendly URLs:

# The Friendly URLs part
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

The -f and -d tests are special ones that don't involve a search string. The -f tests whether the requested file exists. The -d tests whether a directory with that name exists. The ! in front of them negates the test, so each condition is met only when the file (or directory) does *not* exist. The key to these two conditions is that MODX resources don't exist as files, so both conditions will be met only if the requested document is not found on the server (as a file or a directory).

If there's a request for http://yoursite.com/somefile.html, for example, and there actually is a somefile.html file in the root of your site, the conditions won't be met and the rewrite rule won't be applied. The file will simply be served up without involving MODX at all.


Patterns without Regular Expressions

The pattern used in a rewrite condition doesn't have to be a regular expression. (Technically, that's not true, they're always treated as regular expressions but they can be simple text strings without any of the special regular expression characters.) Here's an example. Say you want to change all instances of find.php to search.php:

RewriteCond %{REQUEST_URI} find
RewriteRule find search.php

This will change http://yoursite.com/find.php into http://yoursite.com/search.php. (The substitution has to be the complete new file name, because a rewrite rule replaces the whole URL path, not just the part the pattern matched.) This is an example of a pretty dangerous rule. If any URL on your site contains the string "find", the request will be rewritten to search.php, so, for example, http://yoursite.com/helpfinder.php will also end up at http://yoursite.com/search.php. The following version would be much safer:

RewriteCond %{REQUEST_URI} ^/find\.php$
RewriteRule find search.php

By adding the ^ (beginning of the line) and $ (end of the line), and making the search string contain the full filename, we can be sure the rule will only be applied to the file name we want to rewrite.


Capture Groups

We discussed capture groups in the previous article. They're surrounded by parentheses. Many people don't realize that you can capture sections of your search pattern in a rewrite condition for use in the rewrite rule that follows. If you're familiar with regular expressions and capture groups, you're used to using the variables $1, $2, etc. to refer to those groups. That's how they work when the capture groups are in rewrite rules. Using capture groups from rewrite conditions, though, is a little different. For those, we use the variables %1, %2, and so on. As a memory aid, that matches the % sign used to denote server variables in rewrite conditions.

For example, here's another way to change http://bobsguides.com/find.php into http://bobsguides.com/search.php using a capture group:

RewriteCond %{REQUEST_URI} ^/find\.(php)$
RewriteRule find\.php search.%1

In the rewrite condition, we've captured the "php" in "find.php," and then used it in the rewrite rule as %1. It's kind of a stupid example, because our original version above is faster and easier to read, but I wanted a simple example. Later on, we'll see more practical examples.

The commonly used capture group variables go from %1 to %9, so you can have multiple capture groups in your rewrite condition. There is also a %0 variable which represents the entire pattern match (even with no parentheses), but I've never seen it used.
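
To illustrate more than one capture group, here is a hedged sketch that uses %1 and %2 from the condition together with $1 from the rule. The example.com host names and the redirect target are hypothetical:

```apache
# %1 = subdomain, %2 = main domain (both captured in the condition)
RewriteCond %{HTTP_HOST} ^([a-z0-9-]+)\.(example\.com)$ [NC]
# $1 = requested path (captured in the rule's own pattern)
RewriteRule ^(.*)$ https://%2/%1/$1 [R=302,L]
```

With these assumptions, a request for blog.example.com/post.html would be redirected to https://example.com/blog/post.html.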

The capture groups must be in the condition just above the rewrite rule. Any capture groups in earlier conditions will be ignored. For example, with the same URL http://yoursite.com/find.php:

RewriteCond %{REQUEST_URI} ^/find\.(php)$
RewriteCond %{REQUEST_URI} ^/(find)\.php$
RewriteRule find\.php search.%2%1

If the capture groups in both conditions were available, you might expect both pieces to be substituted. Instead, only the groups from the last condition count. The find in the second condition is captured as %1. There is no %2 in that condition, so it expands to an empty string, and the result is the URL http://yoursite.com/search.find.
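
If you need both pieces, one approach is to capture them in the same condition, since that is the one the rule can see. A minimal sketch (the from= query parameter is made up for illustration):

```apache
# Both groups are in the last (and only) condition: %1 = find, %2 = php
RewriteCond %{REQUEST_URI} ^/(find)\.(php)$
# Rewrites find.php to search.php?from=find
RewriteRule ^find\.php$ search.%2?from=%1 [L]
```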

The technical term for the $n and %n variables we've been discussing is "backreferences," because they "refer back" to a capture group that precedes them.


Wrapping Up

In this article we looked at how rewrite conditions work and how they work together with rewrite rules. We saw some rewrite rules in the examples, but we haven't explained them in detail yet. In the next article, we'll do that.


Coming Up

In the next article, we'll see how the rewrite rules in .htaccess work.





