Understanding .htaccess IV

Understanding .htaccess rewrite rules


In the last article we looked at rewrite conditions that have to be met for rules to apply. In this one, we'll look at the rewrite rules themselves. Remember that you can use the Online htaccess Tester to test your rewrite conditions and rules.


Review

As we saw in the previous articles in this series, the conditions and rules in an .htaccess file sit in sections. Each section has one or more rewrite conditions (all starting with RewriteCond) followed by a rewrite rule (starting with RewriteRule). If the rewrite conditions are met, the rewrite rule below them is applied.


Rewrite Rules

Like rewrite conditions, rewrite rules have four parts separated by spaces, with the fourth part optional. Unlike rewrite conditions, though, the second and third parts play different roles. The first part is always RewriteRule, which identifies the line as a rewrite rule. The second part is a regular expression (remember that in a rewrite *condition*, it's the third part that's a regular expression). The third part is the substitution: an expression of the final URL you want. It replaces the whole URL path that was tested against the second part, not just the portion the regular expression matched.
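
To make the four parts concrete, here is a minimal annotated sketch. The names about-us and about.html are hypothetical and used only for illustration, and the rule is assumed to live in the .htaccess file in the site root.

```apache
# Part 1       Part 2 (regex)   Part 3 (substitution)   Part 4 (flags, optional)
RewriteRule    ^about-us/?$     about.html              [L]
```

A request for about-us (with or without a trailing slash) would be served by about.html, and the [L] flag tells Apache to skip any rules below it.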

Rewrite rules are not required to have rewrite conditions immediately above them. Suppose, for example, that you rename a directory from /customers/ to /clients/. You want to redirect users with the old URL to the new one. This simple rule would do it.

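RewriteRule ^customers/(.*)$ /clients/$1 [L,R=301]

The rule above will only work if customers is a top-level directory, because the ^ anchors the match to the start of the path. The capture group carries the rest of the path over to the new location, so /customers/orders.html ends up at /clients/orders.html.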

The flags at the end tell Apache to skip any following rules [L] and to redirect the user with a 301 - Moved Permanently header [R=301]. We could also have used [L,R]. This would redirect the user, but without telling the browser that the change is permanent. The status will be 302 (officially "Found," formerly "Moved Temporarily"). Search engines treat a 302 as temporary and generally keep the old URL in their index, so it's a bad practice from an SEO standpoint if the change is permanent.
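
As an illustration of the difference, here is a sketch with one permanent and one temporary redirect. The sale and holiday-sale directory names are made up for the example.

```apache
# Permanent: R=301 tells browsers and search engines the move is final
RewriteRule ^customers/(.*)$ /clients/$1 [L,R=301]

# Temporary: R with no status code sends a 302
RewriteRule ^sale/(.*)$ /holiday-sale/$1 [L,R]
```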

Let's take another look at the MODX Friendly URLs section of .htaccess:

# The Friendly URLs part
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

The code above is a more typical rewrite rule. The two conditions require that %{REQUEST_FILENAME}, which is the full path to the requested file or directory, is not found as a file (!-f) and is not found as a directory (!-d). This means that the rule will not apply if the requested file or directory actually exists on the server.

The second part of the rule (^(.*)$) is a regular expression that always takes the part of the URL that comes after your base URL (e.g., http://yoursite.com) as its search string (not including any query string at the end). This expression matches that entire part. It looks for the beginning of it (^), followed by zero or more of any character, followed by the end of it ($). The part between the beginning and end is in parentheses so it will be captured for use as the variable $1 in the third part of the rule.

The third part of the rule (index.php?q=$1), is the URL we want. Apache always adds the base URL to the beginning of it, unless you provide an alternative by starting the third part with a server protocol like http://, in which case it will use that for the beginning of the URL (more on this in just a bit).

The flags at the end include [L] to skip any following rules and [QSA], which stands for "query string append." This flag tells Apache to append any query string from the original request to the end of the URL. Otherwise, it would be lost because it's not captured by the second part of the rule.

Notice that the dot in the third part of the rule does not have to be escaped (and shouldn't be), because that part is not a regular expression. It's a literal string that's used as is except that the $1 is replaced by the capture group from the second part. If we did escape the dot with a backslash, a literal backslash would appear in the final URL.
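
A small sketch of that asymmetry, using the hypothetical file names old-page.html and new-page.html:

```apache
# Second part (regex): the dot is escaped so it matches only a literal dot
# Third part (literal string): the dot is written as is
RewriteRule ^old-page\.html$ new-page.html [L]
```

Without the backslash in the second part, the pattern would also match a path such as old-pageXhtml, because an unescaped dot matches any character.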


Examples

Let's look at some different versions of the rewrite rule in the example above and see the URLs they produce with this URL, which is the Manager URL for editing the resource with the ID 12:

http://bobsguides.com/manager/?a=resource/update&id=12

First, the original version:

RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
http://bobsguides.com/index.php?q=manager/&a=resource/update&id=12

Apache starts the URL with http://bobsguides.com/, then adds the literal index.php?q=, followed by the capture group from the second part of the rule (manager/). Then, because of the QSA flag, it appends the query string from the original URL (a=resource/update&id=12), joining it to the new query string with an &.

Notice that the slash at the end of the base URL is not captured by the regular expression, otherwise we'd have q=/manager ... in the final URL.

RewriteRule ^(.*)$ /index.php?q=$1 [L,QSA]
http://bobsguides.com/index.php?q=manager/&a=resource/update&id=12

In the version above, we've added a / in front of index.php, but the final URL is the same. Apache is smart enough not to leave a double slash at the end of the base URL.

RewriteRule ^(.*)$ index.php?q=$1 [L]
http://bobsguides.com/index.php?q=manager/

We've left off the QSA flag, so the query string has been lost.

RewriteRule ^(.*)$ http://bobsnewsite.com/index.php?q=$1 [L,QSA]
http://bobsnewsite.com/index.php?q=manager/&a=resource/update&id=12

Because we've provided a new base URL in the third part, that's used instead of the original one, but everything else stays the same.

RewriteRule ^(.*)$ bobsnewsite.com/index.php?q=$1 [L,QSA]
http://bobsguides.com/bobsnewsite.com/index.php?q=manager/&a=resource/update&id=12

Oops, we left out the server protocol in the third part. Now, the new domain name is just appended to the old one.

RewriteRule (.*) index.php?q=$1 [L,QSA]
http://bobsguides.com/index.php?q=manager/&a=resource/update&id=12

Here, we've left off the beginning and end of line symbols (^ and $). It doesn't matter because they're not really necessary here. The .* matches all the characters in the search string.

RewriteRule (.+) index.php?q=$1 [L,QSA]
http://bobsguides.com/index.php?q=manager/&a=resource/update&id=12

This works too, because .+ matches one or more characters, which will still capture the whole search string. The dot matches any character and the + says we want to match one or more of the element ahead of it.

RewriteRule (^.+$) index.php?q=$1 [L,QSA]
http://bobsguides.com/index.php?q=manager/&a=resource/update&id=12

In this version, we've put the beginning and end of line characters (^ and $) inside the capture group. It still works. Those are never captured because they're not really characters. They just tell the search where to start and stop looking for matches.

There are cases where the beginning and end of line characters matter a lot. They're just not that critical in this example.
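
Here is a sketch of one such case, assuming a hypothetical blog directory that has moved to news. The unanchored version is commented out because only one of the two would be used in a real file.

```apache
# Unanchored: also matches archive/blog/post-1, because "blog/" appears
# somewhere in the path, and would redirect it to /news/post-1
# RewriteRule blog/(.*)$ /news/$1 [L,R=301]

# Anchored: matches only when the path starts with blog/
RewriteRule ^blog/(.*)$ /news/$1 [L,R=301]
```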


A New Example

In the last article, we discussed a rule to add www. to the beginning of every URL. Let's look at a simple section of the MODX .htaccess file that does the opposite. This one removes the www. from the beginning of a URL. This example shows the use of capture groups in both a condition and a rule:

# Rewrite www.domain.com -> domain.com
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

The first rewrite condition needs a little explanation. The dot matches any character, so it's just making sure that the host variable is not empty. If %{HTTP_HOST} is empty, the rule will not apply.

The second rewrite condition matches www, but only when it appears at the beginning of the %{HTTP_HOST} server variable (e.g., www.bobsguides.com). We wouldn't want our rule to be applied if the domain name were bobsWWWsite.com. Notice the capture group after the www and the dot. This will capture everything in the %{HTTP_HOST} beyond that point (e.g., bobsguides.com) for use in the rewrite rule as %1. The [NC] flag says we want a case-insensitive match, so it will also match WWW.

The rewrite rule also has a capture group. It captures all of the URL after the base URL. On the right, the rule starts with a server protocol (http://), followed by the capture group from the rewrite condition (%1, here bobsguides.com), followed by the rest of the URL captured in the second part of the rule ($1). The query string is not part of that capture, but Apache passes it through unchanged because the third part of the rule doesn't contain a query string of its own.

So this URL:

http://www.bobsguides.com/manager/?a=resource/update&id=12

becomes:

http://bobsguides.com/manager/?a=resource/update&id=12

The example above comes from the current MODX ht.access example file. It has a minor mistake in it. The dot after www in the second rewrite condition is not escaped. It matches any character, so it will match the dot in the URL, but it will also match this URL:

http://wwwXbobsguides.com/manager/?a=resource/update&id=12

The mistake is fairly harmless because the character after the www will always be a dot and even if it's not, the rule will still produce the correct URL. Here's another version that takes a different approach, but is less portable. It's from an earlier version of MODX:

# Rewrite www.domain.com -> domain.com
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^yoursite\.com [NC]
RewriteRule (.*) http://yoursite.com/$1 [R=301,L]

This version will rewrite any URL where the domain name does not (!) start with yoursite.com to the non-www version of the URL.

This version is inferior to the one we saw above. First, it requires that you hard-code your domain name into your .htaccess file, making it fail catastrophically if, for example, you transfer your site to a localhost install and don't change those two lines.

Second, you would also have to remember to rewrite the lines if you changed your site to use SSL, since this version would change your URL from https back to http if it appeared below any https rewrite.
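
Going back to the first, more portable version, the unescaped dot mentioned earlier is easy to fix. A sketch with the dot escaped, otherwise unchanged:

```apache
# Rewrite www.domain.com -> domain.com
RewriteCond %{HTTP_HOST} .
# The backslash makes the dot after www match only a literal dot
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

With this change, a host such as wwwXbobsguides.com no longer satisfies the second condition.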


Multiple Capture Groups

Suppose that you have a site that sells auto accessories. A form on a page called parts-page lets the user enter the make, model, style, and year of a car. The form sends the values as a query string in the URL to that same page, like this:

http://bobsguides.com/parts-page?make=ford&model=mustang&style=convertible&year=2003

Now, suppose that you want to send the user to this URL, which would show the parts available for that particular car:

http://bobsguides.com/ford/mustang/convertible/2003/index.php

Here's the rule to do that:

RewriteCond %{QUERY_STRING} (.*)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
RewriteRule (.*) %2/%3/%4/%5/index.php? [L]

Notice that we have five capture groups in the rewrite condition that each capture a part of the query string. The first one captures anything in the query string that comes before "make." It's not really necessary because the "parts-page" and the question mark will not be part of the query string, but it can't hurt. The rest of the capture groups capture just the values of the make, model, style, and year.

In the rewrite rule, we just enter capture groups 2 through 5, separated by literal slashes, then tack index.php on the end. As usual, the rewrite engine puts the base URL on the beginning. Notice that we're not using the capture group in the rewrite rule ($1) at all. Something has to go there to meet the formatting requirements for a rewrite rule, but a single dot (matching any single character) with no parentheses would work just as well:

RewriteRule . %2/%3/%4/%5/index.php? [L]

Since we don't actually use the first capture group in our rewrite condition, we could make this regex a little more efficient by telling Apache that that first group is a "non-capturing group", like this: (?:.*). The addition of ?: at the beginning tells the rewrite engine to treat that group as usual, but not to bother capturing it in a variable. Since the first group is not captured, we'd have to change our variables to 1 through 4, instead of 2 through 5, so the new code would look like this:

RewriteCond %{QUERY_STRING} (?:.*)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
RewriteRule (.*) %1/%2/%3/%4/index.php? [L]
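
The rules above apply to any request whose query string matches, whatever page was requested. As a sketch of one possible refinement (an assumption, not part of the original example), the second part of the rule can limit it to requests for parts-page. This assumes the .htaccess file sits in the site root.

```apache
# (?:^|&) is a non-capturing group: "make" must start the query string
# or follow an & separator
RewriteCond %{QUERY_STRING} (?:^|&)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
# Only requests for parts-page are rewritten; the trailing ? drops the query string
RewriteRule ^parts-page$ %1/%2/%3/%4/index.php? [L]
```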

Server Variables in Rewrite Rules

We've seen the capture group variables used in a rewrite rule, but the standard server variables are also available. Their use is very rare, but here's an example that will rewrite *all* requests to https:

## REQUIRE SSL (works even if MODX_SSL is not loaded)
RewriteCond %{HTTPS} !=on [NC]
RewriteRule ^.*$ https://%{SERVER_NAME}%{REQUEST_URI} [R,L]

If the initial URL is already https, the server variable %{HTTPS} will be set to on. The rewrite condition will not be met and the rule will not apply. In all other cases, %{HTTPS} will be set to off. The condition will be met because !=on means not equal to "on" and the rule will be applied to produce the same URL, but with the https protocol.
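
Because the R flag in the rule above has no status code, the redirect is a 302. A sketch of a permanent variant follows. Using %{HTTP_HOST} (the host name the browser sent) rather than the %{SERVER_NAME} used above is a common alternative, shown here as an assumption about what suits your server.

```apache
## REQUIRE SSL, permanent redirect
RewriteCond %{HTTPS} !=on [NC]
# A bare ^ matches every request without capturing anything
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```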


Rewrite Rule Flags

There are many flags that can be put in the square brackets at the end of a rewrite rule. As with the flags for rewrite conditions, you can use multiple flags separated by a comma (with no spaces!).

You can see the full list of flags in the Apache mod_rewrite documentation. The list below contains the most commonly used flags and their effect. When in doubt, check the documentation for details and possible edge cases for a specific flag. Many of the flags also have a text version (though those are seldom used). In the list below, F|forbidden means you can use either F or forbidden to get the same effect.

  • END - Terminate the rewrite process completely, including any later passes; does not affect new requests that result from an external redirect
  • F|forbidden - Return a 403 - Forbidden code
  • G|gone - Return a 410 - Gone response
  • L|last - Skip any conditions or rules below this one in this pass
  • NC|nocase - Use a case-insensitive match for the regex
  • QSA|qsappend - Append the query string to the URL
  • QSD|qsdiscard - Remove the query string - available only in Apache 2.4.0+
  • ? - Not a flag; putting this at the end of the third part removes the query string in Apache versions before 2.4.0
  • R|redirect - Usually used as R=### where ### is the status code used (e.g., 301); plain R sends a 302
  • S|skip - If this rule applies, skip the next one; S=n skips the next n rules
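
A short sketch showing a few of these flags in use. The paths private/, old-promo.html, old-search, and /search are hypothetical. A dash in the third part means "no substitution," which is the usual companion to F and G.

```apache
# F: refuse everything under private/ with a 403
RewriteRule ^private/ - [F]

# G: tell clients a retired page is gone for good (410)
RewriteRule ^old-promo\.html$ - [G]

# NC + QSD: case-insensitive match, permanent redirect, query string dropped
# (QSD needs Apache 2.4.0 or later)
RewriteRule ^old-search$ /search [NC,R=301,L,QSD]
```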

Coming Up

In this article we looked at how rewrite rules work and how they use the results of rewrite conditions. We also saw a number of examples. In the next article, we'll see a list of practical examples of rewrite rules.





