Understanding .htaccess V

Some useful .htaccess rules and how they work


In the last article we looked at rewrite rules and how they work. In this one, we'll see a list of useful rewrite rules with some explanations of how they work. Remember the Online htaccess Tester that you can use to test your rewrite conditions and rules.



A Few Things to Know

Remember that in order to use rewrite conditions and rules, you must include these lines at the top of your .htaccess file:

RewriteEngine on
RewriteBase /

All the examples below assume that these two lines are above them.

Another thing to think about only applies if you have access to the Apache main configuration file on your server, httpd.conf. This generally won't be the case if you're on a shared server.

If you do have access to that file, anything that can be done with .htaccess can be done more efficiently in httpd.conf. That's because the server reads the configuration file once, at startup, and keeps it in memory. If you use .htaccess, the file will be read and processed on every request. See this article for more details.

Using the configuration file for rewrites and server directives is bound to be faster and more efficient. The only downside to using the configuration file is that the server must be restarted (or told to reload its configuration) in order for any changes to take effect.
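
As a rough sketch of what that might look like, here are the same two setup lines placed in a <Directory> block in the main configuration instead of in .htaccess. The server name and both paths are placeholders made up for illustration, and the Require line assumes Apache 2.4, so adjust them for your own server. The AllowOverride None line tells Apache not to look for .htaccess files in that directory at all, which is where the speed gain comes from.

```apache
# Hypothetical virtual host; the server name and paths are assumptions.
<VirtualHost *:80>
    ServerName yoursite.com
    DocumentRoot "/var/www/yoursite"

    <Directory "/var/www/yoursite">
        # Don't look for .htaccess files under this directory
        AllowOverride None
        # Apache 2.4 access-control syntax
        Require all granted

        RewriteEngine on
        RewriteBase /
        # Rewrite conditions and rules go here, written the same way
        # as in the .htaccess examples in this article
    </Directory>
</VirtualHost>
```

Rules inside a <Directory> block are matched the same way as rules in .htaccess. If you put them directly inside the <VirtualHost> block instead, the pattern is matched against the full URL path, including the leading slash, so the patterns in this article would need adjusting.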

Finally, remember that the rewrite engine will always put the base URL in front of the URL you create in the third part of a rewrite rule, unless you put a literal http:// or https:// at the front yourself.

Now, let's look at some useful rewrite operations. (Some of these appeared in earlier articles in this series.)


SSL Rewrites (https)

This rewrite can take a number of different forms. This is the one I've seen most often:

RewriteCond %{HTTPS} !=on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

Notice that the rule above doesn't use the capture group ($1). It simply produces https:// followed by the host (e.g., yoursite.com), followed by the part of the URL after the domain name (%{REQUEST_URI}). Any query string will be included.
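
Two of the other forms you may run across are sketched below. Use one or the other, not both. The first tests the port instead of the HTTPS variable and assumes your site uses the standard port 443 for https. The second is meant for sites behind a load balancer or proxy that handles SSL itself. It assumes the proxy sets an X-Forwarded-Proto header, which depends on the proxy, so check before relying on it.

```apache
# Variation 1: test the server port (assumes https runs on port 443)
RewriteCond %{SERVER_PORT} !^443$
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Variation 2: behind a proxy that terminates SSL
# (assumes the proxy sends an X-Forwarded-Proto header)
RewriteCond %{HTTP:X-Forwarded-Proto} !=https
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```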


Require Subdomain

This section redirects all requests to a subdomain at your site.

## REQUIRE SUBDOMAIN
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^subdomain\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://subdomain.yoursite.com/$1 [L,R=301]

If the HTTP_HOST is not empty, and is not already aimed at the subdomain, the code above redirects the request to the same URL (including any query string) at the subdomain.


Rewrite Directory Name for SEO

Say you have directories called /n/ and /u/ that list new and used cars. This code will redirect /n/ to /new/ and /u/ to /used/, which will make it easier for search engines to find your pages.

If the directory to be rewritten is a top level directory (e.g., yoursite.com/n/):

RewriteRule ^n/(.*)$ /new/$1 [L,R=301]
RewriteRule ^u/(.*)$ /used/$1 [L,R=301]

If it's a top-level directory, the first slash in /n/ will not be part of the search string so we can't have it in our pattern. In the final URL, we add the slash literally in the third part, followed by new/, followed by the rest of the original URL (the only capture group, $1).

If the directory to be rewritten is *not* a top-level directory (e.g., yoursite.com/cars/n/):

RewriteRule ^(.*)/n/(.*)$ $1/new/$2 [L,R=301]
RewriteRule ^(.*)/u/(.*)$ $1/used/$2 [L,R=301]

If it's not a top-level directory, both slashes in /n/ will be part of the search string. We capture any subdirectories that come before the /n/ in the first capture group, $1. ($1 may be empty, because the * means zero or more.) Then, we capture anything after the /n/ with the second capture group, $2.

In the third part, we create the URL with $1 followed by /new/ or /used/, followed by the rest of the original URL, $2.

Notice that in all of the versions above, we don't capture the n or the u because we're not using them in the final URL.
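
If you'd rather not maintain two sets of rules, the two cases can probably be combined by making the leading directories optional. This is only a sketch, and it's worth running through a tester before using it on a live site. The (.*/)? group matches zero or more parent directories along with their trailing slash, so $1 is empty for a top-level /n/ and something like cars/ otherwise.

```apache
# Handles both yoursite.com/n/... and yoursite.com/cars/n/...
RewriteRule ^(.*/)?n/(.*)$ /$1new/$2 [L,R=301]
RewriteRule ^(.*/)?u/(.*)$ /$1used/$2 [L,R=301]
```

With these rules, n/sedans.html should be redirected to /new/sedans.html and cars/n/sedans.html to /cars/new/sedans.html.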


Rejecting Bad User Agents

There may be certain user agents (browsers or browser-like tools) that you don't want to allow to access your site. Here's a section of code to do that:

RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC]
RewriteRule .* - [F,L]

Notice the OR in the conditions above. The rule at the end will execute if any of those user agents appear in the %{HTTP_USER_AGENT} server variable. Notice also that there's no OR on the last one. This section would execute faster if you put a beginning of line symbol at the beginning of the third part (e.g., ^BlackWidow), but that means a potential hacker could just put something in front of the user agent and avoid the rule.

The only part of the rewrite rule that actually does anything is the flags. L means to skip any conditions and rules below this section if there's a match. The F means to return a 403 - Forbidden code, blocking the user from accessing anything at your site. The parts ahead of the flags are just there to follow the formatting syntax of a rewrite rule.

The code above could also be written with a single condition, like this:

RewriteCond %{HTTP_USER_AGENT} (BlackWidow|ChinaClaw|LeechFTP|WebStripper|WebWhacker) [NC]
RewriteRule .* - [F,L]

There are many bad user agents (this is only a partial list), so it's usually easier to use the first version and put them in alphabetical order so you can add new ones after checking to make sure they're not already there.


Redirect to index.php

# Skip requests for existing files
RewriteCond %{REQUEST_FILENAME} !-f
# Skip requests for existing directories
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]

Whenever the requested file or directory doesn't exist, this code will send the request to index.php. This is an internal rewrite, so the URL in the browser's address bar doesn't change. Any subdirectories will be lost, but any query string will be preserved.

In other words,

http://yoursite.com/hello/goodbye/test.php?x=1

will become:

http://yoursite.com/index.php?x=1

A common case with content management systems is to redirect to index.php, while preserving the information in the URL:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

In this version, the part after the base URL is placed in a new query string with q as the variable. The original query string appears at the end because of the QSA (Query String Append) flag. In this case,

http://yoursite.com/hello/goodbye/test.php?x=1

will become:

http://yoursite.com/index.php?q=hello/goodbye/test.php&x=1

Remove www from URLS

# Rewrite www.domain.com -> domain.com
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

Notice that this version has two capture groups surrounded by parentheses.

The first capture group is in the second rewrite condition. It captures the part of the host name that comes after the www.. The literal dot is escaped by a backslash. The NC flag gives us a case-insensitive match. This capture group is available in the rewrite rule as %1.

The second capture group is in the rewrite rule. It captures everything that follows the base URL. (The query string is not part of what the rule matches against, but Apache passes it along unchanged.) It's available in the final URL as $1. In the third part of the rule, we start with http://, add the %1 (the host name without the www. that we captured in the second rewrite condition), followed by /$1 (a literal slash followed by the rest of the URL we captured in the rewrite rule).

So the URL,

http://www.yoursite.com/hello/test.php?x=1

will become,

http://yoursite.com/hello/test.php?x=1

Note that this rule hard-codes the request scheme (http or https) into the final URL. Remember to use the correct version for your site and to change it if the site switches from http to https. As far as I know, there's no way around this that works reliably on every server configuration.
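
One thing that may be worth trying if your server runs Apache 2.4 or later is the REQUEST_SCHEME server variable, which holds http or https for the current request. It may not behave well in every setup (a proxy that handles SSL in front of Apache, for example, can make it report http), so treat this as a sketch to test rather than a drop-in replacement:

```apache
# Remove www while keeping whichever scheme the request used
# (assumes Apache 2.4+, where REQUEST_SCHEME is available)
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ %{REQUEST_SCHEME}://%1/$1 [R=301,L]
```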


Add www to URLS

RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]

The code above checks to make sure that the host name isn't already www.yoursite.com, using a case-insensitive ([NC]) match. The two literal dots in the rewrite condition are escaped. The rewrite rule captures the part after the base URL as $1, writes the base URL, and adds a literal / followed by the $1.

As with the rule above to remove the www, the request scheme http or https is hard-coded here, so make sure to use whichever one is appropriate for your site.
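
If you don't want to hard-code the domain name, a more generic form is sketched below. It assumes that any host name not starting with www. should get one, which won't be true if you also serve other subdomains from the same .htaccess file. The scheme is still hard-coded, as above.

```apache
# Add www. to whatever host name was requested
# (skips requests with an empty host name)
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```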


Rewrite Query String Elements to Directories

Say you have a form that lets users select the make, model, style, and year of a car, then uses the "get" method to submit to itself. You want to convert each part of the query string to a directory name, then forward the user to the index.php file of that path. Here's the rewrite code:

RewriteCond %{QUERY_STRING} (.*)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
RewriteRule (.*) %2/%3/%4/%5/index.php? [L]

There are five capture groups in the rewrite condition, one for the part of the query string before make and four for the values of each part of the query string. Notice that we don't use the first capture group in the final URL.

This rewrites,

http://bobsguides.com/parts-page?make=ford&model=mustang&style=convertible&year=2003

to,

http://bobsguides.com/ford/mustang/convertible/2003/index.php

Here's the same code with the first group converted to a "non-capturing" group. The ?: at the beginning of the capture group tells Apache not to capture it in a variable. Since the first group is not captured, we have to renumber the remaining group variables in our rewrite rule:

RewriteCond %{QUERY_STRING} (?:.*)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
RewriteRule (.*) %1/%2/%3/%4/index.php? [L]
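
The ? at the end of the third part is what throws away the original query string. On Apache 2.4 and later, the QSD (Query String Discard) flag does the same job and may be easier to read. Here's a sketch, assuming a 2.4 server and using [^&]+ for every value:

```apache
# Same rewrite, discarding the query string with QSD (Apache 2.4+)
RewriteCond %{QUERY_STRING} (?:.*)make=([^&]+)&model=([^&]+)&style=([^&]+)&year=([^&]+)$
RewriteRule (.*) %1/%2/%3/%4/index.php [QSD,L]
```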

Stop Hotlinking of Images

This code will prevent people with other sites from displaying images from your site or linking to your files:

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://yoursite\.com(/|$) [NC]
RewriteRule \.(gif|jpg|jpeg|bmp|png|zip|rar|mp3|flv|xml|php|css|js|pdf)$ - [F]

If the HTTP_REFERER is not empty and is not your own site, reply to any request for one of the listed file extensions with a 403 - Forbidden code.


Limit Request Types

You can reject requests that don't use acceptable request types:

RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST)$ [NC]
RewriteRule .? - [F,NS,L]

If the request method is not on the list, the reply is 403 - Forbidden. The NS flag tells Apache not to use this rule on subrequests, such as a check for the name of the default index file or any server-side include.


Coming Up

In this article we looked at a number of useful rewrite conditions and rules, with explanations of how they work. In the next article, we'll look at some things you can do in .htaccess that don't involve rewrite conditions or rules.





