LogPageNotFound Plugin Tutorial

If you use this extra and like it, please consider donating. The suggested donation for this extra is $5.00, but any amount you want to give is fine (really). For a one-time donation of $50.00 you can use all of my non-premium extras with a clear conscience.

Overview

LogPageNotFound logs all requests that result in a 404 (page not found) error. You will almost certainly find links on other sites to pages whose alias you've changed, and possibly links on your own site that no longer work. Using the information in the log, you can create either weblinks on the site or redirect rules in your .htaccess file to send visitors to the right place.

LogPageNotFound is fully compatible with MODX 3.

Here are some examples of rewrite rules for redirecting MODX site page requests in .htaccess:

RewriteCond %{QUERY_STRING}  ^$
RewriteRule ^MODX-revolution\.html$ /modx-revolution.html? [R=301,NE,L]
RewriteRule ^Modx\.html$ /modx.html? [R=301,NE,L]
RewriteRule ^MODx\.html$ /modx.html? [R=301,NE,L]
RewriteRule ^modx-faq\.html$ /modx-faqs.html? [R=301,NE,L]
RewriteRule ^modx-newbie-\.html$ /modx-newbie-faq.html? [R=301,NE,L]
RewriteRule ^snippet-tutorials\.html$ /package-tutorials.html? [R=301,NE,L]
RewriteRule ^spform-tutorial\.html$ /spform-snippet-tutorial.html? [R=301,NE,L]

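In the above examples, the pattern on the left is matched against the requested path, and matching requests are redirected to the URL on the right. In the pattern on the left, any dots must be escaped with a preceding backslash, because an unescaped dot matches any character. Slashes do not need to be escaped in a mod_rewrite pattern, though escaping them does no harm. No escaping is necessary for the URL on the right. The trailing question mark on the right drops any query string from the redirected URL.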
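One detail worth knowing: a RewriteCond applies only to the single RewriteRule that immediately follows it. The following is a minimal sketch of how the empty-query-string condition could be repeated for each redirect. The aliases (old-page.html, old-folder, and so on) are hypothetical and only for illustration.

```apache
# Hypothetical aliases, for illustration only.
RewriteEngine On

# A RewriteCond applies only to the one RewriteRule that follows it,
# so the empty-query-string test is repeated before each rule.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^old-page\.html$ /new-page.html? [R=301,NE,L]

# The slash in the pattern is left unescaped; only the dot is escaped.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^old-folder/old-page\.html$ /new-folder/new-page.html? [R=301,NE,L]
```

In a typical MODX .htaccess file, RewriteEngine On is already present, and redirects like these generally need to appear before the friendly-URL rule that hands requests to index.php.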

*Always* back up your .htaccess file before making any changes! If there is an error in the .htaccess file, all visitors to your site may get a 500 Internal Server Error.

When examining the log, you'll also see evildoers probing for a variety of security holes by trying to access files and directories that never existed. Once you know what they're looking for, you can block them in .htaccess (see below).

Note that if you have an Adsense site, every page-not-found request will be followed by one or more visits from the GoogleBot looking for the same URL. You'll also see the GoogleBot failing to find the unpublished report pages when you access them.

You can prevent some of that with these lines in your robots.txt file. Adjust the file suffix as necessary based on the configuration of your site.

User-agent: Mediapartners-Google*
Disallow: /page-not-found-log-report.html

Requests for favicon.ico and apple-touch-icon are not logged. If you have those icons and want to see if those requests are successful, just comment out the section toward the end of the LogPageNotFound plugin that ignores them.

Installing LogPageNotFound

Go to System -> Package Management and click on the "Download Extras" button. Put "LogPageNotFound" in the search box and press "Enter". Click on the "Download" button next to LogPageNotFound on the right side of the grid. Wait until the "Download" has finished and click on the "Back To Package Manager" button.

The LogPageNotFound package should appear in the Package Manager grid. Click on the "Install" button and respond to the forms presented. That's it. Once the install finishes, logging will begin immediately. The LogPageNotFound plugin will be installed and active, and you'll get the snippet/resource pairs that will let you view the log created by the plugin.

The Log

As of Version 1.3.0, the log has been moved to the core/cache/logs directory. Updating LogPageNotFound will attempt to move it there for you. If that fails, you'll get an error message during the installation. In that case, you can either move the file yourself, or copy its contents to the new log and delete the old one.

When a page is not found, an entry is written to the pagenotfound.log file. The log entry will include the path to the requested file, the time, the IP of the visitor, the host of the visitor, the user agent, and the HTTP referer (if any).

Remember that there will be an entry for every page-not-found request, so there may be a lot of duplicates. The log is limited to 300 entries by default. You can set a different limit with the &log_max_lines property. New entries will be added at the top and old ones will scroll off the bottom. As of Version 1.0.3, there is a button to clear the log (thanks to Susan Ottwell).

If you have the Reflect Block plugin enabled, you won't see the various reflect requests in the Page Not Found log.

Properties

At present, the only settable property for the LogPageNotFound plugin is &log_max_lines, which sets the maximum number of entries in the log. The default value is 300.

The PageNotFoundLogReport snippet has no properties. It used to have two: &table_width (to set the width of the table) and &cell_width (to set the width of each cell). These have been replaced by the inline CSS in the LogPageNotFoundTemplate.

Styling the Log Report

If you want to change the CSS for the log report, just duplicate the LogPageNotFound Template, edit it, and set it as the template for the LogPageNotFoundReport resource.

If you want to change the table headings for the report (say, to another language), duplicate the LogPageNotFound Resource, and edit the table headings in the duplicate page. Make sure the new page uses the correct template.

Reports

The report snippet and the resource to execute it are included in the package. The Page Not Found Log Report resource is unpublished and hidden from menus by default, but you can still view the log by previewing that resource from the Manager when you are logged in as a Super User.

If you would prefer to have the report as a widget on your dashboard, see the instructions here.

Caution

If you see serious repeat offenders in the log, you can block them by IP with code like this in your .htaccess file (using their actual IPs):

order allow,deny
deny from 127.0.0.1
deny from 127.0.0.2
deny from 127.0.0.3
allow from all
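The Order, Deny, and Allow directives above come from Apache 2.2. On Apache 2.4 they work only when mod_access_compat is loaded. Here is a minimal sketch of an equivalent block using the 2.4 authorization directives, assuming your host allows them in .htaccess. The addresses are placeholders from the ranges reserved for documentation, not real offenders.

```apache
# Apache 2.4 style: allow everyone except the listed addresses.
# 192.0.2.x and 198.51.100.x are reserved documentation ranges (placeholders).
<RequireAll>
    Require all granted
    Require not ip 192.0.2.1
    Require not ip 192.0.2.2
    # A whole range can be blocked with CIDR notation.
    Require not ip 198.51.100.0/24
</RequireAll>
```

It is safest to use one style or the other rather than mixing the old and new directives in the same file.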

Blocking users in the .htaccess file is extremely fast, and it stops the users dead before they even reach the site. You can also block people by User Agent (browser), but that's not all that practical, since the User Agent can be easily spoofed.
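For completeness, blocking by User Agent might look like the following sketch. The agent names BadBot and EvilScraper are hypothetical. Substitute strings you actually see in your log, and keep in mind that a determined visitor can simply change the string.

```apache
# Hypothetical agent names, for illustration only.
# Returns 403 Forbidden when the User-Agent header contains either string.
RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper) [NC]
RewriteRule .* - [F,L]
```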

Banning by IP is more reliable, since an IP address can't be spoofed directly. However, the visitor may be operating through a proxy (this will be noted in the report). Not all proxy users are bad actors. Be sure to note the user agent and host before adding an IP block. You don't want to block the GoogleBot or yourself. The User Agent can be helpful here, but many evildoers will fake the User Agent and pretend to be the GoogleBot or some other innocent-looking agent.

In many ways, it's safer to block users with redirect rules based on what they are looking for. Here are some examples:

RewriteCond %{REQUEST_URI} reflect [NC,OR]
RewriteCond %{QUERY_STRING} reflect [NC,OR]
RewriteCond %{REQUEST_URI} password_forgotten [NC,OR]
RewriteCond %{REQUEST_URI} mysql [NC,OR]
RewriteCond %{REQUEST_URI} sqlpatch [NC,OR]
RewriteCond %{REQUEST_URI} checkout [NC,OR]
RewriteCond %{REQUEST_URI} customer [NC,OR]
RewriteCond %{REQUEST_URI} admin [NC]
RewriteRule .* - [F,L]

Include the "OR" flag on every condition but the last, and make sure that the strings you specify are not contained in any alias on your site.
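One way to lower the risk of matching a legitimate alias is to anchor the patterns instead of matching a bare substring. This is only a sketch. It assumes the site lives at the web root (so REQUEST_URI starts with the path shown), and the paths are examples rather than a recommended list.

```apache
# Matches /admin and /admin/..., but not /administrator-guide.html
RewriteCond %{REQUEST_URI} ^/admin(/|$) [NC,OR]
# Matches only requests whose path ends in .sql
RewriteCond %{REQUEST_URI} \.sql$ [NC]
RewriteRule .* - [F,L]
```

With a bare substring such as admin, a page with the alias administrator-guide would be blocked as well. The anchored version avoids that.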

As we said above, *always* back up your .htaccess file before making any changes!

 

Thank you for visiting BobsGuides.com

  —  Bob Ray