LogPageNotFound Plugin Tutorial

I could tell you how many hours it takes to develop a MODX extra Transport Package complete with a build script, properties, multiple MODX elements, internationalized strings, error checks, and then fully test it, but you wouldn't believe me. If you use this extra and like it, please consider donating. The suggested donation for this extra is $5.00, but any amount you want to give is fine (really). For a one-time donation of $50.00 you can use all of my non-commercial extras with a clear conscience.



Overview

LogPageNotFound logs all requests that result in a 404 (page not found) error. You will almost certainly find links out there to pages whose aliases you've changed, and possibly links on your own site that no longer work. Using the information in the log, you can create either weblinks on the site or redirect rules in your .htaccess file to send visitors to the right place.

Here are some examples of rewrite rules for redirecting MODX site page requests in .htaccess:

RewriteCond %{QUERY_STRING}  ^$
RewriteRule ^MODX-revolution\.html$ /modx-revolution.html? [R=301,NE,L]
RewriteRule ^Modx\.html$ /modx.html? [R=301,NE,L]
RewriteRule ^MODx\.html$ /modx.html? [R=301,NE,L]
RewriteRule ^modx-faq\.html$ /modx-faqs.html? [R=301,NE,L]
RewriteRule ^modx-newbie-\.html$ /modx-newbie-faq.html? [R=301,NE,L]
RewriteRule ^snippet-tutorials\.html$ /package-tutorials.html? [R=301,NE,L]
RewriteRule ^spform-tutorial\.html$ /spform-snippet-tutorial.html? [R=301,NE,L]

In the above examples, a request that matches the pattern on the left is redirected to the URL on the right. Notice that in the pattern on the left, any dots must be escaped with a preceding backslash, because an unescaped dot matches any character. This is not necessary for the string on the right.

*Always* back up your .htaccess file before making any changes! If there is an error in the .htaccess file, all visitors to your site may get a 500 Internal Server Error.
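If you have a lot of redirects to write, a small script can handle the escaping for you. The following is only a sketch in Python, not part of LogPageNotFound; the function names escape_pattern and redirect_rule are made up for this illustration. It escapes the regex metacharacters (such as the dot) in the old path and produces a rule in the same form as the examples above.

```python
# Characters that have a special meaning in a rewrite pattern (a regular expression).
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")


def escape_pattern(path: str) -> str:
    """Put a backslash in front of every regex metacharacter in path."""
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in path)


def redirect_rule(old_path: str, new_path: str) -> str:
    """Build a 301 redirect rule in the same form as the examples above.

    Hypothetical helper for illustration only. The leading slash is removed
    from the old path because patterns in .htaccess are matched without it.
    """
    old = old_path.lstrip("/")
    new = "/" + new_path.lstrip("/")
    return f"RewriteRule ^{escape_pattern(old)}$ {new}? [R=301,NE,L]"


if __name__ == "__main__":
    print(redirect_rule("modx-faq.html", "modx-faqs.html"))
```

Paste the generated line into .htaccess by hand and test it, rather than writing to the file from a script.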

When examining the log, you'll also see evildoers looking for a variety of security holes trying to access files and directories that never existed. Once you know what they're looking for, you can block them in .htaccess (see below).

Note that if you have an AdSense site, every page-not-found request will be followed by one or more visits from the GoogleBot looking for the same URL. You'll also see the GoogleBot failing to find the unpublished report pages when you access them.

You can prevent some of that with these lines in your robots.txt file. Adjust the file suffix as necessary based on the configuration of your site.

User-agent: Mediapartners-Google*
Disallow: /page-not-found-log-report.html

Installing LogPageNotFound

Go to System | Package Management and click on the "Download Extras" button. Put "LogPageNotFound" in the search box and press "Enter". Click on the "Download" button next to LogPageNotFound on the right side of the grid. Wait until the "Download" button changes to "Downloaded" and click on the "Finish" button.

The LogPageNotFound package should appear in the Package Manager grid. Click on the "Install" button and respond to the forms presented. That's it. Once the install finishes, logging will begin immediately. The LogPageNotFound plugin will be installed and active, and you'll get the snippet/resource pairs that will let you view the log created by the plugin.

The Log

When a page is not found, an entry is written to the pagenotfound.log file (inside the logs/ directory below the MODX core directory). The log entry will include the path to the requested file, the time, the visitor's IP and host, the user name, and the HTTP referer (if any).

Remember that there will be an entry for every page-not-found request, so there may be a lot of duplicates. The log is limited to 300 entries by default. You can set a different limit with the &log_max_lines property. New entries will be added at the top and old ones will scroll off the bottom. As of Version 1.0.3, there is a button to clear the log (thanks to Susan Ottwell).
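To make the "newest at the top, oldest off the bottom" behavior concrete, here is a rough sketch of the idea in Python. This is not the plugin's actual code (the plugin is a MODX plugin, and its internals are not shown here); the function name add_log_entry is invented for the example, and max_lines stands in for the &log_max_lines property.

```python
def add_log_entry(lines, entry, max_lines=300):
    """Return a new list with entry at the top, trimmed to max_lines.

    Illustrative only: lines is the current log (newest first), and
    anything beyond max_lines falls off the bottom.
    """
    return ([entry] + list(lines))[:max_lines]


if __name__ == "__main__":
    log = ["request B", "request C"]
    log = add_log_entry(log, "request A", max_lines=2)
    print(log)  # the oldest entry, "request C", has been dropped
```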

If you have the Reflect Block plugin enabled, you won't see the various reflect requests in the Page Not Found log. If it's disabled or not installed, you'll see plenty of them if your site contains the word "MODX".

Properties

At present, the only settable property for the LogPageNotFound plugin is &log_max_lines, which sets the maximum number of entries in the log. The default value is 300.

The PageNotFoundLogReport snippet has two properties: &table_width (to set the width of the table) and &cell_width (to set the width of each cell). Feel free to adjust them to meet your needs. Set them in the snippet tag on the Page Not Found Log Report page.

Reports

The report snippet and the resource to execute it are included in the package. The Page Not Found Log Report resource is unpublished and hidden from menus by default, but you can still view the log by previewing that resource from the Manager when you are logged in as a Super User.

Caution

If you see serious repeat offenders in the log, you can block them by IP with code like this in your .htaccess file (using their actual IPs):

order allow,deny
deny from 127.0.0.1
deny from 127.0.0.2
deny from 127.0.0.3
allow from all

Blocking users in the .htaccess file is extremely fast and it stops the users dead before they even reach the site. It's not all that practical, however, since people can spoof IPs, there are zillions of evildoers out there, and the request can be from an IP that might later be assigned to a legitimate user. Be sure to note the host before adding an IP block. You don't want to block the GoogleBot or yourself. The User Agent can be helpful here, but many evildoers will fake the User Agent and pretend to be the GoogleBot.
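If you do decide to block by IP, a short script can build the block for you and help keep you from blocking yourself. What follows is a sketch in Python under some assumptions: the function name deny_block and the never_block list are made up for this example, and you still have to collect the IPs from the log (and check their hosts) yourself. It skips anything that isn't a valid IP address, drops duplicates, and leaves out any address you list as protected.

```python
import ipaddress


def deny_block(ips, never_block=()):
    """Return .htaccess text that denies each valid IP in ips.

    Hypothetical helper: never_block holds addresses (your own, for
    example) that must never appear in the deny list.
    """
    protected = {str(ipaddress.ip_address(ip)) for ip in never_block}
    lines = ["order allow,deny"]
    seen = set()
    for raw in ips:
        try:
            ip = str(ipaddress.ip_address(raw.strip()))
        except ValueError:
            continue  # not a valid IP address, so skip it
        if ip in protected or ip in seen:
            continue  # protected or already listed
        seen.add(ip)
        lines.append(f"deny from {ip}")
    lines.append("allow from all")
    return "\n".join(lines)


if __name__ == "__main__":
    # 192.0.2.x addresses are reserved for documentation; use real IPs from your log.
    print(deny_block(["192.0.2.10", "not-an-ip", "192.0.2.11"],
                     never_block=["192.0.2.11"]))
```

The output uses the same order/deny/allow form as the example above, so review it and paste it into .htaccess by hand.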

In many ways, it's safer to block users with redirect rules based on what they are looking for. Here are some examples:

RewriteCond %{REQUEST_URI} reflect [NC,OR]
RewriteCond %{QUERY_STRING} reflect [NC,OR]
RewriteCond %{REQUEST_URI} password_forgotten [NC,OR]
RewriteCond %{REQUEST_URI} mysql [NC,OR]
RewriteCond %{REQUEST_URI} sqlpatch [NC,OR]
RewriteCond %{REQUEST_URI} checkout [NC,OR]
RewriteCond %{REQUEST_URI} customer [NC,OR]
RewriteCond %{REQUEST_URI} admin [NC]
RewriteRule .* - [F,L]

Include the "OR" flag on every condition but the last, and make sure that the strings you specify are not contained in any alias on your site.
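Both of those rules are easy to get wrong when you edit the list by hand, so here is a sketch in Python that applies them for you. It is an illustration only: the function name block_rules and the site_aliases argument are invented for this example, the strings are assumed to be plain words with no regex characters in them, and only REQUEST_URI conditions are generated.

```python
def block_rules(needles, site_aliases=()):
    """Build RewriteCond lines with OR on all but the last, plus the rule.

    Hypothetical helper. Raises ValueError if the list is empty (a bare
    rule would block every request) or if a string appears inside one of
    your own aliases.
    """
    if not needles:
        raise ValueError("need at least one string to block")
    clashes = sorted({n for n in needles for a in site_aliases
                      if n.lower() in a.lower()})
    if clashes:
        raise ValueError("these strings occur in site aliases: " + ", ".join(clashes))
    lines = []
    for i, needle in enumerate(needles):
        # Every condition gets OR except the last one.
        flags = "[NC]" if i == len(needles) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{REQUEST_URI}} {needle} {flags}")
    lines.append("RewriteRule .* - [F,L]")
    return lines


if __name__ == "__main__":
    print("\n".join(block_rules(["mysql", "sqlpatch", "admin"],
                                site_aliases=["modx-faqs", "package-tutorials"])))
```

The alias check is case-insensitive because the conditions use the NC flag.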

As we said above, *always* back up your .htaccess file before making any changes!

 


Thank you for visiting BobsGuides.com

  —  Bob Ray