Add noopener to Link Tags IV

Improving the logic that updates the link tags


In the previous article, we looked at a way to use a snippet to automatically add rel="noopener noreferrer" to link code in MODX. The main problem with the previous method was that it assumed that either all or none of the links on the page were fixed. In this one, we'll modify the snippet to perform a more intelligent search.

The Problem

The problem with the snippet in the previous article (FixLinks) was that it did a full search on the whole page content. If it found _blank and didn't find noopener, it operated on all the links of the page. If it found noopener anywhere on the page, it did nothing, even though some of the links might not have been fixed.
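
To make the failure concrete, here is a minimal sketch of a page-level test like the one described above. The function name pageLevelNeedsFix() and the sample HTML are hypothetical, written only for illustration; this is not the exact code from the previous article.

```php
<?php
/* Hypothetical page-level check, reconstructed from the description above */
function pageLevelNeedsFix($content) {
    return stripos($content, '_blank') !== false
        && strpos($content, 'noopener') === false;
}

/* One link already fixed, one link still unprotected */
$mixed = '<a href="https://a.example" target="_blank" rel="noopener noreferrer">A</a>'
       . '<a href="https://b.example" target="_blank">B</a>';

/* The fixed link hides the unfixed one, so the page would be skipped */
var_dump(pageLevelNeedsFix($mixed));   /* bool(false) */
```

Because the test looks at the page as a whole, a single fixed link is enough to make the second, unfixed link invisible to the snippet.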


The Solution

To correct the problem, we need to look at each individual link tag on the page, chunk, or template. That makes our snippet a little more complicated. Rather than make our nested loops deeper and more confusing, we'll create a separate function at the top of the script to process the content of each object. This is a good practice because it separates the logic of correcting the link from the rest of the code. That means you can make changes to the link correction code without worrying about breaking the other parts of the code, and vice versa. It also makes it easy to repurpose our snippet by changing either the looping logic in the main section or what happens in the fixContent() function. We'll also add a has_blank() function to streamline the code a little.

We'll also modify our fixContent() function so that it handles links with noopener but not noreferrer, and vice versa.

Here's the new code:

/* FixLinks snippet */

/** @var $modx modX */

function fixContent(&$string) {
    $retVal = false;

    /* pattern to find link with _blank */
    $pattern = '/<a [^>]+_blank[^>]*>/i';

    /* Pattern for adding noopener */
    $replacePattern = '/target\s*=\s*[\'"]_blank[\'"]/i';

    /* Get all tags with _blank on the page */
    if (preg_match_all($pattern, $string, $matches)) {

        /* If we got any, loop through them and replace if necessary */
        foreach ($matches[0] as $key => $value) {
            /* Don't process tags that already have noopener and noreferrer */
            if (strpos($value, 'noopener noreferrer') !== false) {
                continue;
            }

            if (strpos($value, 'noreferrer noopener') !== false) {
                continue;
            }

            /* Save the old tag value */
            $oldValue = $value;

            /* Handle case with noopener, but not noreferrer */
            if ( (strpos($value, 'noopener') !== false)
                    && strpos($value, 'noreferrer') === false) {
                $newValue = str_replace('noopener', 'noopener noreferrer', $value);

            /* handle cases with noreferrer, but not noopener */
            } elseif ( (strpos($value, 'noreferrer') !== false)
                    && strpos($value, 'noopener') === false) {
                $newValue = str_replace('noreferrer', 'noopener noreferrer', $value);

            /* Handle cases with neither noopener nor noreferrer */
            } else {

                /* Do the replacement within the current tag */
                $newValue = preg_replace($replacePattern,
                    'target="_blank" rel="noopener noreferrer"', $oldValue);
            }

            /* Do the replacement of the tag in the page */
            $string = str_replace($oldValue, $newValue, $string);

            /* Set return as true to tell calling function
               we did at least one replacement for this page */
            $retVal = true;
        }
    } else {
        $retVal = false;
    }

    return $retVal;
} // End of fixContent function

function has_blank($string) {
    return stripos($string, '_blank') !== false;
}

/* ********************************* */
/* Main script starts here */

/* Initialize some variables */
$objectTypes = array(
    'modResource' =>  'pagetitle',
    'modChunk' => 'name',
    'modTemplate' => 'name',
);

/* Are we running from the command line? */
$cliMode = php_sapi_name() == 'cli';

/* Define the newline based on cli mode */
$nl = $cliMode? "\n" : "<br>";
$output = '';

$count = 0;
$typeCounts = array();

/* Loop through the object types */
foreach ($objectTypes as $type => $nameAlias) {
    $typeCounts[$type] = 0;

    /* Get all objects of current type */
    $objs = $modx->getCollection($type);

    /* Loop through the objects of the current type */
    foreach ($objs as $obj) {
        /** @var $obj modResource */

        /* Get the content */
        if ($obj->get('class_key') === 'Article') {
            $content = $obj->get('content');
        } else {
            $content = $obj->getContent();
        }

        /* Bail if content is empty or doesn't contain _blank */
        if (empty($content) || !has_blank($content)) {
            continue;
        }

        /* If we made it this far, we found a '_blank' so increment $count */
        $count++;

        if (fixContent($content)) {
            /* If we made it here, the page has been
               modified so increment the appropriate
               member of the $typeCounts array */
            $typeCounts[$type]++;

            /* Insert the new content into the object */
            $obj->setContent($content);

            /* Save the object and report on the object */
            if ($obj->save()) {
                $output .= $nl . 'Modified ' . $type . '  ' .
                    $obj->get($nameAlias);
            } else {
                /* save() failed, so the change was not stored */
                $output .= $nl . 'Failed to modify ' . $type .
                    '  ' . $obj->get($nameAlias);
            }
        }
    }
}

/* Finish up with the summary */
$output .=  $nl . "Finished -- found " . $count .
    " total objects with _blank" . $nl;
$output .=  "Changed: " . $nl .
    print_r($typeCounts, true);

/* Display all output */
if ($cliMode) {
    echo $output;
} else {
    return $output;
}

We'll come back to the function at the top, but let's look first at the main section of the code. As we did in the previous version, we set up the array of object types (resource, chunk, and template). Each key in the array is the class key of an object type, and each value is the name of the field that holds the object's name.

As we did before, we check to see if the script is running from the command line or in an editor and define a newline character ($nl) based on the result of that test.

We also initialize the $count variable, which counts the total number of objects that contain _blank, and the $output variable, which holds the output to be displayed when the code is finished. Finally, we initialize the $typeCounts variable, which holds the count of fixed objects of each type.

Next, we have our loops, which look pretty much as they did before except that some of the code of the inner loop has been moved to the function. The outer foreach() loop cycles through the object types (modResource, modChunk, and modTemplate). The inner foreach() loop iterates over each object of the type currently being processed.

At the beginning of the outer loop, we set the count of fixed objects of that type to 0. Near the bottom of the inner loop, we increment that count every time we fix an object.

In the inner loop, we first get the content of the object. The if statement below the /* Bail if content is empty or doesn't contain _blank */ comment isn't strictly necessary, because the function would ignore tags without _blank. It's there to improve performance. There's no point in calling a function and running preg_match_all(), which is somewhat slow, if there's nothing to process. This test could also have gone at the top of the function, but why make the function call if there's nothing for the function to do?


The Function Call

Notice that in the declaration of the fixContent() function at the top we put an ampersand ahead of the $string variable. This is called "passing by reference." It tells PHP that we want $string treated as a reference rather than a value. Without the ampersand, changes made to the variable within the function would not persist. No matter what we did to the $string variable inside the function, the original variable passed to the function would be unchanged.

Notice too, that the variable names don't need to match. In the call to the function, we send the $content variable. In the function, we call it $string. Because it's a reference variable, any changes we make to $string inside the function are actually being made to the $content variable.
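
Here is a small, self-contained sketch of the difference. The two helper functions are hypothetical and are not part of the FixLinks snippet; they exist only to show what the ampersand changes.

```php
<?php
/* Hypothetical helpers, not part of the FixLinks snippet */
function appendByValue($string) {
    $string .= ' (changed)';   /* Only the local copy changes */
}

function appendByReference(&$string) {
    $string .= ' (changed)';   /* The caller's variable changes */
}

$content = 'original';

appendByValue($content);
echo $content . "\n";          /* original */

appendByReference($content);
echo $content . "\n";          /* original (changed) */
```

Note that the caller passes $content in both cases. Only the function declaration decides whether the function works on a copy or on the caller's variable.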

We could have done the same thing without passing by reference (the alternative is called "passing by value"). In that case the code would look like this:

/* In the function */
function fixContent($string) {
   /* ... */
   return $string;
}

/* In the main code */
$content = fixContent($content);

In the abbreviated code above, the function call passes a *copy* of the $content variable to the function. The function modifies that copy and returns the changed version in its return statement. We would have to add $content = ahead of the function call so that the $content variable receives the changes. In that case, however, the function's return value is taken up by the changed copy. Since a function can only have one return value, we couldn't also return true or false, as we actually do in the function, to tell the main script whether any changes were made.
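
For comparison, here is a rough sketch of how the by-value style could still report whether anything changed, by having the caller compare the old and new strings. This is not the approach used in this article. The addRelByValue() function is hypothetical and deliberately simplified: it only applies the replacement pattern and does not check for an existing rel attribute.

```php
<?php
/* Hypothetical by-value variant: returns the (possibly) modified string */
function addRelByValue($string) {
    return preg_replace('/target\s*=\s*[\'"]_blank[\'"]/i',
        'target="_blank" rel="noopener noreferrer"', $string);
}

$content = '<a href="https://example.com" target="_blank">Example</a>';
$newContent = addRelByValue($content);

/* The caller has to work out whether anything changed */
$changed = ($newContent !== $content);

var_dump($changed);            /* bool(true) */
```

The cost of this style is an extra string comparison in the main code for every object, which is part of why the snippet returns true or false from the function instead.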

If you do a web search for PHP pass by reference versus value, you'll see some heated arguments about whether passing by reference should ever be used. One bit of common wisdom you'll see is the claim that passing by reference is slower than passing by value.

I noticed that the benchmarks claiming to "prove" this claim didn't actually modify the passed value in their functions. Out of curiosity, I benchmarked a case similar to our use case. I found that when a 1,000-character string was modified in the function, passing by reference was actually faster in both PHP 5 and PHP 7.

The speed difference could hardly have been more trivial. After doing 10 million function calls using the two methods, the difference was only a fraction of a millisecond for each function call. You should never choose one method over the other on the basis of performance. In our case, though, the ability to return true or false from the function with no loss in performance led me to choose passing by reference.
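
If you'd like to try a comparison like this yourself, a benchmark along the following lines is one way to do it. This is not the code used for the measurements described above. The function names, the iteration count, and the kind of modification are all assumptions chosen to keep the example short, and the timings will vary with your PHP version and hardware.

```php
<?php
/* Hypothetical benchmark helpers; names and sizes are illustrative */
function modifyByValue($string) {
    $string[0] = 'X';          /* Modify the copy */
    return $string;
}

function modifyByReference(&$string) {
    $string[0] = 'X';          /* Modify the caller's variable */
    return true;
}

$iterations = 100000;
$original = str_repeat('a', 1000);   /* 1,000-character string */

/* Time the pass-by-value version */
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $content = $original;
    $content = modifyByValue($content);
}
$byValueTime = microtime(true) - $start;
$byValueResult = $content;

/* Time the pass-by-reference version */
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $content = $original;
    modifyByReference($content);
}
$byReferenceTime = microtime(true) - $start;
$byReferenceResult = $content;

printf("By value:     %.4f seconds\n", $byValueTime);
printf("By reference: %.4f seconds\n", $byReferenceTime);
```

The important detail is that both functions actually modify the string. A benchmark where the function only reads the passed value measures something different.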


Inside the Function

This is where the big improvement over our previous method lives. In the function, we use preg_match_all() to match every link tag that contains _blank. After that call, the $matches[0] array contains as many members as there are links containing _blank.

Here's the pattern we use for that search:

$pattern = '/<a [^>]+_blank[^>]*>/i';

The slashes at each end are delimiters that tell the regex engine where the pattern begins and ends. The lower-case i at the end says we want a case-insensitive match.

When processing the pattern, the regex engine looks for <a followed by a space, followed by one or more characters that are not >, followed by _blank, followed by zero or more characters that are not >, followed by >. Whenever a match is found, it goes in the $matches array.
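
To see what the pattern captures, here is a short sketch that runs it against some made-up content. The sample HTML is hypothetical; the pattern is the one from the snippet.

```php
<?php
$pattern = '/<a [^>]+_blank[^>]*>/i';

/* Hypothetical sample content: one plain link, two links with _blank */
$html = '<p><a href="/about">About</a> '
      . '<a href="https://example.com" target="_blank">Example</a> '
      . '<A HREF="https://example.org" TARGET="_BLANK" rel="noopener">Org</A></p>';

$found = preg_match_all($pattern, $html, $matches);

echo $found . "\n";            /* 2 */
print_r($matches[0]);          /* The two opening tags that contain _blank */
```

Notice that each match is only the opening tag of the link, not the link text or the closing tag, and that the upper-case tag is matched because of the i modifier.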

If preg_match_all() finds any matches, we loop through the $matches[0] array. For each matching tag, we look for noopener noreferrer. We also look for the two terms in reverse order. If we find either pair, we ignore that link. If we don't, we go on to the replacements. If we find noopener but not noreferrer, or vice versa, we simply do a str_replace() on the single term to replace it with the pair of terms. If neither of those sections executes, we do our replacement using the same pattern we used in the previous article.
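
The branching described above can be tried on its own with a sketch like the following. The fixTag() function is a hypothetical helper that applies the same tests to a single tag and returns the result; it is not part of the snippet, and the sample tags are made up.

```php
<?php
/* Hypothetical single-tag helper that follows the branches in fixContent() */
function fixTag($tag) {
    /* Already has both terms, in either order: leave it alone */
    if (strpos($tag, 'noopener noreferrer') !== false
            || strpos($tag, 'noreferrer noopener') !== false) {
        return $tag;
    }

    $hasOpener = strpos($tag, 'noopener') !== false;
    $hasReferrer = strpos($tag, 'noreferrer') !== false;

    if ($hasOpener && !$hasReferrer) {
        return str_replace('noopener', 'noopener noreferrer', $tag);
    } elseif ($hasReferrer && !$hasOpener) {
        return str_replace('noreferrer', 'noopener noreferrer', $tag);
    }

    /* Neither term is present: add the rel attribute after the target */
    return preg_replace('/target\s*=\s*[\'"]_blank[\'"]/i',
        'target="_blank" rel="noopener noreferrer"', $tag);
}

$samples = array(
    '<a href="#" target="_blank">',
    '<a href="#" target="_blank" rel="noopener">',
    '<a href="#" target="_blank" rel="noreferrer">',
    '<a href="#" target="_blank" rel="noreferrer noopener">',
);

foreach ($samples as $tag) {
    echo $tag . '  =>  ' . fixTag($tag) . "\n";
}
```

The first three sample tags all end up as target="_blank" rel="noopener noreferrer", and the fourth is returned unchanged because it already contains both terms.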


Wrapping Up

The new code should fix all links that are in resources, chunks (including Tpl chunks), or templates. It's much more reliable than the previous version. It still won't fix links created in code or links in template variables. Links created in code are almost impossible to fix, because they often contain variables, but they're fairly rare. In the following article, we'll handle links in TVs.


Coming Up

In the following article, we'll look at how to correct links found in template variables and learn a little about how TV input and output option values are stored.






