Scraping, Parsing, and API Use VII

Using James Brumond's Git class to get data from a local Git repository


In this series of articles, we're looking at a variety of techniques that involve scraping, parsing, and using an API to get information. The use case is a script to assemble a documentation page for the MODX documentation site by gathering information from several different sources and saving the finished HTML code for the page to a file.

In the previous article, we looked at the getDataFromGitHub() function used to grab statistics from the GitHub API. In this one, we'll look at the implementation of another function from our skeleton code: getCommitCount(), which uses a great PHP class called Git, written by James Brumond.


The Problem

As we saw in the previous article, the commit count reported by GitHub is unreliable. I was surprised at how small the commit counts were, especially for my older extras that have had continuous commits for a number of years. After a little digging, I found that GitHub's report only counts commits from the current year. There is a method for aggregating and counting all commits, but it's fairly complex and, IIRC, you have to know the actual years involved. It was much easier and faster to get the commit count from my local Git repo, though it took some work to figure out how to do it.



The Solution

Getting data from a local Git repo involves a very handy PHP class called Git written by James Brumond. After downloading the Git.php file to a local directory, we load that class with this line of the script:

include 'C:\xampp\htdocs\addons\assets\mycomponents\_extras-rtfm\Git.php';

Because the functions of the Git class are public static functions, we don't need to instantiate the class. We just call them directly with Git::functionName().
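If static calls are new to you, here is a minimal sketch of the idea. The PathHelper class and its join() method are hypothetical, made up for this illustration; they are not part of Git.php. The point is only that a public static method is called on the class itself with the :: operator, with no "new" involved.

```php
<?php
// Hypothetical class for illustration only; it is not part of Git.php.
class PathHelper
{
    // A public static method belongs to the class itself, not to an instance.
    public static function join(string $base, string $name): string
    {
        // Strip any trailing slash or backslash, then add the OS separator.
        return rtrim($base, '/\\') . DIRECTORY_SEPARATOR . $name;
    }
}

// No "new PathHelper()" needed: call the method with the :: operator.
$path = PathHelper::join('C:\\xampp\\htdocs', 'myextra');
echo $path, "\n";
```

Using DIRECTORY_SEPARATOR here is a design choice for the sketch: it keeps the path building from being tied to one operating system.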

Here's the code of our getCommitCount() function. See if you can tell how it works.

/**
 * Get number of commits for project from local Git repo
 * using James Brumond's Git.php
 *
 * @param $localPath string - path to local extras directory
 * @param $file string - name of specific extra directory
 * @param $phs array - placeholder array (as reference)
 *
 * @return int - total number of commits for project
 */
function getCommitCount($localPath, $file, &$phs) {

        $path = $localPath . "\\" . $file;
        chdir($path);

        $repo = Git::open($path);  // -or- Git::create('/path/to/repo')
        Git::windows_mode();
        $result = $repo->run('log --pretty=short --oneline');
        // echo "\n\n" . $result;
        // git's output ends with a newline; trim it so the empty last element isn't counted
        $result = count(explode("\n", trim($result)));

        $phs['commits'] = $result;

        return $result;
}


Code Explained

You probably figured this out, but in case not, here's how the code works. First, we set the full path to the local repo for one extra in the $path variable, then we use chdir($path) to go to that directory. We open the repo with Git::open($path), then we call Git::windows_mode() because I'm on a Windows box.

Next comes the clever part. The $repo variable holds the repository object returned by Git::open(), which knows all about the repo we've opened. That means we can use its run() method to issue a Git command to that repo as if we were on the command line, and capture its output in a variable (e.g., $result = $repo->run('log --pretty=short --oneline');). That command will output a brief list of the commits reachable from the current branch, one per line. Once we have that in the $result variable, we use explode(), with the newline as a delimiter, to turn it into a PHP array in which each element is a specific commit. Then we simply count() how many elements there are in the array and set our $phs['commits'] placeholder to the resulting value.
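One detail to watch when counting lines this way: command-line output usually ends with a newline, and a plain explode() on such a string produces one extra, empty element at the end. The sketch below shows the effect on a made-up sample string and one way to guard against it. The countCommitLines() helper is hypothetical (it is not part of Git.php or the original script), and the commit hashes and messages are invented for the example.

```php
<?php
/**
 * Count the commits in text shaped like the output of "git log --oneline".
 * Hypothetical helper for illustration; not part of Git.php.
 */
function countCommitLines(string $logOutput): int
{
    // Split on Windows, Unix, or old Mac line endings.
    $lines = preg_split('/\r\n|\n|\r/', $logOutput);

    // Drop blank lines, such as the empty element after a trailing newline.
    $lines = array_filter($lines, function ($line) {
        return trim($line) !== '';
    });

    return count($lines);
}

// Made-up sample output: three commits, with a trailing newline.
$sample = "a1b2c3d Fix typo\n9f8e7d6 Add feature\n1234abc Initial commit\n";

echo count(explode("\n", $sample)), "\n";   // prints 4: includes the empty last element
echo countCommitLines($sample), "\n";       // prints 3: only the non-empty lines
```

As an alternative, Git itself can report the number with the command rev-list --count HEAD, which would avoid parsing the log at all. I haven't verified how that behaves through the Git class's run() method, so treat it as something to try rather than a drop-in replacement.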

As we've done before, the placeholder array is passed by reference (&$phs) so the change we make to it in this function will persist outside the function. As a side note, passing an array by reference does *not* speed things up in PHP. It should only be used if the array will be modified in the function and you want the changes to persist outside the function. The $modx object is usually passed by reference in the MODX core files because snippets and plugins may modify it. The $scriptProperties array is usually passed by reference as well for the same reason.
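Here is a small sketch of the difference between passing an array by value and by reference. The two function names are hypothetical, chosen for this illustration, and the value 42 is arbitrary.

```php
<?php
// Hypothetical functions for illustration; the names are not from the article's script.

// Receives a copy of the array: the change is lost when the function returns.
function setCommitsByValue(array $phs, int $count): void
{
    $phs['commits'] = $count;
}

// Receives a reference to the caller's array: the change is visible to the caller.
function setCommitsByReference(array &$phs, int $count): void
{
    $phs['commits'] = $count;
}

$phs = ['commits' => 0];

setCommitsByValue($phs, 42);
echo $phs['commits'], "\n";   // prints 0

setCommitsByReference($phs, 42);
echo $phs['commits'], "\n";   // prints 42
```

Only the & in the parameter list differs between the two functions; the calling code looks identical, which is why it's worth documenting by-reference parameters in the function's doc comment.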


Coming Up

I wanted to include a count of all the files used for each extra, partly for my own curiosity, and partly because most people have no idea how many separate files have to be written for a single extra. In the following article we'll look at how to use the DirWalker class to get the total number of files in a directory and all its subdirectories.




