Apr 19 2008
 

If you maintain a website at a shared-hosting provider, you may not have control over how your Apache log files are archived. These archives may also be overwritten, so if you’re interested in keeping statistics for later analysis, you’ll have to archive your log files yourself. Here I provide a simple PHP script you can run to save your own log files.

Some hosting providers archive your log files by copying them to a “logs” directory. Under that directory is one subdirectory for each of your domains, such as www.example.com and www.sample.com, and inside those subdirectories are your archived log files. Depending on how your provider does it, however, these files may eventually be overwritten. For example, one such file naming scheme looks like this:

www.example.com.1
www.example.com.2.gz
www.example.com.3.gz
...
www.example.com.7.gz
agent_log.1
agent_log.2.gz
agent_log.3.gz
...
agent_log.7.gz
error_log.1
error_log.2.gz
error_log.3.gz
...
error_log.7.gz
referrer_log.1
referrer_log.2.gz
referrer_log.3.gz
...
referrer_log.7.gz

Roughly once a day, your hosting provider rotates and renames these files, so that by week’s end the oldest files are lost forever. To prevent this, run the following PHP script daily. Change the $pathroot variable to the document root path used by your own hosting provider, and list your own domains in $sites.

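<?php
// Root path of your hosting account; change this to match your provider.
$pathroot = "/your/document/root/path/";
$pathsrc = "logs/";            // where the provider keeps rotated logs
$pathdest = "logs_archive/";   // where your own copies accumulate
$sites = array("www.example.com",
               "www.sample.com");

function CopyLogFiles($site)
{
  global $pathroot, $pathsrc, $pathdest;
  $path = $pathroot . $pathsrc . $site;
  $destdir = $pathroot . $pathdest . $site;
  // Create the archive directory on the first run (0755 is an assumed mode).
  if (!is_dir($destdir))
     mkdir($destdir, 0755, true);
  $directory = dir($path);
  if ($directory === false)
     return;   // source directory is missing or unreadable
  while (false !== ($entry = $directory->read()))
  {
     // Test the full path; a bare entry name would be resolved
     // against the current working directory instead.
     $source = $path . "/" . $entry;
     // Only regular files whose names end in .gz are archived.
     if (!is_file($source) || !preg_match('/\.gz$/', $entry))
        continue;
     $modified = date("Ymd-His", filemtime($source));
     // Strip the rotation number: "name.2.gz" becomes "name.gz".
     $name = preg_replace('/\.\d+\.gz$/', '.gz', $entry);
     copy($source, $destdir . "/" . $modified . "-" . $name);
  }
  $directory->close();
}

foreach ($sites as $site)
  CopyLogFiles($site);
?>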

This script copies archived log files from your “logs” directory to your own “logs_archive” directory. It names each log file with the date and time it was last modified and removes the .2, .3, ... numbering scheme. Because your provider renames each file the following day (for example, .2.gz becomes .3.gz), the script will copy the same log again and overwrite the file already in your logs_archive directory. Since the archive name is built from the date and time the file was last modified, the copy lands on the same name with the same contents, so this overwrite is harmless.
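If you would rather not recopy files that are already archived, one possible variation is to skip the copy when the destination exists. The following is only a sketch: copyIfMissing is a hypothetical helper that is not part of the script above, and the temporary directory and file names exist purely for the demonstration.

```php
<?php
// Hypothetical helper: copy $source to $dest unless $dest already exists.
// Returns true only when a new copy was actually made.
function copyIfMissing(string $source, string $dest): bool
{
    if (file_exists($dest)) {
        return false;   // already archived on an earlier run
    }
    return copy($source, $dest);
}

// Demonstration in a throwaway directory (names are made up).
$dir = sys_get_temp_dir() . "/logs_demo_" . uniqid();
mkdir($dir);
$source = $dir . "/www.example.com.2.gz";
$dest   = $dir . "/20080419-152034-www.example.com.gz";
file_put_contents($source, "demo");

$first  = copyIfMissing($source, $dest);   // first run: file is copied
$second = copyIfMissing($source, $dest);   // later run: copy is skipped
$archived = file_get_contents($dest);

var_dump($first, $second);

// Clean up the demonstration files.
unlink($source);
unlink($dest);
rmdir($dir);
```

The trade-off is that a copy interrupted halfway would never be repaired on a later run, which is one reason the plain overwrite used by the script is a reasonable default.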

Now your logs_archive directory will accumulate your log files. Instead of www.example.com.2.gz, however, the files will have names like 20080419-152034-www.example.com.gz. In this example, the file was last modified on April 19, 2008 at 15:20:34. The next day, when your hosting provider renames www.example.com.2.gz to www.example.com.3.gz, the script will copy www.example.com.3.gz to 20080419-152034-www.example.com.gz again, because the file still has the same modification date and time.
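The renaming rule described above can be sketched in isolation. Here archiveName is a hypothetical helper introduced only for illustration, and the timestamp is built by hand to match the example rather than read from a real file.

```php
<?php
// Hypothetical helper: builds the archive file name for one rotated log.
// $entry is the rotated name (e.g. "www.example.com.2.gz"),
// $mtime is the file's last-modified time as a Unix timestamp.
function archiveName(string $entry, int $mtime): string
{
    // Drop the rotation number just before ".gz" (".2.gz" becomes ".gz").
    $base = preg_replace('/\.\d+\.gz$/', '.gz', $entry);
    // Prefix with the modification time in the Ymd-His format.
    return date("Ymd-His", $mtime) . "-" . $base;
}

// The same file seen on two consecutive days: the rotation number
// changes, but the modification time does not.
$mtime = mktime(15, 20, 34, 4, 19, 2008);
$day1 = archiveName("www.example.com.2.gz", $mtime);
$day2 = archiveName("www.example.com.3.gz", $mtime);

echo $day1, "\n";
echo $day2, "\n";
echo ($day1 === $day2 ? "same archive name" : "different names"), "\n";
```

Both calls produce 20080419-152034-www.example.com.gz, which is why the second day’s copy replaces the first instead of creating a duplicate. This relies on the assumption stated in the text: the provider’s rename leaves the modification time unchanged.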

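To run the script daily, set up a cron job. For instance, to run it every day at 20 minutes past midnight, add a line like this to your crontab (the five schedule fields are separated by spaces, not commas; adjust the paths to php and savelogs.php for your own account):

20 0 * * * /usr/bin/php savelogs.php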

Posted at 1:21 pm
