10/6/09

Iterating with SPL 101: Directory Iterators

by Rafael Dohms

In the past I blogged about SPL (the Standard PHP Library) and how it makes PHP developers’ lives easier. Since then I have noticed a lack of SPL recipes on the web. If you are getting to know SPL, the use of the available classes can be a real mystery. So I decided to add more posts on SPL for Google to add to its index. This is the first in a series of posts.

Wouldn’t it be nice if you could live life just by applying a foreach to each year and live day by day? Ok, that was an awful joke, but using iterators does make life a lot easier and it makes for cleaner code. SPL’s iterator classes are really awesome and helpful: they replace multiple lines of code and a handful of functions with a simple new. This and a foreach can really help clean up code. Admittedly there is an argument that this might make the code less legible to “beginner” programmers or programmers that are not familiar with iterators and such, but hey, if you can’t understand it, read this post and learn it.

In this article I want to go over some of SPL’s Directory Iteration options, following up with more details on the code I posted in the original SPL article. Let’s dive into the infinity of iterators and iterate over them, showing how they “go together” and where to get them to solve things for you.

Native in SPL

Native SPL classes have been converted to C, so they perform much faster and are available in any PHP installation, especially since in PHP 5.3 you cannot disable SPL anymore.

DirectoryIterator (doc)

This is a simple iterator because it is not recursive (so you don’t end up as dizzy as we did after the “Iteratah drinking game” at Tek’09). It basically replaces the functionality of the scandir function, but gives you a few more advantages along the way. You can pass it the directory you wish to iterate and it will return an object that you can foreach over as if it were an array. This is a simple task that can be done using scandir as well, so let’s compare advantages, starting with some code:

<?php

echo '- Iterate directory using scandir' . PHP_EOL;
echo '- Avoid DOT directories' . PHP_EOL;
echo '- Show full path' . PHP_EOL;
$dir = 'samples' . DIRECTORY_SEPARATOR . 'sampledirtree';
$files = scandir( $dir );
foreach($files as $file){
    // An entry is printed only if it is neither "." nor "..", hence && and not ||
    if ($file != '.' && $file != '..'){
        echo $dir . DIRECTORY_SEPARATOR . $file . PHP_EOL;
    }
}
?>

And the same thing with DirectoryIterator:

<?php

echo '- Iterate directory using DirectoryIterator' . PHP_EOL;
echo '- Avoid DOT directories' . PHP_EOL;
echo '- Show full path' . PHP_EOL;
$files = new DirectoryIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree');
foreach($files as $file){
    if (!$file->isDot()){
        echo $file->getRealPath() . PHP_EOL;
    }
}

?>

Output for both (getRealPath() actually returns the absolute path, which is shortened to the relative one here):

- Iterate directory using (scandir|DirectoryIterator)
- Avoid DOT directories
- Show full path
samples/sampledirtree/file1.txt
samples/sampledirtree/folder1
samples/sampledirtree/folder2

The code looks pretty much the same and we are basically performing a simple task, but one of the powerful built-in things about the DirectoryIterator is that instead of returning a string as scandir does, it returns an SplFileInfo object, packed with a lot of information goodness. It allows us to skip the “dot” files ( . and .. ) without testing for both, and it allows us to get a file’s full real path without having to concatenate the actual directory. It does even more. Check out the main methods list: (whole list)

  • getFilename ()
  • getOwner ()
  • getPath ()
  • getPathname ()
  • getPerms ()
  • getRealPath ()
  • getSize ()
  • getType ()
  • isDir ()
  • isExecutable ()
  • isFile ()
  • isLink ()
  • isReadable ()
  • isWritable ()
  • openFile ($mode='r', $use_include_path=false, $context=NULL)

Arguably one can also get this information by calling a function, but hey, this is OO. It is cleaner than the procedural equivalent and easier to use, because you have a full object representing the file right there, just a method call away. It’s important to note that this does come at a performance cost, but at less than 40%, with the difference measured in fractions of a microsecond, this is not a major thing to worry about.
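To make the method list above a bit more concrete, here is a minimal sketch that collects a few of those values for every entry in a directory. The describeDirectory() helper and the spl_demo_ temporary directory are made up for this illustration; only the DirectoryIterator and SplFileInfo methods come from SPL itself.

```php
<?php
// Hypothetical helper (not part of SPL): summarize each entry of a directory
// using some of the SplFileInfo methods listed above.
function describeDirectory($dir)
{
    $rows = array();
    foreach (new DirectoryIterator($dir) as $file) {
        if ($file->isDot()) {
            continue; // skip "." and ".."
        }
        // Copy the values out right away: DirectoryIterator hands back the
        // same object on every pass of the loop.
        $rows[$file->getFilename()] = array(
            'type'     => $file->getType(), // "file", "dir" or "link"
            'size'     => $file->isFile() ? $file->getSize() : null,
            'readable' => $file->isReadable(),
        );
    }
    ksort($rows); // DirectoryIterator does not promise any particular order
    return $rows;
}

// Throwaway directory so the sketch can run anywhere.
$ds   = DIRECTORY_SEPARATOR;
$base = sys_get_temp_dir() . $ds . 'spl_demo_' . uniqid();
mkdir($base . $ds . 'folder1', 0777, true);
file_put_contents($base . $ds . 'file1.txt', 'hello');

$info = describeDirectory($base);
foreach ($info as $name => $row) {
    echo $name . ' (' . $row['type'] . ')' . PHP_EOL;
}

// Clean up the throwaway directory.
unlink($base . $ds . 'file1.txt');
rmdir($base . $ds . 'folder1');
rmdir($base);
```

Getting the same data procedurally would take one function call per value (filetype(), filesize(), is_readable() and so on), each needing the full path passed in again.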

RecursiveDirectoryIterator (doc)

This is where the fun begins: recursive goodness. You probably noticed above that the script did not follow up on the folders it found; it stayed within the first level of the directory we chose. Recursion solves this problem. Basically this iterator can descend into directories, creating a new directory iterator for anything that is a directory. This is done by implementing the getChildren function, which returns an iterator instance for the child directory.

Using the regular scandir approach we would have had to use a recursive function to obtain this behavior, but using this we only need to.. “wait, even with the getChildren function we still would need a recursive function to go through it, hey! someone lied to me!” .. This is where SPL composite magic comes in, we simply need to use a RecursiveIteratorIterator (see how the drinking game begins to be fun?).

The RecursiveIteratorIterator is basically an object that implements the recursive function, but without the complications. Just pass a Recursive<whatever>Iterator to its constructor and foreach away. The iterator will automatically call the getChildren functions and manage that, and you can even tell it how to behave.

<?php

function recursiveScanDir($dir){
    $files = scandir($dir);
    foreach($files as $file){
        if ($file != '.' && $file != '..'){
            if (is_dir($dir . DIRECTORY_SEPARATOR . $file)){
                recursiveScanDir($dir . DIRECTORY_SEPARATOR . $file);
            }else{
                echo $dir . DIRECTORY_SEPARATOR . $file . PHP_EOL;
            }
        }
    }
}

$dir = 'samples' . DIRECTORY_SEPARATOR . 'sampledirtree';
recursiveScanDir($dir);

?>

Now using SPL stuff, with roughly 3.5 times fewer lines of code:

<?php

// SKIP_DOTS (PHP 5.3+) keeps "." and ".." out of the results
$files = new RecursiveIteratorIterator( new RecursiveDirectoryIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree', FilesystemIterator::SKIP_DOTS) );
foreach($files as $file){
    echo $file->getPathname() . PHP_EOL;
}

?>

Output:

samples/sampledirtree/file1.txt
samples/sampledirtree/folder1/file1.txt
samples/sampledirtree/folder1/file2.html
samples/sampledirtree/folder2/file1.html
samples/sampledirtree/folder2/file2.txt
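To see what the RecursiveIteratorIterator is saving you from, here is a rough sketch of the same walk done by hand with hasChildren() and getChildren(). The collectFiles() helper and the temporary spl_demo_ directory are invented for the example, and it assumes PHP 5.3+ for the FilesystemIterator::SKIP_DOTS flag.

```php
<?php
// Hypothetical helper (not part of SPL): walk a RecursiveDirectoryIterator
// by hand, which is roughly the job RecursiveIteratorIterator takes over.
function collectFiles(RecursiveDirectoryIterator $it, array &$found)
{
    foreach ($it as $file) {
        if ($it->hasChildren()) {
            // The current entry is a directory: recurse into its own iterator.
            collectFiles($it->getChildren(), $found);
        } else {
            $found[] = $file->getPathname();
        }
    }
}

// Throwaway directory tree so the sketch can run anywhere.
$ds   = DIRECTORY_SEPARATOR;
$base = sys_get_temp_dir() . $ds . 'spl_demo_' . uniqid();
mkdir($base . $ds . 'folder1', 0777, true);
file_put_contents($base . $ds . 'file1.txt', '');
file_put_contents($base . $ds . 'folder1' . $ds . 'file2.html', '');

// Manual recursion.
$manual = array();
collectFiles(new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS), $manual);

// The same walk with the composite iterator.
$automatic = array();
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
    $automatic[] = $file->getPathname();
}

// Directory order depends on the filesystem, so sort before comparing.
sort($manual);
sort($automatic);
var_dump($manual === $automatic);

// Clean up the throwaway tree.
unlink($base . $ds . 'folder1' . $ds . 'file2.html');
unlink($base . $ds . 'file1.txt');
rmdir($base . $ds . 'folder1');
rmdir($base);
```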

We used default settings here, but by changing the $mode argument of the constructor (2nd parameter), we can for example show children first, or “leaves” only. If you are not seeing it yet, imagine that you want to remove a directory structure. You can’t just use rmdir because it will fail due to files existing inside the folder. So you will need to delete all the files one by one, following the folder hierarchy. If you use this iterator combination and ask it to show children first, you can then delete all children and afterward remove the parents, like in this code:

<?php
//Recursively delete tree structure
//SKIP_DOTS (PHP 5.3+) keeps "." and ".." away from rmdir()
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree', FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::CHILD_FIRST
);
foreach($files as $file){
    if ($file->isDir()){
        rmdir($file->getRealPath());
    }else{
        unlink($file->getRealPath());
    }
}
?>
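If the effect of the $mode parameter is still a bit abstract, the sketch below runs the same tiny tree through all three modes. The listWithMode() helper and the temporary spl_demo_ directory are made up for the example (PHP 5.3+ is assumed for SKIP_DOTS); the three mode constants are the ones RecursiveIteratorIterator defines.

```php
<?php
// Hypothetical helper (not part of SPL): return the paths, relative to $dir,
// in the order the given mode yields them.
function listWithMode($dir, $mode)
{
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        $mode
    );
    $paths = array();
    foreach ($it as $file) {
        // Strip the base directory and its trailing separator.
        $paths[] = substr($file->getPathname(), strlen($dir) + 1);
    }
    return $paths;
}

// One folder containing one file keeps the order independent of the filesystem.
$ds   = DIRECTORY_SEPARATOR;
$base = sys_get_temp_dir() . $ds . 'spl_demo_' . uniqid();
mkdir($base . $ds . 'folder1', 0777, true);
file_put_contents($base . $ds . 'folder1' . $ds . 'file1.txt', '');

$leavesOnly = listWithMode($base, RecursiveIteratorIterator::LEAVES_ONLY); // the default
$selfFirst  = listWithMode($base, RecursiveIteratorIterator::SELF_FIRST);  // parent, then children
$childFirst = listWithMode($base, RecursiveIteratorIterator::CHILD_FIRST); // children, then parent

print_r($selfFirst);
print_r($childFirst);

// Clean up the throwaway tree.
unlink($base . $ds . 'folder1' . $ds . 'file1.txt');
rmdir($base . $ds . 'folder1');
rmdir($base);
```

CHILD_FIRST is the order the delete loop above relies on: by the time a directory shows up, everything inside it has already been visited.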

You might not see the advantage of SPL over scandir in the basic stuff, but once you start adding operations to your iteration and start to need specific behavior, you begin to realize it lets you have much simpler and more readable code. Plus it’s OO! (I’m a big OO fan, BTW.)

Non-native in SPL

Non-native SPL classes are currently available as examples, and some will be converted to C and integrated into the native part of SPL. Some are useful as examples that you can implement locally for your own use, or you can load these examples into your code by one of two methods:

  • Add ext/spl/examples/autoload.inc to your php.ini in auto_prepend_file (or add it to the file already set in auto_prepend_file)
  • Include ext/spl/examples/autoload.inc in your application

The autoload.inc file is available in the folder above, which should be in your PHP install or in the source code you can download from PHP.net. I would recommend downloading this and adding it into your application tree if you wish to use it.

Personal Recommendation: Use everything in the examples folder as inspiration for what you can do with SPL and implement it locally.

DirectoryTreeIterator (doc)

The DirectoryTreeIterator is more interesting as an example of what you can do with iterators than as something you might use on a daily basis. It does what the RecursiveDirectoryIterator does but displays the result as an ASCII directory tree, so using this code:

<?php
set_include_path( get_include_path() . PATH_SEPARATOR . 'spl' . DIRECTORY_SEPARATOR . 'examples' );
include('spl' . DIRECTORY_SEPARATOR . 'examples' . DIRECTORY_SEPARATOR . 'autoload.inc');

$files = new DirectoryTreeIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree');

foreach($files as $file){
    echo $file . PHP_EOL;
}

?>

We get this result:

|-samples/sampledirtree/file1.txt
|-samples/sampledirtree/folder1
| |-samples/sampledirtree/folder1/file1.txt
| \-samples/sampledirtree/folder1/file2.html
\-samples/sampledirtree/folder2
  |-samples/sampledirtree/folder2/file1.html
  \-samples/sampledirtree/folder2/file2.txt

Since I said it’s more interesting as an example, let’s look at the actual source code of the method that does the printing:

function current()
{
    $tree = '';
    for ($l=0; $l < $this->getDepth(); $l++) {
        $tree .= $this->getSubIterator($l)->hasNext() ? '| ' : '  ';
    }
    return $tree . ($this->getSubIterator($l)->hasNext() ? '|-' : '\-')
        . $this->getSubIterator($l)->__toString();
}

As you can see, it is just a matter of swapping the ASCII for images and CSS, and you can very easily have a directory tree anywhere on your site by taking advantage of the RecursiveDirectoryIterator.
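As a rough idea of what that could look like without the examples folder, the sketch below builds one HTML list item per entry and uses getDepth() to pick a CSS class. The renderTreeItems() helper, the depth-N class names and the temporary spl_demo_ directory are all invented for this example, and it assumes PHP 5.3+ for SKIP_DOTS. It is not how DirectoryTreeIterator itself works.

```php
<?php
// Hypothetical helper (not part of SPL): one <li> per entry, with a CSS class
// derived from the nesting depth so a stylesheet can indent or decorate it.
function renderTreeItems($dir)
{
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // show a folder before its contents
    );
    $items = array();
    foreach ($it as $file) {
        $items[] = '<li class="depth-' . $it->getDepth() . '">'
            . htmlspecialchars($file->getFilename())
            . '</li>';
    }
    return $items;
}

// One folder containing one file keeps the order independent of the filesystem.
$ds   = DIRECTORY_SEPARATOR;
$base = sys_get_temp_dir() . $ds . 'spl_demo_' . uniqid();
mkdir($base . $ds . 'folder1', 0777, true);
file_put_contents($base . $ds . 'folder1' . $ds . 'file1.txt', '');

$items = renderTreeItems($base);
echo '<ul>' . PHP_EOL . implode(PHP_EOL, $items) . PHP_EOL . '</ul>' . PHP_EOL;

// Clean up the throwaway tree.
unlink($base . $ds . 'folder1' . $ds . 'file1.txt');
rmdir($base . $ds . 'folder1');
rmdir($base);
```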

End of Part I…

This is a brief overview of what you can do with all the Directory Iterators available in SPL. Combining these directory iterators with other navigation iterators lets you do a lot more. This will be the topic of another post in which I will talk about all the different iterators you can use to iterate over iterators (say that 3x fast!) all the way from the FilterIterator to the InfiniteIterator. I hope this helps you to get an idea of how to make your code better with SPL.
