October 8th 2009

Iterating over life with SPL Iterators I: Directories

In the past I have already talked about SPL and how it makes PHP developers’ lives a breeze. Since then I have felt a lack of SPL recipes around the web: if you are just getting into SPL, some of the available classes can be a real mystery. So I decided to add more posts to Google’s list of SPL articles, and this is the first in a series that I will keep adding to as I come across examples.

Wouldn’t it be nice if you could go through life just applying a foreach to each year and living it day by day? OK, that was an awful joke, but using iterators does make life a lot easier and more fun, not to mention giving you cleaner code. SPL’s iterator classes are really awesome and helpful: replacing multiple lines of code and a handful of functions with a simple new statement and a foreach can really help clean up code. I did get into an argument that this might make the code less legible to “beginner” programmers, or to programmers who are not familiar with iterators, but hey, if you can’t understand it, read this post and learn it.

In this article I want to go over some of SPL’s directory iteration options, following up in more detail on the code I posted in the original SPL article. So I will now dive into the infinity of iterators and iterate (sic) over them, showing how they “go together” and how to get them to solve things for you.

Native in SPL

Native SPL classes are implemented in C, so they perform much faster and are available in any PHP install, especially since as of PHP 5.3 you can no longer disable SPL.

DirectoryIterator

This is a simple iterator, in that it is not a recursive iterator (we will leave those for later, so you don’t end up as dizzy as we ended up after the “Iteratah drinking game” at Tek’09). It basically replaces what you can do with the scandir function, but gives you a few more advantages along the way: you pass it the directory you wish to iterate and it returns an object that you can foreach over as if it were an array. Since this simple task can be done using scandir as well, let’s compare the two, starting with some code:

<?php

echo '- Iterate directory using scandir' . PHP_EOL;
echo '- Avoid DOT directories' . PHP_EOL;
echo '- Show full path' . PHP_EOL;
$dir = 'samples' . DIRECTORY_SEPARATOR . 'sampledirtree';
$files = scandir( $dir );
foreach($files as $file){
    if ($file != '.' && $file != '..'){ // skip both the "." and ".." entries
        echo $dir . DIRECTORY_SEPARATOR . $file . PHP_EOL;
    }
}
?>

And the same thing with DirectoryIterator:

<?php

echo '- Iterate directory using DirectoryIterator' . PHP_EOL;
echo '- Avoid DOT directories' . PHP_EOL;
echo '- Show full path' . PHP_EOL;
$files = new DirectoryIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree');
foreach($files as $file){
    if (!$file->isDot()){
        echo $file->getRealPath() . PHP_EOL;
    }
}

?>

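Output for both (the DirectoryIterator version prints absolute paths, because getRealPath() resolves the full path, and it lists entries in filesystem order, whereas scandir() sorts them alphabetically):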

- Iterate directory using (scandir|DirectoryIterator)
- Avoid DOT directories
- Show full path
samples/sampledirtree/file1.txt
samples/sampledirtree/folder1
samples/sampledirtree/folder2

The code looks pretty much the same and we are performing a simple task, but one of the powerful built-in things about DirectoryIterator is that instead of a plain string, as scandir gives you, each entry is an SplFileInfo object (DirectoryIterator extends SplFileInfo), packed with a whole bunch of information goodness. That lets us skip the “dot” entries (. and ..) with isDot() instead of testing for both names, and get a file’s full real path without concatenating the directory ourselves. It actually does more; here are the main methods:

  • getFilename ()
  • getOwner ()
  • getPath ()
  • getPathname ()
  • getPerms ()
  • getRealPath ()
  • getSize ()
  • getType ()
  • isDir ()
  • isExecutable ()
  • isFile ()
  • isLink ()
  • isReadable ()
  • isWritable ()
  • openFile ($mode = 'r', $use_include_path = false, $context = NULL)

It’s arguable that all of this information is available by calling plain functions, but hey, this is OO: it’s cleaner and not procedural. It makes for much cleaner code and ease of use, since you have a fully qualified object to handle a file right there, just a method call away. It’s important to note that this does come at a performance cost, but at less than 40%, and measured in much less than a microsecond, this is not a major thing to worry about.
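To see a few of those methods in action, here is a small sketch. It builds a throwaway directory (the temp path and the names file1.txt and folder1 are made up for the demo) and asks each entry about itself, with no path concatenation or separate is_dir()/filesize() calls in our own code.

```php
<?php
// Build a throwaway directory to inspect (temp path and names are made up for the demo).
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_info_' . uniqid();
mkdir($dir . DIRECTORY_SEPARATOR . 'folder1', 0777, true);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'file1.txt', 'hello');

$report = array();
foreach (new DirectoryIterator($dir) as $file) {
    if ($file->isDot()) {
        continue; // skip "." and ".."
    }
    // DirectoryIterator reuses one object for every entry, so copy out plain values
    // instead of storing $file itself.
    $report[$file->getFilename()] = array(
        'type'     => $file->isDir() ? 'dir' : 'file',
        'size'     => $file->isFile() ? $file->getSize() : null,
        'readable' => $file->isReadable(),
    );
}
ksort($report); // DirectoryIterator does not sort entries, scandir() does

foreach ($report as $name => $info) {
    echo $name . ' (' . $info['type'] . ')'
        . ($info['size'] !== null ? ', ' . $info['size'] . ' bytes' : '') . PHP_EOL;
}

// Clean up the demo directory.
unlink($dir . DIRECTORY_SEPARATOR . 'file1.txt');
rmdir($dir . DIRECTORY_SEPARATOR . 'folder1');
rmdir($dir);
```

This prints "file1.txt (file), 5 bytes" followed by "folder1 (dir)".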

RecursiveDirectoryIterator

This is where the fun begins: recursive goodness. You may have noticed above that the script did not follow the folders it found; it stayed within the first level of the directory we chose. This is where recursion comes in. This iterator can go into directories: for an entry that is a directory, hasChildren() returns true and getChildren() returns a new RecursiveDirectoryIterator for that child directory.
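To make the hasChildren()/getChildren() pair concrete, here is a sketch that walks a tree by hand. The walkTree() function and the throwaway temp tree are hypothetical, written only for this demo; the point is that we still have to write the recursion ourselves.

```php
<?php
// Throwaway tree for the demo (temp path and names are made up).
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_rdi_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'folder1', 0777, true);
file_put_contents($root . DIRECTORY_SEPARATOR . 'file1.txt', 'a');
file_put_contents($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file2.txt', 'b');

// Hand-written recursion over hasChildren()/getChildren() (hypothetical helper).
function walkTree(RecursiveDirectoryIterator $it, array &$found)
{
    for ($it->rewind(); $it->valid(); $it->next()) {
        if ($it->isDot()) {
            continue; // "." and ".." are not real entries, skip them
        }
        if ($it->hasChildren()) {
            // getChildren() returns a new RecursiveDirectoryIterator for the subdirectory
            walkTree($it->getChildren(), $found);
        } else {
            $found[] = $it->getSubPathname(); // path relative to the starting directory
        }
    }
}

$found = array();
walkTree(new RecursiveDirectoryIterator($root), $found);
sort($found);
print_r($found);

// Clean up.
unlink($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file2.txt');
unlink($root . DIRECTORY_SEPARATOR . 'file1.txt');
rmdir($root . DIRECTORY_SEPARATOR . 'folder1');
rmdir($root);
```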

With the regular scandir approach we would have to write a recursive function to get this behavior, but with this we only need to... “wait, even with getChildren we would still need a recursive function to go through it, hey! someone lied to me!” ... This is where SPL’s composite magic comes in: we just need a RecursiveIteratorIterator (see how the drinking game begins to be fun?).

The RecursiveIteratorIterator is basically an object that implements the recursive function for you, without the hassle and thinking: just pass a Recursive<whatever>Iterator to its constructor and foreach away. It will automatically call hasChildren() and getChildren() and manage the recursion, and you can even tell it how to behave.

<?php

function recursiveScanDir($dir){
    $files = scandir($dir);
    foreach($files as $file){
        if ($file != '.' && $file != '..'){
            if (is_dir($dir . DIRECTORY_SEPARATOR . $file)){
                recursiveScanDir($dir . DIRECTORY_SEPARATOR . $file);
            }else{
                echo $dir . DIRECTORY_SEPARATOR . $file . PHP_EOL;
            }
        }
    }
}

$dir = 'samples' . DIRECTORY_SEPARATOR . 'sampledirtree';
recursiveScanDir($dir);

?>

Now the same thing using SPL, with about 3.5 times fewer lines of code:

<?php

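// FilesystemIterator::SKIP_DOTS (PHP 5.3+) keeps the "." and ".." entries out of the results
$files = new RecursiveIteratorIterator( new RecursiveDirectoryIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree', FilesystemIterator::SKIP_DOTS) );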
foreach($files as $file){
    echo $file->getPathname() . PHP_EOL;
}

?>

Output:

samples/sampledirtree/file1.txt
samples/sampledirtree/folder1/file1.txt
samples/sampledirtree/folder1/file2.html
samples/sampledirtree/folder2/file1.html
samples/sampledirtree/folder2/file2.txt

We used the default settings here, but by changing the $mode argument of the constructor (the 2nd parameter) we can tell it, for example, to return children first (CHILD_FIRST), parents first (SELF_FIRST) or “leaves” only (LEAVES_ONLY, the default). This is very useful. If you are not seeing it yet, imagine you want to remove a directory structure: you can’t just rmdir it, because that fails while there are still files inside the folder, so you need to delete entries one by one following the hierarchy. If you use this iterator combination and ask it for children first, you can delete all the children and afterwards remove the parents, like in this code:

<?php
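//Recursively delete tree structure
$dir = 'samples' . DIRECTORY_SEPARATOR . 'sampledirtree';
// CHILD_FIRST yields a directory's contents before the directory itself.
// SKIP_DOTS (PHP 5.3+) matters here: without it "." and ".." would be handed to rmdir() too.
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS), RecursiveIteratorIterator::CHILD_FIRST);
foreach($files as $file){
    if ($file->isDir()){
        rmdir($file->getRealPath());
    }else{
        unlink($file->getRealPath());
    }
}
rmdir($dir); // finally remove the (now empty) root directory itself
?>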

Obviously you might not see the advantages of the SPL approach over scandir in the basic cases, but once you start adding operations to your iteration and need specific behavior, like the delete example, you begin to realize it lets you write much simpler and more readable code. Plus it’s OO! (I’m a big OO fan, BTW.)
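To see how the three modes differ, here is a sketch on a tiny throwaway tree: a temp directory holding sub/b.txt (the names and the listInMode() helper are made up for the demo). A single chain keeps the order deterministic, since directory listing order is otherwise up to the filesystem.

```php
<?php
// Throwaway tree: <tmp>/spl_modes_xxx/sub/b.txt (path and names made up for the demo).
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_modes_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'sub', 0777, true);
file_put_contents($root . DIRECTORY_SEPARATOR . 'sub' . DIRECTORY_SEPARATOR . 'b.txt', 'b');

// Returns the entry names in the order a given $mode visits them (hypothetical helper).
function listInMode($root, $mode)
{
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        $mode
    );
    $names = array();
    foreach ($it as $file) {
        $names[] = $file->getFilename();
    }
    return $names;
}

$leavesOnly = listInMode($root, RecursiveIteratorIterator::LEAVES_ONLY); // the default
$selfFirst  = listInMode($root, RecursiveIteratorIterator::SELF_FIRST);
$childFirst = listInMode($root, RecursiveIteratorIterator::CHILD_FIRST);

echo 'LEAVES_ONLY: ' . implode(', ', $leavesOnly) . PHP_EOL; // b.txt
echo 'SELF_FIRST:  ' . implode(', ', $selfFirst) . PHP_EOL;  // sub, b.txt
echo 'CHILD_FIRST: ' . implode(', ', $childFirst) . PHP_EOL; // b.txt, sub

// Clean up.
unlink($root . DIRECTORY_SEPARATOR . 'sub' . DIRECTORY_SEPARATOR . 'b.txt');
rmdir($root . DIRECTORY_SEPARATOR . 'sub');
rmdir($root);
```

CHILD_FIRST is the one the delete example relies on: b.txt comes out before sub, so the directory is already empty when rmdir() reaches it.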

Non-native in SPL

Non-native SPL classes are currently available as examples, and some will be converted to C and integrated into the native part of SPL. Some are useful as examples that you can implement locally for your own use, or you can load them into your code in one of two ways:

  • Add ext/spl/examples/autoload.inc to your php.ini in auto_prepend_file (or include it from the file already set in auto_prepend_file)
  • Include ext/spl/examples/autoload.inc in your application

The autoload.inc file is in the ext/spl/examples folder mentioned above, which should be in your PHP install or in the source code you can download from PHP.net. I would recommend downloading it and adding it to your application tree if you wish to use it.

Personal recommendation: use everything in the examples folder as inspiration for what you can do with SPL, and implement it locally.

DirectoryTreeIterator

DirectoryTreeIterator is more interesting as an example of what you can do with iterators than as something you might actually use on a daily basis. It basically does what the RecursiveDirectoryIterator and RecursiveIteratorIterator combination does, but displays the result as an ASCII directory tree. Using this code:

<?php
set_include_path( get_include_path() . PATH_SEPARATOR . 'spl' . DIRECTORY_SEPARATOR . 'examples' );
include('spl' . DIRECTORY_SEPARATOR . 'examples' . DIRECTORY_SEPARATOR . 'autoload.inc');

$files = new DirectoryTreeIterator('samples' . DIRECTORY_SEPARATOR . 'sampledirtree');

foreach($files as $file){
    echo $file . PHP_EOL;
}

?>

We get this result:

|-samples/sampledirtree/file1.txt
|-samples/sampledirtree/folder1
| |-samples/sampledirtree/folder1/file1.txt
| \-samples/sampledirtree/folder1/file2.html
\-samples/sampledirtree/folder2
  |-samples/sampledirtree/folder2/file1.html
  \-samples/sampledirtree/folder2/file2.txt
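On PHP 5.3 or later you can get a similar tree without the examples folder: SPL’s native RecursiveTreeIterator draws the same kind of "|-" and "\-" connectors. A minimal sketch, assuming a throwaway temp directory with one folder and one file (hypothetical names); with a RecursiveDirectoryIterator inside, each line ends with the entry’s full pathname.

```php
<?php
// Throwaway tree: one folder containing one file (temp path and names made up).
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_tree_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'folder1', 0777, true);
file_put_contents($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file1.txt', 'x');

// RecursiveTreeIterator prefixes each entry with "|-"/"\-" connectors and
// "| "/"  " indentation columns, in the same style as DirectoryTreeIterator.
$tree = new RecursiveTreeIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
);

$lines = array();
foreach ($tree as $line) {
    $lines[] = $line;
    echo $line . PHP_EOL;
}

// Clean up.
unlink($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file1.txt');
rmdir($root . DIRECTORY_SEPARATOR . 'folder1');
rmdir($root);
```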

Since I said it’s more interesting as an example, let’s look at the actual source of the method that does the printing, the class’s current() method:

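function current()
{
    $tree = '';
    // One column per ancestor level: "| " if that level still has siblings to come, blank otherwise.
    // hasNext() comes from the RecursiveCachingIterator that DirectoryTreeIterator wraps around its RecursiveDirectoryIterator.
    for ($l=0; $l < $this->getDepth(); $l++) {
        $tree .= $this->getSubIterator($l)->hasNext() ? '| ' : '  ';
    }
    // Connector for the entry itself: "|-" if more siblings follow, "\-" if it is the last one
    return $tree . ($this->getSubIterator($l)->hasNext() ? '|-' : '\-')
        . $this->getSubIterator($l)->__toString();
}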

As you can see, it’s just a matter of swapping the ASCII for images and CSS, and you can very easily have a directory tree anywhere on your site, just by taking advantage of RecursiveDirectoryIterator.
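Here is a sketch of that idea. HtmlTreeIterator and directoryToHtml() are hypothetical names, not part of SPL: the class hooks RecursiveIteratorIterator’s beginChildren() and endChildren() methods to emit nested <ul>/<li> markup you could style with CSS. It assumes PHP 7.1+ for the void return types and uses a throwaway temp tree.

```php
<?php
// Hypothetical HtmlTreeIterator: records nested <ul> markup as the iteration
// enters and leaves directories.
class HtmlTreeIterator extends RecursiveIteratorIterator
{
    public $html = '';

    public function beginChildren(): void
    {
        $this->html .= '<ul>';        // entering a subdirectory
    }

    public function endChildren(): void
    {
        $this->html .= '</ul></li>';  // leaving it: also close the directory's <li>
    }
}

function directoryToHtml($path)
{
    $it = new HtmlTreeIterator(
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // a directory is listed before its contents
    );
    foreach ($it as $file) {
        $it->html .= '<li>' . htmlspecialchars($file->getFilename());
        if (!$it->callHasChildren()) {
            $it->html .= '</li>';     // plain files close their <li> right away
        }
    }
    return '<ul>' . $it->html . '</ul>';
}

// Throwaway tree: folder1/file1.txt (temp path and names made up).
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_html_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'folder1', 0777, true);
file_put_contents($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file1.txt', 'x');

$html = directoryToHtml($root);
echo $html . PHP_EOL; // <ul><li>folder1<ul><li>file1.txt</li></ul></li></ul>

// Clean up.
unlink($root . DIRECTORY_SEPARATOR . 'folder1' . DIRECTORY_SEPARATOR . 'file1.txt');
rmdir($root . DIRECTORY_SEPARATOR . 'folder1');
rmdir($root);
```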

End of Part I…

This is a brief overview of what you can do with the directory iterators available in SPL. Combining these directory iterators with other navigation iterators, you can do a lot more; that will be the topic of another post soon, where I will talk about the different iterators you can use to iterate over iterators (say that 3x fast!), all the way from FilterIterator to InfiniteIterator. I hope this gives you an idea of how to make your code better with SPL.


June 3rd 2009

SPL: a hidden gem

By a show of hands, how many people here have ever heard of SPL? How many have already used it? Chances are most of you didn’t raise your hands, and some might even have a confused look on their faces. Indeed, that is the sad reality when it comes to SPL. But what is SPL?

SPL, or Standard PHP Library, is a set of classes and interfaces built into PHP since version 5.0, and as of PHP 5.3 it cannot even be disabled, so it’s here for good. Even before that it was hard to disable when compiling, so chances are 9.9 out of 10 that you have it. So why haven’t you used it? The answers range from “poor documentation” to “didn’t even know it existed”: SPL has not had the “bling” it deserves. This is where this article comes in; time to turn this around. So what is in SPL?

SPL makes available a few hooks for overloading the PHP engine, such as the ArrayAccess, Countable and SeekableIterator interfaces, which make your objects work like arrays. You can also manipulate other structures using RecursiveIterator, ArrayObject and various other iterators. It even has classes and interfaces for specific needs such as exceptions, SplObserver and SplObjectStorage, and helper functions for other aspects, like spl_autoload_register, spl_classes and iterator_apply. Overall it’s a Swiss Army knife: most of it could be implemented in plain PHP, but because it is written in C and hooks into the engine, it will usually perform much faster in SPL. So, what can I actually do with it?
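A minimal sketch of that array-like behavior: a hypothetical Collection class implementing ArrayAccess and Countable (plus IteratorAggregate so foreach works). It assumes PHP 8 for the mixed return type; the return types are only there to satisfy modern PHP and would be left off in PHP 5.3 code.

```php
<?php
// Hypothetical Collection: a plain object that works with [], isset(), unset(),
// count() and foreach.
class Collection implements ArrayAccess, Countable, IteratorAggregate
{
    private $items = array();

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    public function offsetGet($offset): mixed
    {
        return $this->items[$offset];
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = $value; // $collection[] = $value
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
    }

    public function count(): int
    {
        return count($this->items);
    }

    public function getIterator(): Traversable
    {
        return new ArrayIterator($this->items);
    }
}

$c = new Collection();
$c[] = 'apple';
$c['b'] = 'banana';
unset($c[0]);
echo count($c) . PHP_EOL; // 1
foreach ($c as $key => $value) {
    echo $key . ' => ' . $value . PHP_EOL; // b => banana
}
```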

Overloading autoloader

You are a by-the-book programmer, and after __autoload came around you rewrote all your sites and removed the endless stream of includes and requires in your code to make way for lazy loading, right? Then once in a while you find yourself in a jam: your product’s classes use a specific naming/directory structure, and the Zend Framework classes you use follow a “_”-to-path convention. How do you solve this? A giant __autoload that includes all the logic, trial-and-error style? Alter your directory structure to match Zend’s? No! Overload it!

The process is simple: create your own autoload function and register it with spl_autoload_register. Registered loaders are called in order, so (with Zend’s loader registered first) the class goes through Zend’s loader; if that does not find the class, yours runs next, and so on down the line until one of them finds it.

<?php

class MyLoader{
    public static function doAutoload($class){
        //autoload process
        //use file_exists please
    }
}

spl_autoload_register( array('MyLoader', 'doAutoload') );

?>
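Here is a sketch of that chain in action, using closures instead of the MyLoader class. Both loaders, the class MyApp_Invoice and the directories are hypothetical: the first loader stands in for a framework loader with Zend-style "_"-to-path mapping into a library folder that does not exist, and the second one finds the class in our own (temporary) class folder.

```php
<?php
// Hypothetical layout for the demo: our classes live in <tmp>/myapp_classes_xxx/<ClassName>.php
$classDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'myapp_classes_' . uniqid();
mkdir($classDir);
file_put_contents($classDir . DIRECTORY_SEPARATOR . 'MyApp_Invoice.php', '<?php class MyApp_Invoice {}');

$libDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'no_such_zend_lib'; // hypothetical, does not exist
$calls = array(); // records which loader saw which class, in order

// Stand-in for the framework loader: Zend-style "_"-to-path mapping (Zend_Db_Table -> Zend/Db/Table.php).
spl_autoload_register(function ($class) use (&$calls, $libDir) {
    $calls[] = 'framework:' . $class;
    $file = $libDir . DIRECTORY_SEPARATOR . str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
    if (file_exists($file)) {
        require $file;
    }
});

// Our own loader, registered second: it only runs if the class is still undefined.
spl_autoload_register(function ($class) use (&$calls, $classDir) {
    $calls[] = 'app:' . $class;
    $file = $classDir . DIRECTORY_SEPARATOR . $class . '.php';
    if (file_exists($file)) {
        require $file;
    }
});

$invoice = new MyApp_Invoice();
echo implode(PHP_EOL, $calls) . PHP_EOL; // framework:MyApp_Invoice, then app:MyApp_Invoice

// Clean up.
unlink($classDir . DIRECTORY_SEPARATOR . 'MyApp_Invoice.php');
rmdir($classDir);
```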

Iterators

Iterator is a design pattern: a generic way to iterate over data in a consistent manner, accessing the elements of an object sequentially without exposing its underlying representation. SPL has all the iterators you will ever need, and I’m not exaggerating at all. This also includes filter iterators and many others. You can use this, for example, for your database results: make the DbResult object implement the Iterator interface, which provides methods such as current(), key(), next(), rewind() and valid(), so you can iterate over results in a foreach. Another good example for iterators is traversing a directory. The usual way is to iterate over scandir and use ifs and elses to skip over “.”, “..” and any other files you don’t want, say because you only want the pictures in a directory. You can do all this using iterators and filter iterators, like in this example:

<?php

class RecursiveFileFilterIterator extends FilterIterator
{
    protected $ext = array('jpg','gif');

    /**
    * Takes $path and creates a recursive iterator with a directory iterator
    * @param $path directory to iterate
    */
    public function __construct($path)
    {
        parent::__construct(new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path)));
    }

    /**
     * Accepts files only, and only those whose (lowercased) extension is in $ext.
     */
    public function accept()
    {
        $item = $this->current(); // SplFileInfo for the current entry
        return $item->isFile()
            && in_array(strtolower(pathinfo($item->getFilename(), PATHINFO_EXTENSION)), $this->ext);
    }
}

// Using it
foreach (new RecursiveFileFilterIterator('/path/to/something') as $item) {
    echo $item . PHP_EOL;
}

?>

You may argue that now you have much more code. I’ll reply: yes, but it’s reusable and testable code!
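Back to the database idea: a minimal sketch of a hypothetical DbResult class implementing Iterator. It wraps a plain array of rows standing in for a real database result, so the data and class are assumptions for illustration; it assumes PHP 8 for the mixed return types.

```php
<?php
// Hypothetical DbResult: a plain array of rows stands in for a real database result.
class DbResult implements Iterator
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows); // reindex so positions are 0..n-1
    }

    public function current(): mixed { return $this->rows[$this->position]; }
    public function key(): mixed     { return $this->position; }
    public function next(): void     { $this->position++; }
    public function rewind(): void   { $this->position = 0; }
    public function valid(): bool    { return $this->position < count($this->rows); }
}

$result = new DbResult(array(
    array('id' => 1, 'name' => 'Ana'),
    array('id' => 2, 'name' => 'Bruno'),
));

foreach ($result as $i => $row) {
    echo $i . ': ' . $row['name'] . PHP_EOL; // 0: Ana, then 1: Bruno
}
```

A real implementation would fetch rows lazily in next() instead of holding them all in memory; the foreach side stays exactly the same.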

Here are some more iterators:

  • RecursiveIterator
  • RecursiveIteratorIterator
  • OuterIterator
  • IteratorIterator
  • FilterIterator
  • RecursiveFilterIterator
  • ParentIterator
  • SeekableIterator
  • LimitIterator
  • GlobIterator
  • CachingIterator
  • RecursiveCachingIterator
  • NoRewindIterator
  • AppendIterator
  • InfiniteIterator
  • RegexIterator
  • RecursiveRegexIterator
  • EmptyIterator
  • RecursiveTreeIterator
  • ArrayIterator
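Some of these can replace the custom FilterIterator subclass above. As a sketch (PHP 5.3+ for SKIP_DOTS), RegexIterator can do the jpg/gif filtering with a regular expression on each entry’s pathname, no subclass needed. The temp tree and file names are made up for the demo.

```php
<?php
// Throwaway tree with mixed files (temp path and names made up for the demo).
$root = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'spl_regex_' . uniqid();
mkdir($root . DIRECTORY_SEPARATOR . 'photos', 0777, true);
$names = array('a.jpg', 'notes.txt', 'photos' . DIRECTORY_SEPARATOR . 'b.GIF', 'photos' . DIRECTORY_SEPARATOR . 'c.png');
foreach ($names as $name) {
    file_put_contents($root . DIRECTORY_SEPARATOR . $name, 'x');
}

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
);
// RegexIterator matches each entry's string value (for SplFileInfo, its pathname)
// against the pattern; the /i modifier makes the extension check case-insensitive.
$images = new RegexIterator($files, '/\.(jpe?g|gif)$/i');

$found = array();
foreach ($images as $file) {
    $found[] = $file->getFilename();
}
sort($found); // directory order is up to the filesystem
print_r($found);

// Clean up.
foreach ($names as $name) {
    unlink($root . DIRECTORY_SEPARATOR . $name);
}
rmdir($root . DIRECTORY_SEPARATOR . 'photos');
rmdir($root);
```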

As of PHP 5.3 we also have some other interesting tools, like SplInt and other types you can use for type casting (still in PECL). One class worth mentioning, however, is:

SplFixedArray

Why? It’s faster! Why? Aha! That’s the million dollar question. To understand it we must delve into PHP internals for a regular array. In a regular array you can use different types of keys, i.e. numbers, strings and so forth, because a PHP array is really an ordered hash table: whatever key you give it goes through a hash lookup instead of being used directly as an index into an underlying C array, and that has a performance cost. SplFixedArray only accepts integer indexes within a fixed size, so no hashing happens! For those of you who caught on: yes, it’s backed by a plain C array. That explains why it is faster than regular arrays. (PHP 5.3+ only!)
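A short sketch of the API side of that trade-off: the size is fixed up front, indexes must be in range, and growing is an explicit setSize() call. Out-of-range writes raise a RuntimeException instead of silently extending the array.

```php
<?php
$fixed = new SplFixedArray(3); // size is fixed up front; every slot starts as null
$fixed[0] = 'a';
$fixed[1] = 'b';
$fixed[2] = 'c';
echo $fixed->getSize() . PHP_EOL; // 3

// Unlike a regular array, it will not grow on its own: index 3 is out of range.
$caught = false;
try {
    $fixed[3] = 'd';
} catch (RuntimeException $e) {
    $caught = true;
    echo 'Rejected: ' . $e->getMessage() . PHP_EOL;
}

$fixed->setSize(4); // growing is an explicit operation
$fixed[3] = 'd';
print_r($fixed->toArray()); // the four values as a regular PHP array
```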

These are just some examples of what you can do with SPL. Unfortunately there is no “one place” to go for a complete view of SPL: you can check the regular manual, but you should also trust the documentation written by SPL’s creators themselves, or you can hit Elizabeth’s blog; most examples in this article belong to her.

Invitation

But there is no better way to get better at SPL than contributing to it! We need documentation writers! So if you want to be part of PHP and help out, check out the php.doc mailing list, or IRC your way to EFNet, join #php.doc and say “I want to help”; you will be given a task very quickly!
