Johann Schmitz

Python provides a simple way to make built-in functions and operators work with custom classes. You just implement the relevant magic methods and you're good to go.

Here is an example of how to use this functionality to build a custom sequence.

import os

class DirectoryIndex(object):

    def __init__(self, path):
        self.items = os.listdir(path)

Of course, in a real application you wouldn't wrap a simple os.listdir call in its own class, and you wouldn't do the fetching in the __init__ method. Your business object may do a lot more work, such as fetching data from external resources. You can use an object of this class by accessing the items field explicitly:

di = DirectoryIndex('/etc')
print("%s: %s" % (len(di.items), di.items))

Every caller needs to know that we store our result in the items field, which isn't Pythonic and clearly violates the DRY principle. We also have no control over who uses the field or how they use it, so we cannot optimize access to the underlying resource, which might be costly. Furthermore, a caller can (and someone will) modify the items field, which makes the list unreliable even for the DirectoryIndex class that contains it.
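To make the last point concrete, here is a minimal sketch of how a caller can corrupt the public field. The class NaiveIndex and the file names are hypothetical stand-ins for the first DirectoryIndex version, chosen so the example does not depend on a real directory:

```python
class NaiveIndex(object):
    """Hypothetical stand-in for the first DirectoryIndex version."""

    def __init__(self, names):
        # Public attribute: every caller can read *and* change it.
        self.items = list(names)


index = NaiveIndex(["hosts", "passwd"])

# Nothing stops a caller from editing the list behind the object's back.
index.items.append("not-a-real-file")
del index.items[0]

print(index.items)  # ['passwd', 'not-a-real-file']
```

After these two statements the object no longer reflects what was originally loaded, and the class has no way to notice.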

Here is a better version:

class DirectoryIndex(object):

    def __init__(self, path):
        self._path = path
        self._items = os.listdir(path)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __repr__(self):
        return "<DirectoryIndex of %s>" % self._path


di = DirectoryIndex('/etc')
print(di)
print(len(di))

<DirectoryIndex of /etc>
195

In this version, we implemented two methods that make the class actually useful: __len__ and __iter__. The former returns the length of the sequence; the latter provides an iterator, which is used by iterating constructs like for ... in. Both methods can contain arbitrary code required to return a meaningful value, except that __len__ must return a non-negative integer and __iter__ must return an iterator (so we cannot just return self._items but have to wrap it in iter()). For example, an implementation of a custom sequence backed by a database may issue a SELECT COUNT(*) FROM ... WHERE ... SQL statement to return a value for __len__.
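As a rough sketch of the database-backed case, the following class answers len() with a COUNT(*) query instead of loading any rows. It is written for Python 3 and uses the standard-library sqlite3 module with an in-memory database; the class name DbIndex, the table files and its name column are assumptions made for illustration only.

```python
import sqlite3


class DbIndex(object):
    """Hypothetical sequence backed by a 'files' table (illustration only)."""

    def __init__(self, connection):
        self._conn = connection

    def __len__(self):
        # Let the database count; no rows are transferred to Python.
        row = self._conn.execute("SELECT COUNT(*) FROM files").fetchone()
        return row[0]

    def __iter__(self):
        # A generator function returns an iterator, as __iter__ requires.
        query = "SELECT name FROM files ORDER BY name"
        for (name,) in self._conn.execute(query):
            yield name


# Assumed demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("hosts",), ("passwd",), ("sudoers",)],
)

index = DbIndex(conn)
print(len(index))   # 3
print(list(index))  # ['hosts', 'passwd', 'sudoers']
```

Callers keep using len() and for ... in exactly as before; where the data comes from stays an internal detail of the class.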

But we can improve this class even further. By providing implementations for __len__ and __iter__, we already get several built-in features for free, such as membership tests and truth testing:

di = DirectoryIndex('/etc')
print('sudoers' in di)
True
print('foobar' in di)
False

di = DirectoryIndex('/tmp/emptydir')
if di:
    print(True)
else:
    print(False)
False
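To check which magic method each of these features actually relies on, a small probe class can record the calls it receives. This is only a sketch; the class Probe and its calls list are hypothetical helpers, not part of DirectoryIndex.

```python
class Probe(object):
    """Hypothetical class that records which magic methods get called."""

    def __init__(self, items):
        self._items = list(items)
        self.calls = []

    def __len__(self):
        self.calls.append("__len__")
        return len(self._items)

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(self._items)


probe = Probe(["hosts", "passwd", "sudoers"])

# No __contains__ is defined, so the in operator iterates over the object.
found = "sudoers" in probe
membership_calls = list(probe.calls)

# No __bool__ is defined, so truth testing asks __len__ instead.
probe.calls = []
truthy = bool(probe)
truth_calls = list(probe.calls)

print(found, membership_calls)  # True ['__iter__']
print(truthy, truth_calls)      # True ['__len__']
```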

However, these fallbacks are not always efficient. Because the class does not define __contains__, the in operator falls back to __iter__ and walks through the items until it finds a match, so 'sudoers' in di may loop over the whole list. (Truth testing is cheaper: without a __bool__ method, called __nonzero__ in Python 2, it falls back to __len__.) We can implement a few more methods to improve on that:

class DirectoryIndex(object):
    # ... other methods ...

    def __contains__(self, item):
        return os.path.exists(os.path.join(self._path, item))

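    def __getitem__(self, key):
        # Slices arrive as slice objects; plain integers index directly.
        if isinstance(key, slice):
            return self._items[key.start:key.stop:key.step]
        return self._items[key]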

The __contains__ method uses a shortcut to test whether a file exists (and avoids iterating over all items). Note that it asks the file system at call time, so its answer can differ from the list cached in _items. The __getitem__ method is used by the indexing and slicing operators (e.g. di[10:20]). For a slice, the key parameter is an instance of slice, which has the attributes start, stop and step. In our example, we simply pass these values to our in-memory list, but again you are free to use them to return only a subset of the items from your backend (e.g. via SELECT * FROM ... LIMIT x OFFSET y).
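The LIMIT/OFFSET idea could look roughly like the sketch below. It is written for Python 3 and uses sqlite3 with an in-memory database; the class PagedIndex, the table files and its name column are hypothetical. To keep it short, it supports only slices with a step of 1 and relies on slice.indices() to resolve missing or negative bounds.

```python
import sqlite3


class PagedIndex(object):
    """Hypothetical sequence that translates slices into LIMIT/OFFSET."""

    def __init__(self, connection):
        self._conn = connection

    def __len__(self):
        query = "SELECT COUNT(*) FROM files"
        return self._conn.execute(query).fetchone()[0]

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        # indices() resolves None and negative bounds against the length.
        start, stop, step = key.indices(len(self))
        if step != 1:
            raise ValueError("this sketch only supports a step of 1")
        limit = max(stop - start, 0)
        rows = self._conn.execute(
            "SELECT name FROM files ORDER BY name LIMIT ? OFFSET ?",
            (limit, start),
        )
        return [name for (name,) in rows]


# Assumed demo data: five rows named a to e.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany("INSERT INTO files VALUES (?)", [(n,) for n in "abcde"])

paged = PagedIndex(conn)
print(paged[1:3])  # ['b', 'c']
print(paged[-2:])  # ['d', 'e']
```

Only the requested rows leave the database, which is the point of handling the slice yourself instead of materializing the whole list first.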

Conclusion

Python provides a simple yet powerful way to turn your classes into sequences that can be used wherever an iterable object is expected. There are a few other methods (see "Emulating container types" in the Python documentation) that can be implemented to make some operations more efficient.
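One related option, not used in the examples above, is the Sequence abstract base class from the standard-library collections.abc module (Python 3). If a class defines __len__ and __getitem__, the base class supplies __contains__, __iter__, __reversed__, index() and count() as generic fallbacks. The sketch below assumes a hypothetical DirectoryListing class and a temporary demo directory:

```python
import os
import tempfile
from collections.abc import Sequence


class DirectoryListing(Sequence):
    """Hypothetical variant of DirectoryIndex built on the Sequence ABC."""

    def __init__(self, path):
        # Sorted so the order is predictable in this demo.
        self._items = sorted(os.listdir(path))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        return self._items[key]


# Assumed demo directory containing three empty files.
demo_dir = tempfile.mkdtemp()
for name in ("a.conf", "b.conf", "c.conf"):
    open(os.path.join(demo_dir, name), "w").close()

listing = DirectoryListing(demo_dir)
print("b.conf" in listing)      # True
print(list(reversed(listing)))  # ['c.conf', 'b.conf', 'a.conf']
print(listing.index("c.conf"))  # 2
```

The inherited methods are generic and go through __getitem__ one item at a time, so overriding them, as done with __contains__ earlier, can still pay off when the backend offers a faster path.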