Tricks with iterators.

This post is a follow-up to Iterators and Iterables Clarified. If you’re not sure how iterables and iterators differ, how to create them, or why you’d care, start there.

OK, so files are iterators, and iterators can be exhausted: once you’ve looped over a file, it’s done. But let’s say I want a file object that can be restarted. There are a couple of things you can do. Probably the simplest is to create a wrapper around the file object that rewinds the file each time it gets iterated over:

A rewinding file iterable


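#!/usr/bin/env python

class RewindingFileIterable(file):
    """A Python 2 file that rewinds itself whenever iteration starts."""
    def __iter__(self):
        self.seek(0)  # jump back to the start of the file
        return self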

>>> f = RewindingFileIterable('names.txt')
>>> for __ in xrange(2):
...     for line in f:
...         print line,
...
Tom
Dick
Muhammad
Tom
Dick
Muhammad

Unfortunately, this technique has two problems.

First, it doesn’t work as we’d like if we nest loops. Since there is only one file object, all the loops share a single position in the file: starting any loop rewinds the file for every loop, and when any loop reaches the end of the file, every loop gets a StopIteration.


>>> f = RewindingFileIterable('names.txt')
>>> for line_x in f:
...     for line_y in f:
...         print line_x, line_y,
...
Tom
Tom
Tom
Dick
Tom
Muhammad

Once we reach the end of the inside iterator, we are also at the end of the outside iterator (because it is the same RewindingFileIterable).
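
The shared-position problem is not specific to Python 2’s file type. Here is a minimal sketch in Python 3 (an assumption, since the code in this post is Python 2), using an in-memory io.StringIO in place of names.txt and a hypothetical RewindingIterable wrapper that is not part of the original code:

```python
import io


class RewindingIterable:
    """Hypothetical stand-in for RewindingFileIterable: wraps one stream
    and rewinds it every time iteration starts."""

    def __init__(self, stream):
        self.stream = stream

    def __iter__(self):
        self.stream.seek(0)
        # Every loop receives the same underlying stream object.
        return self.stream


# io.StringIO is assumed here in place of a real names.txt on disk.
names = RewindingIterable(io.StringIO("Tom\nDick\nMuhammad\n"))

pairs = []
for line_x in names:
    for line_y in names:
        pairs.append((line_x.strip(), line_y.strip()))

# The inner loop drains the shared stream, so the outer loop stops
# after its first line.
print(pairs)
```

Only the three pairs that start with Tom are collected, rather than the nine we would want from a true nested loop.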

The second problem is that this technique relies on the file object being, technically, a “broken” iterator. According to the documentation:

The intention of the protocol is that once an iterator’s next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

So once we reach the end of a file, it is technically supposed to keep raising StopIteration forever. However, if you rewind the file (using seek), it will happily start iterating again:


>>> f = open('names.txt')
>>> for line in f:
...     pass
...
>>> print f.next()  # we're at the end of the file
StopIteration
>>> f.seek(0)  # but if we rewind the file
>>> print f.next(),  # it lets us keep iterating
Tom
>>>
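
To see the difference between a well-behaved iterator and a rewindable one side by side, here is a small sketch in Python 3 (an assumption; the session above is Python 2). It uses io.StringIO as a stand-in for a real file, and all the variable names are made up for illustration:

```python
import io

# A list iterator follows the protocol: once exhausted, it stays exhausted.
list_iter = iter(["Tom", "Dick"])
for _ in list_iter:
    pass
list_still_done = next(list_iter, None) is None

# A seekable stream does not: rewinding it brings the iterator back to life.
stream = io.StringIO("Tom\nDick\nMuhammad\n")
for _ in stream:
    pass
stream_done = next(stream, None) is None  # we are at the end of the stream
stream.seek(0)                            # rewind
revived_line = next(stream)               # iteration resumes from the top

print(list_still_done, stream_done, repr(revived_line))
```

The list iterator has no way to be restarted, while the stream resumes from the first line as soon as it is rewound.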

A respawning file iterable
Relying on aspects of Python that don’t play by the rules makes me nervous, so let’s try doing this a different way:


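#!/usr/bin/env python

class RespawningFileIterable(object):
    """An iterable that opens a fresh file object for every loop."""
    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        # Each call returns a new, independent file iterator.
        return open(self.name, *self.args, **self.kwargs)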

Each time you loop over this iterable (or call iter() on it), you get back a new file object as your iterator. You can iterate through it and throw it away when it is exhausted, knowing that the next time you use your RespawningFileIterable you’ll get a fresh file object to iterate over. This gives us the behavior we want when we nest loops:


>>> f = RespawningFileIterable('names.txt')
>>> for line_x in f:
...     for line_y in f:
...         print line_x.strip(), line_y.strip()
...
Tom Tom
Tom Dick
Tom Muhammad
Dick Tom
Dick Dick
Dick Muhammad
Muhammad Tom
Muhammad Dick
Muhammad Muhammad
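
For readers on Python 3, the same idea can be sketched as follows. This is an illustrative port, not the original code: the temporary names.txt is created only so the example is self-contained, and the path and variable names are assumptions.

```python
import os
import tempfile


class RespawningFileIterable:
    """An iterable that opens a fresh file object for every loop."""

    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        # A brand-new file object (and so a fresh position) per loop.
        return open(self.name, *self.args, **self.kwargs)


# Create a throwaway names.txt so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "names.txt")
with open(path, "w") as handle:
    handle.write("Tom\nDick\nMuhammad\n")

names = RespawningFileIterable(path)
pairs = [(x.strip(), y.strip()) for x in names for y in names]

print(len(pairs))
print(pairs[0], pairs[-1])
```

Note that this sketch never closes the file objects it spawns and leaves that to garbage collection, which is fine for a demonstration but may not be what you want in long-running code.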

One drawback here is that now our iterable is not itself a file object. You cannot call its tell() method to find out where you are in the file. You cannot call read() to get the rest of the file. All you can do is use it for iteration, but it does that just fine.

Is there a hybrid version?

So how can we get this restarting capability without losing the ability to use our iterable as a file object? You might want to loop over a file partway, and then find out how far into the file you are using tell().

Unfortunately, once you have created an object that can be iterated over in multiple contexts at once, it doesn’t make sense to be able to ask where you are in the file, because you could be in two places at once.

Because of this ambiguity, it is best to leave such reporting to the iterators themselves, but this takes a bit of care to make it work. In the loop constructs we have created, we don’t actually have access to those iterators, so we will have to rewrite our loops to make them explicit.


>>> f = RespawningFileIterable('names.txt', buffering=0)
>>> outer_iterator = iter(f)
>>> for line_x in outer_iterator:
...     inner_iterator = iter(f)
...     for line_y in inner_iterator:
...         print outer_iterator.tell(), line_x,
...         print inner_iterator.tell(), line_y,
...
18 Tom
18 Tom
18 Tom
18 Dick
18 Tom
18 Muhammad
18 Dick
18 Tom
18 Dick
18 Dick
18 Dick
18 Muhammad
18 Muhammad
18 Tom
18 Muhammad
18 Dick
18 Muhammad
18 Muhammad

Oops. It looks like the file iterator reads in data in chunks, so with a file this small the position is already 18 (the end of the file) after its first read operation. In a larger file, though, this will tell you how many bytes you’ve read in for each file.

At any rate, the important part is that we can use the fact that iterators return themselves from their __iter__() method to get explicit access to the iterator in use in a for loop, which gives us the ability to fine-tune what is happening during that loop.
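
As a point of comparison, here is a sketch of the same explicit-iterator loop in Python 3 (an assumption; the session above is Python 2). It opens the file in binary mode, because Python 3 text files refuse to answer tell() in the middle of iteration, and in binary mode tell() reports the logical position after each line rather than the end of the chunk that was read ahead. The temporary file and the variable names are illustrative only.

```python
import os
import tempfile


class RespawningFileIterable:
    """An iterable that opens a fresh file object for every loop."""

    def __init__(self, name, *args, **kwargs):
        self.name = name
        self.args = args
        self.kwargs = kwargs

    def __iter__(self):
        return open(self.name, *self.args, **self.kwargs)


# Create an 18-byte names.txt so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "names.txt")
with open(path, "wb") as handle:
    handle.write(b"Tom\nDick\nMuhammad\n")

# Binary mode is assumed so that tell() can be called while iterating.
names = RespawningFileIterable(path, "rb")

positions = []
outer_iterator = iter(names)
for line_x in outer_iterator:
    inner_iterator = iter(names)
    for line_y in inner_iterator:
        # Each iterator keeps track of its own place in the file.
        positions.append((outer_iterator.tell(), inner_iterator.tell()))
    inner_iterator.close()
outer_iterator.close()

print(positions)
```

Because the loops hold on to their iterators explicitly, each one can be asked for its own position, and they can also be closed as soon as they are no longer needed.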
