If you have written some Python code and used a for loop, you have already used iterators behind the scenes, probably without knowing it. Iterators are objects that hand out their items one at a time. They are practically everywhere in a Python codebase. Understanding iterators and how they work helps us write better, more memory-efficient code. In this post, we will discuss iterators and other related concepts.
How does iteration work?
Before we can dive into iterators, we first need to understand how iteration works in Python. When we run a for loop, how does Python fetch one item at a time? How does this process work?

There are two functions that come into play: iter and next. The iter function gets an iterator from an object. It actually calls the __iter__ special method on the object to get the iterator. So if an object wants to allow iteration, it has to implement the __iter__ method. Once Python has the iterator object, it keeps calling next on it. The next function in turn calls the __next__ method on the iterator object. Let’s see a quick example:
>>> l = [1, 2, 3]
>>> i = iter(l)
>>> type(l)
<class 'list'>
>>> type(i)
<class 'list_iterator'>
>>> next(i)
1
>>> next(i)
2
>>> next(i)
3
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>
Let’s walk through it. We first create a list named l with 3 elements. We then call iter() on it. The type of l is list, but look at the type of i: it’s list_iterator. Interesting! Now we keep calling next on i and it keeps giving us the values we saw in the list, one by one, until there’s a StopIteration exception.

Here the list is an iterable because we can get an iterator from it to iterate over the list. The list_iterator object we got is an iterator: it’s the object that actually produces the items one at a time. When we loop over a list, this is what happens:
l = [1, 2, 3]
iterator = iter(l)
while True:
    try:
        item = next(iterator)
        print(item)
    except StopIteration:
        break
Make sense? The for loop gets the iterator and keeps calling next on it until a StopIteration exception is raised.
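To see those calls happen, here is a minimal sketch that records every call a for loop makes. The TracingIterable and TracingIterator classes are hypothetical names made up for this illustration; they are not part of Python.

```python
class TracingIterator:
    """Hypothetical iterator that logs every __next__ call."""

    def __init__(self, items, log):
        self._items = items
        self._index = 0
        self._log = log

    def __next__(self):
        self._log.append("__next__")
        if self._index >= len(self._items):
            # No items left: signal the end of iteration.
            raise StopIteration
        value = self._items[self._index]
        self._index += 1
        return value


class TracingIterable:
    """Hypothetical iterable that logs every __iter__ call."""

    def __init__(self, items):
        self._items = items
        self.log = []

    def __iter__(self):
        self.log.append("__iter__")
        return TracingIterator(self._items, self.log)


traced = TracingIterable(["a", "b"])
seen = []
for item in traced:
    seen.append(item)

print(seen)        # ['a', 'b']
print(traced.log)  # ['__iter__', '__next__', '__next__', '__next__']
```

Note the third __next__ call: that is the one that raises StopIteration and ends the loop.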
An iterator is an object that implements the __next__ method, so we can call next on it repeatedly to get the items. Let’s write an iterator that keeps giving us the next integer, without ever stopping. Let’s name it InfiniteIterator:
class InfiniteIterator:
    def __init__(self):
        self.__int = 0

    def __next__(self):
        self.__int += 1
        return self.__int
If we keep calling next on it, we will keep getting integers, starting from one.
>>> inf_iter = InfiniteIterator()
>>> next(inf_iter)
1
>>> next(inf_iter)
2
>>> next(inf_iter)
3
>>> next(inf_iter)
4
>>>
What if we wanted to create an InfiniteNumbers iterable? It would be such that when we use the for loop on it, it never stops; it produces the next integer on each pass through the loop. What would we do? Well, we have an InfiniteIterator. All we need is to define an __iter__ method that returns a new instance of InfiniteIterator:
class InfiniteNumbers:
    def __iter__(self):
        return InfiniteIterator()


infinite_numbers = InfiniteNumbers()
for x in infinite_numbers:
    print(x)
    if x > 99:
        break
If you remove the if block with its break statement, you will notice that the loop keeps running forever.
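Another way to take a finite slice of an endless iterable is itertools.islice from the standard library. The sketch below repeats the two classes so that it runs on its own; the names first_five and _current are only illustrative.

```python
from itertools import islice


class InfiniteIterator:
    def __init__(self):
        self._current = 0

    def __next__(self):
        self._current += 1
        return self._current


class InfiniteNumbers:
    def __iter__(self):
        return InfiniteIterator()


# islice stops after 5 items, so the endless iterator is never run dry.
first_five = list(islice(InfiniteNumbers(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```

The loop itself stays free of counting logic; the limit lives in the islice call.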
Instead of breaking out of the loop ourselves, we could raise the StopIteration exception in our iterator so that it stops after giving us 100 numbers.
class HundredIterator:
    def __init__(self):
        self.__int = 0

    def __next__(self):
        # Stop once 100 numbers have been handed out.
        if self.__int > 99:
            raise StopIteration
        self.__int += 1
        return self.__int


# Renamed from InfiniteNumbers: this iterable is no longer infinite.
class HundredNumbers:
    def __iter__(self):
        return HundredIterator()


one_hundred = HundredNumbers()
for x in one_hundred:
    print(x)
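The limit does not have to be hard-coded. As a rough sketch, the same idea can take the limit as a parameter; CountUpTo and CountUpToIterator are hypothetical names chosen for this example.

```python
class CountUpToIterator:
    """Hypothetical iterator that yields 1, 2, ..., limit."""

    def __init__(self, limit):
        self._limit = limit
        self._current = 0

    def __next__(self):
        if self._current >= self._limit:
            raise StopIteration
        self._current += 1
        return self._current


class CountUpTo:
    """Hypothetical iterable that hands out a new iterator per loop."""

    def __init__(self, limit):
        self._limit = limit

    def __iter__(self):
        return CountUpToIterator(self._limit)


collected = []
for x in CountUpTo(5):
    collected.append(x)

print(collected)  # [1, 2, 3, 4, 5]
```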
Iterators must also implement __iter__
We saw that the __next__ method does its work just fine. But we also need to implement the __iter__ method on an iterator (just like we did on the iterable). Why is this required? Let me quote from the official docs:

“Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.”
If we tried to use the for loop over our iterator, it would fail:
class HundredIterator:
    def __init__(self):
        self.__int = 0

    def __next__(self):
        if self.__int > 99:
            raise StopIteration
        self.__int += 1
        return self.__int


one_hundred = HundredIterator()
for x in one_hundred:
    print(x)
We will get the following exception:
Traceback (most recent call last):
  File "iter.py", line 15, in <module>
    for x in one_hundred:
TypeError: 'HundredIterator' object is not iterable
That makes sense, because we saw that the for loop calls the iter function on an object to get an iterator from it, and then calls next on that iterator. That’s the problem: we don’t have an __iter__ method. The official documentation says that every iterator should be a proper iterable too. That is, it should implement the __iter__ method and simply return itself. Let’s do that:
class HundredIterator:
    def __init__(self):
        self.__int = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.__int > 99:
            raise StopIteration
        self.__int += 1
        return self.__int


one_hundred = HundredIterator()
for x in one_hundred:
    print(x)
Now the code works fine 🙂
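The built-in list_iterator follows the same rule, which we can check with the is operator. This is a small sketch using only built-ins; the variable names are just for illustration.

```python
numbers = [1, 2, 3]
list_iter = iter(numbers)

# An iterator returns itself from iter().
iterator_returns_self = iter(list_iter) is list_iter

# The list is only an iterable: iter() gives back a different object,
# and a new one on every call.
list_returns_self = iter(numbers) is numbers
two_calls_same = iter(numbers) is iter(numbers)

# Because the iterator is iterable, a for loop can pick up where
# manual next() calls left off.
first = next(list_iter)
rest = [x for x in list_iter]

print(iterator_returns_self, list_returns_self, two_calls_same)  # True False False
print(first, rest)  # 1 [2, 3]
```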
The Iterator Protocol
The iterator protocol defines the special methods that an object must implement to allow iteration. We can summarize the protocol in this way:
- Any object that can be iterated over needs to implement the __iter__ method, which should return an iterator object. Any object that returns an iterator this way is an iterable.
- An iterator must implement the __next__ method, which returns the next item when called. When all items are exhausted (that is, retrieved), it must raise the StopIteration exception.
- An iterator must also implement the __iter__ method, returning itself, to behave like an iterable.
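The standard library’s collections.abc module has Iterable and Iterator classes that test for these methods, so the summary above can be checked with isinstance. In this sketch, OnlyNext is a hypothetical class that deliberately breaks the third rule.

```python
from collections.abc import Iterable, Iterator


class OnlyNext:
    """Hypothetical class with __next__ but no __iter__."""

    def __next__(self):
        raise StopIteration


numbers = [1, 2, 3]

checks = {
    "list is Iterable": isinstance(numbers, Iterable),
    "list is Iterator": isinstance(numbers, Iterator),
    "list_iterator is Iterable": isinstance(iter(numbers), Iterable),
    "list_iterator is Iterator": isinstance(iter(numbers), Iterator),
    "OnlyNext is Iterator": isinstance(OnlyNext(), Iterator),
}

for name, result in checks.items():
    print(name, result)
```

A list passes only the Iterable check, its list_iterator passes both, and OnlyNext fails the Iterator check because it lacks __iter__.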
Why do we need Iterables?
In our last example, we saw that it’s possible for an object to implement a __next__ method and an __iter__ method that returns self. In this way, an iterator behaves just like an iterable. Then why do we need iterables? Why can’t we just keep using iterators that return themselves?
Let’s get back to our HundredIterator example. Once you have iterated over the items once, try to iterate again. What happens? No numbers are printed. Why? Because iterator objects store state. Once the iterator has raised StopIteration, it has reached the end of the line. It’s now exhausted. Every time you call iter on it, it returns the same instance (self), which has nothing more to output.
This is why iterables are useful. An iterable can return a fresh iterator instance every time it is looped over. This is actually what many built-in types like list do.
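The difference is easy to see with a plain list and its iterator. A short sketch using only built-ins:

```python
numbers = [1, 2, 3]

# One iterator, consumed twice: the second pass finds nothing left.
iterator = iter(numbers)
first_pass = list(iterator)
second_pass = list(iterator)

# The list itself creates a new iterator for every pass.
first_from_list = list(numbers)
second_from_list = list(numbers)

print(first_pass, second_pass)            # [1, 2, 3] []
print(first_from_list, second_from_list)  # [1, 2, 3] [1, 2, 3]
```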
Why are iterators so important?
Iterators allow us to consume data one item at a time. Just imagine there’s a 1 GB file: if we tried to load it all into memory, it would require a huge amount of memory. But what if we used an iterator that reads the file one line at a time? We could then keep just that one line in memory and do the necessary processing before moving on to the next one. This allows us to write really memory-efficient programs 🙂
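Python’s file objects are themselves iterators that yield one line per step, so the idea can be sketched without writing a custom class. The function name count_matching_lines, the file name sample.log and the sample contents are assumptions made up for this example.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count lines containing keyword, holding one line in memory at a time."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        # The file object is an iterator: each step reads the next line.
        for line in handle:
            if keyword in line:
                count += 1
    return count


# Build a tiny sample file so the sketch is runnable as-is.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.log")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("ok\nerror: disk full\nok\nerror: timeout\n")
    error_count = count_matching_lines(sample_path, "error")

print(error_count)  # 2
```

The same loop works unchanged on a much larger file, because it never holds more than the current line.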
This all seems very confusing
If you find the concepts confusing and hard to grasp, don’t worry. Give it a few tries, write the code by hand and see the output. Tinker with the examples. Inspect the code and see what happens when you modify part of it. Everything gets easier with practice. Try writing your own iterables and iterators. Perhaps try to clone the functionality of the built-in containers? Maybe write your own list implementation? Don’t worry, it will come to you in time.