Python: Iterators

If you have written some Python code and used a for loop, you have already used iterators behind the scenes, probably without knowing it. Iterators are objects that we can iterate over, one item at a time. They are practically everywhere in a Python codebase. Understanding iterators and how they work can help us write better, more efficient code. In this post, we will discuss iterators and other related concepts.

How does iteration work?

Before we dive into iterators, we first need to understand how iteration works in Python. When we run a for loop, how does Python fetch one item at a time?

There are two functions that come into play – iter and next. The iter function gets an iterator from an object. It does so by calling the __iter__ special method on the object. So if an object wants to allow iteration, it has to implement the __iter__ method. Once Python has the iterator object, it keeps calling next on it. The next function in turn calls the __next__ method on the iterator object. Let’s see a quick example:
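
A minimal sketch of such a session, assuming a three-element list named l and an iterator named i (the values in the list are arbitrary):

```python
l = [1, 2, 3]            # any three elements would do
i = iter(l)              # calls l.__iter__() under the hood

print(type(l))           # <class 'list'>
print(type(i))           # <class 'list_iterator'>

first = next(i)          # calls i.__next__()
second = next(i)
third = next(i)
print(first, second, third)   # 1 2 3

# The iterator is now exhausted, so one more call raises StopIteration.
try:
    next(i)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)         # True
```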

We first create a list named l with 3 elements. We then call iter() on it. The type of l is list but look at the type of i – it’s list_iterator – interesting! Now we keep calling next on i and it keeps giving us the values we saw in the list, one by one, until a StopIteration exception is raised.

Here the list is an iterable because we can get an iterator from it to iterate over the list. The list_iterator object we got is an iterator: an object that hands us the items one at a time. When we loop over a list, this is what happens:
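
Roughly, a for loop over a list behaves like the sketch below. It is an approximation for illustration, not the interpreter's actual implementation, and the names iterator and collected are only illustrative:

```python
l = [1, 2, 3]

# What we write:
for item in l:
    print(item)

# Roughly what Python does for us:
collected = []
iterator = iter(l)              # get an iterator from the iterable
while True:
    try:
        item = next(iterator)   # fetch the next item
    except StopIteration:
        break                   # no more items: leave the loop
    collected.append(item)
    print(item)
```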

Makes sense? The for loop gets the iterator and keeps calling next on it until a StopIteration exception is raised.

Iterator

An iterator is an object that implements the __next__ method, so we can call next on it repeatedly to get the items. Let’s write an iterator that keeps giving us the next integer, without ever stopping. Let’s name it InfiniteIterator.
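
One possible version is sketched below. Only the class name comes from the text; the current attribute is an assumed name for the internal counter:

```python
class InfiniteIterator:
    """Hands out 1, 2, 3, ... and never raises StopIteration."""

    def __init__(self):
        self.current = 0  # last value handed out (assumed attribute name)

    def __next__(self):
        self.current += 1
        return self.current


numbers = InfiniteIterator()
first_three = [next(numbers), next(numbers), next(numbers)]
print(first_three)  # [1, 2, 3]
```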

If we keep calling next on it, we will keep getting the integers, starting from one.

Iterable

What if we wanted to create an InfiniteNumbers iterable? It would be such that when we use the for loop on it, it never stops. It keeps producing the next integer in each loop. What would we do? Well, we have an InfiniteIterator. All we need is to define an __iter__ method that returns a new instance of InfiniteIterator.
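
A sketch of how this might look. InfiniteIterator is repeated so the example runs on its own, and the cut-off of 5 in the loop is arbitrary:

```python
class InfiniteIterator:
    def __init__(self):
        self.current = 0  # assumed attribute name

    def __next__(self):
        self.current += 1
        return self.current


class InfiniteNumbers:
    def __iter__(self):
        # A new iterator for every loop, so each loop starts from 1.
        return InfiniteIterator()


seen = []
for number in InfiniteNumbers():
    if number > 5:      # arbitrary cut-off so the example terminates
        break
    seen.append(number)

print(seen)  # [1, 2, 3, 4, 5]
```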

If you remove the break statement and the if block, you will notice that it keeps running, seemingly forever.

Using StopIteration

Instead of breaking out of the loop ourselves, we could raise the StopIteration exception in our iterator so it stops after giving us 100 numbers.
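
A sketch of such an iterator, here called HundredIterator. The HundredNumbers iterable and the current attribute are hypothetical names, added only so that a for loop can drive the iterator:

```python
class HundredIterator:
    def __init__(self):
        self.current = 0  # assumed attribute name

    def __next__(self):
        if self.current >= 100:
            raise StopIteration  # signals that there are no more items
        self.current += 1
        return self.current


class HundredNumbers:  # hypothetical iterable wrapping the iterator
    def __iter__(self):
        return HundredIterator()


count = 0
total = 0
for number in HundredNumbers():   # no break needed this time
    count += 1
    total += number

print(count, total)  # 100 5050
```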

Iterators must also implement __iter__

We saw that the __next__ method does its work just fine. But we also need to implement the __iter__ method on an iterator (just like we did on the iterable). Why is this required? Let me quote from the official docs:

Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted.

If we tried to use a for loop directly over our iterator, it would fail with a TypeError telling us that the object is not iterable.
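
A minimal sketch that reproduces the failure, assuming a HundredIterator that defines only __next__ (the current attribute is an illustrative name). The loop is wrapped in try/except only so that the message can be printed instead of stopping the script:

```python
class HundredIterator:
    """Has __next__ but, deliberately, no __iter__."""

    def __init__(self):
        self.current = 0  # assumed attribute name

    def __next__(self):
        if self.current >= 100:
            raise StopIteration
        self.current += 1
        return self.current


message = None
try:
    for number in HundredIterator():   # iter() is called here and fails
        print(number)
except TypeError as error:
    message = str(error)

print(f"TypeError: {message}")
# TypeError: 'HundredIterator' object is not iterable
```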

That makes sense, because we saw that the for loop calls the iter function on an object to get an iterator from it, and then calls next on that iterator. The problem is that our iterator doesn’t have an __iter__ method. The official documentation requires every iterator to be a proper iterable too. That is, it should implement the __iter__ method and simply return itself. Let’s do that:
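
One way this could look, again with current as an assumed attribute name:

```python
class HundredIterator:
    def __init__(self):
        self.current = 0  # assumed attribute name

    def __iter__(self):
        return self       # an iterator is its own iterator

    def __next__(self):
        if self.current >= 100:
            raise StopIteration
        self.current += 1
        return self.current


collected = []
for number in HundredIterator():
    collected.append(number)

print(len(collected), collected[0], collected[-1])  # 100 1 100
```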

Now the code works fine 🙂

The Iterator Protocol

The iterator protocol defines the special methods that an object must implement to allow iteration. We can summarize the protocol in this way:

  • Any object that can be iterated over needs to implement the __iter__ method, which should return an iterator object. Any object whose __iter__ method returns an iterator is an iterable.
  • An iterator must implement the __next__ method, which returns the next item when called. When all items are exhausted (that is, already retrieved), it must raise the StopIteration exception.
  • An iterator must also implement the __iter__ method to behave like an iterable.
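
As a quick check of these rules, the standard library's collections.abc module offers Iterable and Iterator classes that can be used with isinstance. This is an addition to the examples in this post, sketched with a plain list:

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
numbers_iterator = iter(numbers)

# A list has __iter__ but no __next__: iterable, not an iterator.
list_is_iterable = isinstance(numbers, Iterable)
list_is_iterator = isinstance(numbers, Iterator)

# A list_iterator has both __iter__ and __next__.
iterator_is_iterable = isinstance(numbers_iterator, Iterable)
iterator_is_iterator = isinstance(numbers_iterator, Iterator)

print(list_is_iterable, list_is_iterator)          # True False
print(iterator_is_iterable, iterator_is_iterator)  # True True
```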

Why do we need Iterables?

In our last example, we saw that an object can implement both a __next__ method and an __iter__ method that returns self. That way, an iterator behaves just like an iterable. Then why do we need separate iterables? Why can’t we just keep using iterators that return themselves?

Let’s get back to our HundredIterator example. Once you have iterated over the items, try to iterate again. What happens? No numbers are printed. Why? Because iterator objects store state. Once an iterator has raised StopIteration, it has reached the end of the line: it is exhausted. Every time you call iter on it, it returns the same instance (self), which has nothing more to give.
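
The exhaustion can be seen in a short sketch, assuming a HundredIterator whose __iter__ returns self (current is an assumed attribute name):

```python
class HundredIterator:
    def __init__(self):
        self.current = 0  # assumed attribute name

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= 100:
            raise StopIteration
        self.current += 1
        return self.current


it = HundredIterator()
first_pass = list(it)    # consumes all 100 numbers
second_pass = list(it)   # same object, already exhausted

print(len(first_pass))   # 100
print(len(second_pass))  # 0
print(iter(it) is it)    # True: iter() hands back the same instance
```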

This is why iterables are useful. An iterable can return a fresh iterator instance every time it is looped over. This is what many built-in types, such as list, do.
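
With a list this is easy to observe. In the sketch below, every call to iter gives a separate iterator with its own position:

```python
numbers = [1, 2, 3]

first_iterator = iter(numbers)
second_iterator = iter(numbers)
are_same_object = first_iterator is second_iterator
print(are_same_object)   # False: two independent iterators

a = next(first_iterator)    # 1
b = next(first_iterator)    # 2
c = next(second_iterator)   # 1, the second iterator starts from the beginning
print(a, b, c)              # 1 2 1

# Looping twice over the list works, because each loop gets a new iterator.
first_pass = [n for n in numbers]
second_pass = [n for n in numbers]
print(first_pass == second_pass)  # True
```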

Why are Iterators so important?

Iterators allow us to consume data one item at a time. Imagine a 1 GB file: loading it all into memory at once would require a huge amount of memory. But what if we implemented an iterator that reads the file one line at a time? We could then keep just that one line in memory and do the necessary processing before moving on to the next one. This allows us to write really efficient programs 🙂
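
A rough sketch of such a line-by-line reader is shown below. LineReader is a hypothetical class written for this illustration, and a tiny temporary file stands in for the large one:

```python
import os
import tempfile


class LineReader:
    """Hypothetical iterator that returns one line of a text file at a time."""

    def __init__(self, path):
        self._file = open(path, "r", encoding="utf-8")

    def __iter__(self):
        return self

    def __next__(self):
        if self._file.closed:
            raise StopIteration       # stay exhausted on later calls
        line = self._file.readline()  # reads a single line
        if not line:                  # empty string means end of file
            self._file.close()
            raise StopIteration
        return line.rstrip("\n")


# A small temporary file stands in for the big one.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as handle:
    handle.write("first\nsecond\nthird\n")
    path = handle.name

line_lengths = [len(line) for line in LineReader(path)]
os.remove(path)

print(line_lengths)  # [5, 6, 5]
```

In everyday code there is rarely a need to write this class: Python's built-in file objects are already iterators over their lines, so looping over an open file directly works the same way.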

This all seems very confusing

If you find the concepts confusing and hard to grasp, don’t worry. Give it a few tries, write the code by hand and see the output. Tinker with the examples. Inspect the code and see what happens when you modify part of it. All things become easier when you practise more and more. Try writing your own iterables and iterators – perhaps try to clone the functionality of the built-in containers? Maybe write your own list implementation? Don’t worry, it will come to you in time.