Jun 13, 2016
So far in this introduction to Elixir series we’ve touched upon the Enum module a couple of times. The Enum module is a collection of functions that act on enumerable data structures.
The Enum module is extremely useful, and it will probably be something that you use a lot in your day-to-day Elixir work.
Elixir also has the Stream module, which, like the Enum module, allows you to act on enumerables in much the same way.
So what is the difference between the two modules, and when should you use one or the other?
In today’s tutorial we are going to be exploring these two very useful modules, and understanding when and where you should use them in your Elixir code.
Before we get into the difference between the Enum module and the Stream module, I’m aware that I shouldn’t automatically assume you know what enumerables are.
Enumerables are data structures whose items can be enumerated. For example, in Elixir, lists, maps, and ranges are all enumerable types because you can enumerate their values.
To enumerate simply means to take each item one at a time and do something with it, for example, iterating through each item in a list.
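To make "enumerate" concrete, here is a minimal sketch that uses Enum.to_list/1, which walks an enumerable item by item and collects what it yields. The sample values are illustrative. Note that a map yields one {key, value} tuple per entry.

```elixir
# A list yields its elements in order
list_items = Enum.to_list([:a, :b, :c])
IO.inspect(list_items)   # [:a, :b, :c]

# A range yields every integer between its bounds
range_items = Enum.to_list(1..4)
IO.inspect(range_items)  # [1, 2, 3, 4]

# A map yields one {key, value} tuple per entry
map_items = Enum.to_list(%{a: 1, b: 2})
IO.inspect(map_items)    # [a: 1, b: 2]
```

This is also why the map example further on pattern matches on {k, v}: each item a map yields is a two-element tuple.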
The Enum module provides generic functions that can be applied to enumerable data structures.
A few of the most common Enum functions you will find yourself using are map, filter, reduce, sort, and group_by.
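A quick sketch of a few of these Enum functions (filter, reduce, sort, and group_by) applied to a small, made-up list of numbers:

```elixir
numbers = [5, 3, 8, 1, 4]

# filter/2 keeps the elements for which the function returns a truthy value
evens = Enum.filter(numbers, &(rem(&1, 2) == 0))           # [8, 4]

# reduce/3 folds the collection into a single value, here a running sum
total = Enum.reduce(numbers, 0, fn n, acc -> n + acc end)   # 21

# sort/1 returns a new list in ascending order
sorted = Enum.sort(numbers)                                # [1, 3, 4, 5, 8]

# group_by/2 builds a map from each key to the elements that produced it,
# keeping their original order within each group
by_parity = Enum.group_by(numbers, &(rem(&1, 2) == 0))
# %{false: [5, 3, 1], true: [8, 4]}

IO.inspect({evens, total, sorted, by_parity})
```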
Here are a few examples of using the map function:
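# List
Enum.map([1, 2, 3], &(&1 * 2))
# => [2, 4, 6]
# Range
Enum.map(1..3, &(&1 * 2))
# => [2, 4, 6]
# Map (each item is a {key, value} tuple; the unused key is prefixed with _)
Enum.map(%{1 => 1, 2 => 2, 3 => 3}, fn {_k, v} -> v * 2 end)
# => [2, 4, 6]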
As you can see, we can use the same Enum.map/2 function with many different types of enumerable data structures in Elixir.
The enumerable data structures that can be used with the Enum module all implement the Enumerable protocol. We haven’t covered protocols in Elixir just yet, but we will in the coming weeks. You don’t need to worry about this for now.
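As a small peek ahead, every protocol in Elixir gets an impl_for/1 function that returns the module implementing it for a given value, or nil when that value's type has no implementation. A minimal sketch with illustrative values:

```elixir
# Each enumerable type ships with its own Enumerable implementation
list_impl = Enumerable.impl_for([1, 2, 3])    # Enumerable.List
map_impl = Enumerable.impl_for(%{a: 1})       # Enumerable.Map
range_impl = Enumerable.impl_for(1..3)        # Enumerable.Range

# Tuples do not implement Enumerable, so impl_for/1 returns nil
# and Enum functions will raise if given a tuple
tuple_impl = Enumerable.impl_for({1, 2, 3})   # nil

IO.inspect([list_impl, map_impl, range_impl, tuple_impl])
```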
One of the characteristics of the functions in the Enum module is eagerness. This means each function acts on the whole data structure immediately and returns its complete result.
For example:
Enum.map([1, 2, 3], &(&1 * 3))
|> Enum.filter(&(rem(&1, 2) == 0))
In this example we pass the list [1, 2, 3] into the map function, which multiplies each value by 3.
This returns a new list (because data in Elixir is immutable) containing the values [3, 6, 9].
This list is then passed into the filter function using the pipe operator (see Using the Pipe Operator in Elixir).
In the filter function we filter out any odd numbers, which returns the new list [6].
The important thing to note here is each function acts on the list and produces a new list in isolation. So for this process we iterate through the list twice.
If we were to add another function, we would be iterating through the list for a third time.
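One way to see the two separate passes is to have each stage report what it is doing. In this sketch (a tracing trick chosen for illustration, using only the standard library) each callback sends the current process a {stage, value} message, and we read the messages back in the order they were sent.

```elixir
# Each callback sends {stage, value} to the current process before doing its
# work, so the mailbox ends up recording the order in which callbacks ran.
result =
  [1, 2, 3]
  |> Enum.map(fn x ->
    send(self(), {:map, x})
    x * 3
  end)
  |> Enum.filter(fn x ->
    send(self(), {:filter, x})
    rem(x, 2) == 0
  end)

# Read back the six recorded events, oldest first
events =
  for _ <- 1..6 do
    receive do
      event -> event
    after
      0 -> :missing
    end
  end

IO.inspect(result)  # [6]
IO.inspect(events)  # [map: 1, map: 2, map: 3, filter: 3, filter: 6, filter: 9]
```

Every map call finishes before the first filter call starts, because Enum.map/2 has to build the complete intermediate list [3, 6, 9] first.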
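This isn’t a big problem with such a small set of data. But once you start working with large data structures, this “eager” execution starts to break down: every stage traverses the entire collection and allocates a complete intermediate list before the next stage can begin.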
The Stream module offers an alternative to the Enum module for acting on enumerable data structures.
Instead of eagerly acting on the data structure, a Stream function returns a stream: a value that describes the operation without actually performing it straight away.
This is known as lazy evaluation, as opposed to eager evaluation.
These streams can be composed together in a pipeline and only acted on at the end, which avoids building an intermediate list after each individual function call.
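Because nothing runs until the end of the pipeline, lazy stages can even be composed over a source that never ends. A minimal sketch, using Stream.iterate/2 (an illustrative choice, not from the original) to describe the sequence 1, 2, 3, ... and Enum.take/2 to pull only as many results as needed:

```elixir
# Stream.iterate/2 describes the infinite sequence 1, 2, 3, ...
naturals = Stream.iterate(1, &(&1 + 1))

# Composing further lazy stages still does no work
pipeline =
  naturals
  |> Stream.map(&(&1 * 3))
  |> Stream.filter(&(rem(&1, 2) == 0))

# Enum.take/2 pulls elements through the pipeline only until it has three
first_three = Enum.take(pipeline, 3)
IO.inspect(first_three)  # [6, 12, 18]
```

The same pipeline written with Enum.map/2 would never return, since the eager map would try to build a complete list from an infinite source.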
For example, we could write the example from above using the Stream module:
Stream.map([1, 2, 3], &(&1 * 3))
|> Enum.filter(&(rem(&1, 2) == 0))
In this example I’ve replaced the first use of the Enum module with the Stream module. As you can see, you can switch the two modules without having to change the function that is passed as the second argument.
After the first stage of the pipeline, a Stream is passed to the filter function instead of a new list:
#Stream<[enum: [1, 2, 3], funs: [#Function<30.103178510/1 in Stream.map/2>]]>
This is like a latent enumerable data structure that has the knowledge of the previous stage, but it hasn’t acted upon it yet.
When the filter function receives the Stream, it enumerates it, which finally runs the mapping function, and the data is acted on.
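One way to check that Stream.map/2 really defers its work is to count how often the mapping function runs. This sketch uses Erlang's :counters module as the counter (an assumption: it needs OTP 21.2 or later); the counter stays at zero until Enum.filter/2 enumerates the stream.

```elixir
# A single-slot counter that the mapping function increments on every call
calls = :counters.new(1, [])

# Stream.map/2 only records the operation; the function is not called yet
stream =
  Stream.map([1, 2, 3], fn x ->
    :counters.add(calls, 1, 1)
    x * 3
  end)

calls_before = :counters.get(calls, 1)  # 0

# Enum.filter/2 enumerates the stream, which finally runs the mapping
result = Enum.filter(stream, &(rem(&1, 2) == 0))

calls_after = :counters.get(calls, 1)   # 3

IO.inspect({calls_before, result, calls_after})  # {0, [6], 3}
```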
Whilst this is a simple example, it is possible to compose many lazy Stream function calls together.
For example:
1..1_000_000
|> Stream.map(&(&1 * 2))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.sum()
In this example I create a range from 1 to 1,000,000.
I then create a new Stream
that doubles each number.
I then filter out the odd numbers.
I then add up all of the remaining numbers.
Because we use the Stream module, no intermediate list is created after each stage; the data is only acted on at the end of the pipeline, when Enum.sum/1 consumes the stream.
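To see that the stream stages run element by element in a single pass, here is a sketch that applies a message-based trace to a small version of the pipeline above. The range 1..3 replaces 1..1_000_000 only to keep the trace readable; each stage sends the current process a {stage, value} message.

```elixir
total =
  1..3
  |> Stream.map(fn x ->
    send(self(), {:map, x})
    x * 2
  end)
  |> Stream.filter(fn x ->
    send(self(), {:filter, x})
    rem(x, 2) == 0
  end)
  |> Enum.sum()

# Read back the six recorded events, oldest first
events =
  for _ <- 1..6 do
    receive do
      event -> event
    after
      0 -> :missing
    end
  end

IO.inspect(total)   # 12
IO.inspect(events)  # [map: 1, filter: 2, map: 2, filter: 4, map: 3, filter: 6]
```

The events interleave: each number is mapped, filtered, and added to the sum before the next number is touched, so no intermediate list is ever built.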
The Enum module offers a number of really useful functions for acting on enumerable data structures. You will likely find yourself using these functions a lot in your day-to-day programming.
The Enum module is “eager”, so it acts on the data straight away. For most use cases this is perfectly fine.
However, if you find yourself in the situation where you need to make a pipeline of transformations on a large dataset, it will probably be better to use the Stream module instead.
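The Stream module acts on the data lazily, rather than creating a new list at each step of the process. The Stream module also mirrors many of the Enum functions, such as map and filter, with the same arguments, so you don’t need to significantly rewrite your code. Operations that need the whole collection at once, such as sorting or summing, still come from Enum at the end of the pipeline.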
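In practice the swap usually looks like the sketch below (with made-up data): the element-by-element steps move to Stream, while sorting stays with Enum because it needs every element and Stream has no sort function.

```elixir
words = ["pear", "fig", "apple", "kiwi"]

# Eager version: each stage builds a new list
eager =
  words
  |> Enum.map(&String.upcase/1)
  |> Enum.filter(&(String.length(&1) > 3))
  |> Enum.sort()

# Lazy version: the same functions, with Stream for the per-element stages.
# Sorting needs every element, so the pipeline ends with Enum.sort/1.
lazy =
  words
  |> Stream.map(&String.upcase/1)
  |> Stream.filter(&(String.length(&1) > 3))
  |> Enum.sort()

IO.inspect(lazy)           # ["APPLE", "KIWI", "PEAR"]
IO.inspect(eager == lazy)  # true
```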
For the most part you will typically be using the Enum module, but you will likely come across a situation where the Stream module is a better choice.