Jul 27, 2016
A couple of weeks ago we began the next big section of learning Elixir by taking a first look at processes. Processes are the foundation of many of the most attractive characteristics of Elixir.
Two of those characteristics are concurrency and parallelism. Processes enable these two characteristics in Elixir.
In today’s tutorial we will be looking at concurrency and parallelism in Elixir.
Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. Parallelism is when tasks literally run at the same time, for example on a multicore processor.
Two concurrent things have independent execution contexts, but they won’t always run in parallel. If you run two CPU-bound concurrent tasks on a single CPU core, they won’t run in parallel. Concurrency doesn’t always mean that the work will finish faster.
In Elixir, processes are separate contexts (isolation) and you can have hundreds of thousands of processes on a single CPU (lightweight). If your computer has multiple cores, Elixir will run processes on each of them in parallel.
So hopefully that makes sense. Something can run concurrently without running in parallel, whereas anything that runs in parallel is, by definition, also running concurrently.
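As a rough illustration of the two points above, the sketch below prints how many scheduler threads the Erlang VM is using (an upper bound on how many processes can run in parallel) and then spawns a large number of short-lived processes. The count of 10,000 and the 5 second timeout are arbitrary values chosen for this example.

```elixir
# How many scheduler threads the VM is using; usually one per CPU core.
schedulers = System.schedulers_online()
IO.puts("Schedulers online: #{schedulers}")

parent = self()
count = 10_000 # arbitrary number, chosen for illustration

# Spawn many lightweight processes; each one reports back and then exits.
for i <- 1..count, do: spawn(fn -> send(parent, {:done, i}) end)

# Collect one message per spawned process, giving up after 5 seconds.
received =
  Enum.reduce(1..count, 0, fn _, acc ->
    receive do
      {:done, _i} -> acc + 1
    after
      5_000 -> acc
    end
  end)

IO.puts("Received #{received} replies")
```

All of these processes are concurrent, but at any instant no more of them can run in parallel than there are schedulers online.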
To illustrate why concurrency and parallelism are important let’s take a look at a typical example that you will find in just about any type of application.
Imagine we need to get the details of a user from a third-party API. We can do that with the following function:
get_request = fn id ->
  :timer.sleep(1000)
  "{user:{id: #{id}}}"
end
This function accepts the user’s id and then we simulate making a GET request by going to sleep for 1 second before returning the response.
If you run this function in iex you will see that it takes 1 second to complete:
get_request.(123)
"{user:{id: 123}}"
What happens if we want to get the details of 10 users? Well, we would need to run this function 10 times:
Enum.map(1..10, &get_request.(&1))
Here I’m using the Enum.map function, passing it a range, and then passing each number of the range to my get_request function.
If you run this line of code in iex it will take 10 seconds before returning a list of responses. This is not great as we can’t do anything for 10 seconds whilst the requests are being made.
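If you would like to measure this rather than count in your head, a minimal sketch using :timer.tc/1 from Erlang's standard library could look like the following. To keep the demo quick it assumes a shorter sleep (100 ms) and only 5 requests, neither of which comes from the example above.

```elixir
# Same shape as get_request above, but with a shorter sleep
# (an assumption, to keep the demo quick).
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

# :timer.tc/1 runs the function and returns {elapsed_microseconds, result}.
{micros, responses} = :timer.tc(fn -> Enum.map(1..5, &get_request.(&1)) end)

IO.puts("Sequential: #{length(responses)} responses in #{div(micros, 1000)} ms")
```

The elapsed time should be roughly the sleep time multiplied by the number of requests, because each call has to finish before the next one starts.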
Instead of making each of these requests sequentially, we could instead make them in parallel. Let’s take a look at doing just that.
In order to run the requests in parallel I’m going to create a new function that will spawn a new process:
async_get_request = fn id ->
  spawn(fn -> IO.puts(get_request.(id)) end)
end
In this function I’m spawning a new process and then passing a function that will call the get_request/1 function and then print the return value to the screen.
If you run this you should see that it returns the pid immediately, and then the return value of the get_request/1 function is printed to the screen after 1 second:
async_get_request.(123)
# #PID<0.95.0>
# {user:{id: 123}}
Now that we are running the function in a separate process, it is running concurrently, because the function is executing in a different context. This is beneficial because we can carry on with what we were doing without being forced to stop and wait for the result.
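One way to convince yourself that spawn/1 really does return before the work is finished is to check on the process with Process.alive?/1. This is only a sketch; the sleep durations are arbitrary values picked for the demo.

```elixir
# The spawned process just sleeps, standing in for a slow request.
pid = spawn(fn -> :timer.sleep(200) end)

# spawn/1 has already returned, but the work is still in progress.
alive_now = Process.alive?(pid)

# Give the process more than enough time to finish, then check again.
:timer.sleep(400)
alive_later = Process.alive?(pid)

IO.inspect({alive_now, alive_later})
```

The first check happens while the spawned process is still sleeping, and the second happens after it has finished and exited.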
We can also run 10 requests in parallel using the async_get_request/1 function:
Enum.map(1..10, &async_get_request.(&1))
If you run the line above in iex you will see the return value of a list of pids immediately, and then after 1 second you will see the return value of each of the 10 calls to the get_request/1 function printed to the screen.
So as you can see, using Elixir processes, we can convert something that initially took 10 seconds to complete into something that completes in about 1 second.
This is because the processes run concurrently: all 10 are waiting at the same time, and on a multi-core machine they can also run in parallel.
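To check the speed-up with a number, here is a hedged sketch that times the concurrent version with :timer.tc/1. It assumes a 100 ms sleep and 5 requests to keep the run short, and it uses a plain :finished message (a name made up for this example) purely so the timing code knows when every process is done.

```elixir
# Shorter sleep than in the article (an assumption, to keep the demo quick).
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

parent = self()

{micros, :ok} =
  :timer.tc(fn ->
    # Start all five requests without waiting for any of them.
    Enum.each(1..5, fn id ->
      spawn(fn ->
        get_request.(id)
        send(parent, :finished)
      end)
    end)

    # Block until every process has reported that it is done.
    Enum.each(1..5, fn _ ->
      receive do
        :finished -> :ok
      end
    end)
  end)

IO.puts("Concurrent: 5 requests finished in #{div(micros, 1000)} ms")
```

The total should be close to the duration of a single request rather than the sum of all of them.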
One problem with our code so far is the fact that the results from each request are just printed to the screen. Each request is happening in its own context, but ideally we want to collect the results into one list so we can continue working with them in the main process.
In Working with Processes in Elixir we saw that processes can pass messages to each other.
So to collect the results from each process, instead of printing to the screen we need to send a message back with the result of the request:
async_get_request = fn id ->
  caller = self()
  spawn(fn ->
    send(caller, {:result, get_request.(id)})
  end)
end
In this updated version of the async_get_request/1 function I’m sending the result of get_request/1 back to the caller. First I save the current process’s pid using the self/0 function. Because the caller variable is captured by the function passed to spawn/1, its value is deep copied from the calling process into the new one.
To send the result back I’m using the send/2 function and passing the result in a tuple so it can be recognised in the receive block on the calling process.
Now if you run the function again, wait a second, and then use the flush/0 function to print and empty the mailbox, you should see that the result tuples have been sent back correctly (the order may differ from run to run):
Enum.map(1..10, &async_get_request.(&1))
# [#PID<0.70.0>, #PID<0.71.0>, #PID<0.72.0>, #PID<0.73.0>, #PID<0.74.0>,
#  #PID<0.75.0>, #PID<0.76.0>, #PID<0.77.0>, #PID<0.78.0>, #PID<0.79.0>]
flush()
# {:result, "{user:{id: 1}}"}
# {:result, "{user:{id: 2}}"}
# {:result, "{user:{id: 3}}"}
# {:result, "{user:{id: 4}}"}
# {:result, "{user:{id: 5}}"}
# {:result, "{user:{id: 6}}"}
# {:result, "{user:{id: 7}}"}
# {:result, "{user:{id: 8}}"}
# {:result, "{user:{id: 9}}"}
# {:result, "{user:{id: 10}}"}
Now that we can see that it’s working we need a way to collect the results from the mailbox. To do that we can write a function that matches for the :result atom in the message tuple.
get_result = fn ->
  receive do
    {:result, result} -> result
  end
end
This function will pattern match for the atom and then return the result.
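One thing to be aware of is that a bare receive blocks forever if no matching message ever arrives. As a sketch only, a variant with a timeout might look like this; the timeout argument and the {:ok, ...} / {:error, :timeout} return shapes are assumptions made for this example, not part of the function above.

```elixir
# Hypothetical variant of get_result that gives up after `timeout` milliseconds.
get_result = fn timeout ->
  receive do
    {:result, result} -> {:ok, result}
  after
    timeout -> {:error, :timeout}
  end
end

# Put one result in our own mailbox so the first call has something to find.
send(self(), {:result, "{user:{id: 1}}"})

first = get_result.(100)  # a matching message is already waiting
second = get_result.(100) # the mailbox is now empty, so this one times out

IO.inspect({first, second})
```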
And now we can put the whole thing together:
1..10
|> Enum.map(&async_get_request.(&1))
|> Enum.map(fn _ -> get_result.() end)
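This creates a range and maps over it, calling async_get_request/1 for each id, which gives us a list of 10 pids. We then map over that list and call get_result/0 once per item. This works because both lists have 10 items, so we receive exactly one message for each request we started. Note that the results come back in the order the messages arrived in the mailbox, which is not guaranteed to match the order of the ids.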
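If the order of the results matters, one possible approach (a sketch, not the only way to do it) is to include the pid of the spawned process in the reply and match on it when receiving. The three-element message tuple and the get_result_from name are assumptions introduced for this example, and the sleep is contrived so that later ids finish first.

```elixir
get_request = fn id ->
  # Later ids finish sooner, so replies arrive out of order (contrived on purpose).
  :timer.sleep(100 - id * 10)
  "{user:{id: #{id}}}"
end

async_get_request = fn id ->
  caller = self()
  # Inside the spawned function, self() is the pid of the new process.
  spawn(fn -> send(caller, {:result, self(), get_request.(id)}) end)
end

# Hypothetical helper: wait for the reply from one specific process.
get_result_from = fn pid ->
  receive do
    {:result, ^pid, result} -> result
  end
end

results =
  1..5
  |> Enum.map(&async_get_request.(&1))
  |> Enum.map(&get_result_from.(&1))

IO.inspect(results)
```

Because each receive only matches the reply from a particular pid, the results line up with the original ids even though the messages reach the mailbox in a different order. The trade-off is that this version would wait forever if one of the processes crashed before replying.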
Concurrency and parallelism are both very important topics and so I hope today’s tutorial gave a good illustration of their characteristics and how they are different from each other.
Elixir and Erlang have a number of characteristics that make them desirable languages for writing highly available, fault-tolerant, and distributed applications.
Processes allow you to write concurrent and parallel code. This is because processes are very lightweight, they run in isolation, and they can make the most of a multi-core processor.
One of the beautiful things about Elixir and Erlang is that you get all of this for free. The Erlang Virtual Machine just deals with this for you because Erlang was specifically written to have these characteristics.