Understanding Concurrency and Parallelism in Elixir

Jul 27, 2016

Table of contents:

What is the difference between concurrency and parallelism?
Why are concurrency and parallelism important?
Running processes in parallel
Collecting the results
Conclusion

A couple of weeks ago we began the next big section of learning Elixir by taking a first look at processes. Processes are the foundation of many of the most attractive characteristics of Elixir.

Two of those characteristics are concurrency and parallelism. Processes enable these two characteristics in Elixir.

In today’s tutorial we will be looking at concurrency and parallelism in Elixir.

What is the difference between concurrency and parallelism?

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. Parallelism is when tasks literally run at the same time, for example on a multicore processor.

Two concurrent tasks have independent execution contexts, but they won’t always run in parallel. If you run two CPU-bound concurrent tasks on a single CPU core, they won’t run in parallel. Concurrency on its own doesn’t always make things faster.

In Elixir, processes are separate contexts (isolation) and you can have hundreds of thousands of processes on a single CPU (lightweight). If your computer has multiple cores, Elixir will run processes on each of them in parallel.

So hopefully that makes sense. Something can run concurrently without running in parallel, whereas anything that is running in parallel is, by definition, also running concurrently.
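One way to see how much parallelism is actually available is to ask the virtual machine how many schedulers it is running. The following is a small sketch, assuming a default Erlang VM configuration where one scheduler is started per logical core; the process count and the 100 ms sleep are arbitrary values chosen for illustration:

```elixir
# Number of scheduler threads the Erlang VM is using to run processes.
# With default settings this matches the number of logical cores.
schedulers = System.schedulers_online()

IO.puts("Schedulers online: #{schedulers}")

# Spawning many processes is cheap even with only a few schedulers.
# All of these processes are concurrent, but at most `schedulers`
# of them can be executing in parallel at any given moment.
pids = for _ <- 1..10_000, do: spawn(fn -> :timer.sleep(100) end)

IO.puts("Spawned #{length(pids)} processes")
```

On a single-core machine this still works, it just means the processes take turns rather than running at the same instant.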

Why are concurrency and parallelism important?

To illustrate why concurrency and parallelism are important let’s take a look at a typical example that you will find in just about any type of application.

Imagine we need to get the details of a user from a third-party API. We can do that with the following function:

get_request = fn id ->
  :timer.sleep(1000)
  "{user:{id: #{id}}}"
end

This function accepts the user’s id and then we simulate making a GET request by going to sleep for 1 second before returning the response.

If you run this function in iex you will see that it takes 1 second to complete:

get_request.(123)
"{user:{id: 123}}"

What happens if we want to get the details of 10 users? Well we would need to run this function 10 times:

Enum.map(1..10, &get_request.(&1))

Here I’m using the Enum.map function, passing it a range, and then passing each number of the range to my get_request function.

If you run this line of code in iex it will take 10 seconds before returning a list of responses. This is not great as we can’t do anything for 10 seconds whilst the requests are being made.
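If you would rather measure the delay than count along, Erlang’s :timer.tc/1 can time the call for you. This is a rough sketch that uses a shorter 100 ms delay (an assumption, so that the example finishes quickly) in place of the 1 second used above:

```elixir
# Same simulated request as above, with an assumed 100 ms delay
# instead of 1 second so the example runs quickly.
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

# :timer.tc/1 runs the given function and returns
# {elapsed_microseconds, return_value}.
{micros, responses} = :timer.tc(fn -> Enum.map(1..10, &get_request.(&1)) end)

IO.puts("Fetched #{length(responses)} users in #{div(micros, 1000)} ms")
```

Because each call has to finish before the next one starts, the total time is roughly the delay multiplied by the number of requests.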

Instead of making each of these requests sequentially, we could instead make them in parallel. Let’s take a look at doing just that.

Running processes in parallel

In order to run the requests in parallel I’m going to create a new function that will spawn a new process:

async_get_request = fn id ->
  spawn(fn -> IO.puts(get_request.(id)) end)
end

In this function I’m spawning a new process and then passing a function that will call the get_request/1 function and then print the return value to the screen.

If you run this you should see that it returns the pid immediately, and then the return value of the get_request/1 function is printed to the screen after 1 second:

async_get_request.(123)
# #PID<0.95.0>
# {user:{id: 123}}

Because the function now runs in a separate process, with its own execution context, it runs concurrently with the calling process. That means we can carry on with whatever we were doing instead of being forced to stop and wait for the request to finish.

We can also run 10 requests in parallel using the async_get_request/1 function:

Enum.map(1..10, &async_get_request.(&1))

If you run the line above in iex you will see the return value of a list of pids immediately, and then after 1 second you will see the return value of each of the 10 calls to the get_request/1 function printed to the screen.

So as you can see, using Elixir processes, we can convert something that initially took 10 seconds to complete into something that completes in 1 second.

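This is because each process runs concurrently: while one is waiting on its simulated request, the others are waiting too, so the total time is roughly that of a single request. On a multi-core machine the processes can also run in parallel.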
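To check that spawning really does return straight away, you can time the call that starts the processes. This is only a sketch: the 100 ms delay and the final sleep are assumptions made to keep the example short.

```elixir
# Simulated request with an assumed 100 ms delay.
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

async_get_request = fn id ->
  spawn(fn -> IO.puts(get_request.(id)) end)
end

# Time only the spawning of the 10 processes, not the requests themselves.
{micros, pids} = :timer.tc(fn -> Enum.map(1..10, &async_get_request.(&1)) end)

IO.puts("Spawned #{length(pids)} processes in #{micros} microseconds")

# Keep the calling process alive long enough to see the printed responses.
:timer.sleep(300)
```

The measured time covers only the creation of the processes, which is why it is far smaller than the delay of a single request.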

Collecting the results

One problem with our code so far is that the results from each request are just printed to the screen. Each request is happening in its own context, but ideally we want to collect the results into one list so we can continue working with them in the main process.

In Working with Processes in Elixir we saw that processes can pass messages to each other.

So to collect the results from each process, instead of printing to the screen we need to send a message back with the result of the request:

async_get_request = fn id ->
  # Capture the pid of the calling process before spawning
  caller = self()

  spawn(fn ->
    send(caller, {:result, get_request.(id)})
  end)
end

In this updated version of the async_get_request/1 function I’m sending the result of get_request/1 back to the caller. First I save the current process’s pid using the self/0 function. This has to happen outside the spawned function, because inside it self/0 would return the pid of the new process. When the caller variable is used inside the spawned function, its value is copied from the calling process into the new one.

To send the result back I’m using the send/2 function and passing the result in a tuple so it can be recognised in the receive block on the calling process.

Now if you run the function again and then use the flush/0 function to list and empty the mailbox, you should see that the messages have been sent back with the result tuples correctly!

Enum.map(1..10, &async_get_request.(&1))
# [#PID<0.70.0>, #PID<0.71.0>, #PID<0.72.0>, #PID<0.73.0>, #PID<0.74.0>,
#  #PID<0.75.0>, #PID<0.76.0>, #PID<0.77.0>, #PID<0.78.0>, #PID<0.79.0>]

flush()
# {:result, "{user:{id: 1}}"}
# {:result, "{user:{id: 2}}"}
# {:result, "{user:{id: 3}}"}
# {:result, "{user:{id: 4}}"}
# {:result, "{user:{id: 5}}"}
# {:result, "{user:{id: 6}}"}
# {:result, "{user:{id: 7}}"}
# {:result, "{user:{id: 8}}"}
# {:result, "{user:{id: 9}}"}
# {:result, "{user:{id: 10}}"}
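The flush helper empties the mailbox as it prints. If you only want to check that the replies have arrived without removing them, Process.info/2 can report the size of the message queue. Here is a minimal sketch, again assuming a shortened 100 ms delay and a fixed pause that is long enough for all the replies to arrive:

```elixir
# Simulated request with an assumed 100 ms delay.
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

async_get_request = fn id ->
  caller = self()
  spawn(fn -> send(caller, {:result, get_request.(id)}) end)
end

Enum.each(1..10, &async_get_request.(&1))

# Give the spawned processes time to finish and send their replies.
:timer.sleep(500)

# Process.info/2 reports how many messages are waiting without removing them.
{:message_queue_len, waiting} = Process.info(self(), :message_queue_len)

IO.puts("Messages waiting in the mailbox: #{waiting}")
```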

Now that we can see that it’s working we need a way to collect the results from the mailbox. To do that we can write a function that matches for the :result atom in the message tuple.

get_result = fn ->
  receive do
    {:result, result} -> result
  end
end

This function will pattern match for the atom and then return the result.
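One thing to be aware of is that a receive block waits indefinitely if no matching message ever arrives. A possible variation, sketched here with a hypothetical get_result_with_timeout function and arbitrary timeout values, adds an after clause so the caller can give up:

```elixir
# Hypothetical variant of get_result that gives up after `timeout` milliseconds.
get_result_with_timeout = fn timeout ->
  receive do
    {:result, result} -> {:ok, result}
  after
    timeout -> {:error, :timeout}
  end
end

# Simulate one reply arriving in the mailbox of the current process.
send(self(), {:result, "{user:{id: 1}}"})

# A message is already waiting, so this returns it immediately.
first = get_result_with_timeout.(500)

# The mailbox is now empty, so this waits 50 ms and then gives up.
second = get_result_with_timeout.(50)

IO.inspect(first)
IO.inspect(second)
```

Wrapping the result in {:ok, result} and {:error, :timeout} tuples is a design choice made for this sketch so the two outcomes can be told apart by pattern matching.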

And now we can put the whole thing together:

1..10
|> Enum.map(&async_get_request.(&1))
|> Enum.map(fn _ -> get_result.() end)

This creates a range and maps over it, calling async_get_request/1 for each id, which gives us a list of 10 pids. We then map over that list and call get_result/0 once per element, ignoring the pid itself, so we receive exactly 10 results. Note that the results are collected in the order the messages arrive in the mailbox, which is not guaranteed to match the order of the ids.
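If the order of the results matters, one option is to tag each message with the id it belongs to and then receive the replies selectively. The sketch below is an illustrative variation rather than the approach used above: the three-element message tuple and the get_result_for function are assumptions introduced for this example, as is the shortened 100 ms delay.

```elixir
# Simulated request with an assumed 100 ms delay.
get_request = fn id ->
  :timer.sleep(100)
  "{user:{id: #{id}}}"
end

# Hypothetical variant: include the id in the message so replies can be
# matched to requests, and return the id instead of the pid.
async_get_request = fn id ->
  caller = self()
  spawn(fn -> send(caller, {:result, id, get_request.(id)}) end)
  id
end

# The pin operator (^) makes receive wait for the reply tagged with this
# exact id, leaving any other messages in the mailbox until they are asked for.
get_result_for = fn id ->
  receive do
    {:result, ^id, result} -> result
  end
end

results =
  1..10
  |> Enum.map(&async_get_request.(&1))
  |> Enum.map(&get_result_for.(&1))

IO.inspect(results)
```

All 10 requests still run concurrently. Only the collection step is ordered, so the total time stays close to that of a single request.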

Conclusion

Concurrency and parallelism are both very important topics and so I hope today’s tutorial gave a good illustration of their characteristics and how they are different from each other.

Elixir and Erlang have a number of characteristics that make them desirable languages to write highly available, fault-tolerant, and distributed applications.

Processes allow you to write concurrent and parallel code. This is because processes are very lightweight, they run in isolation, and they can make the most of a multi-core processor.

One of the beautiful things about Elixir and Erlang is that you get all of this for free. The Erlang Virtual Machine just deals with this for you because Erlang was specifically written to have these characteristics.