A couple of days ago I faced a straightforward problem:
Given a list of elements:
- the value at index 2 is a count
- starting with the element after the count, return a list of lists, each with 3 elements (a triplet)

Additional note: the count will be <= 48.
elements = [
  "skip", "skip", 4,
  "list_1_element_1", "list_1_element_2", "list_1_element_3",
  "list_2_element_1", "list_2_element_2", "list_2_element_3",
  "list_3_element_1", "list_3_element_2", "list_3_element_3",
  "list_4_element_1", "list_4_element_2", "list_4_element_3"
]
output = [
[ "list_1_element_1", "list_1_element_2", "list_1_element_3" ],
[ "list_2_element_1", "list_2_element_2", "list_2_element_3" ],
[ "list_3_element_1", "list_3_element_2", "list_3_element_3" ],
[ "list_4_element_1", "list_4_element_2", "list_4_element_3" ]
]
Two ways I thought of to solve this:
Iteration
This route would be something like:
- create a range of integers from 0 to count - 1
- iterate over the range (step)
- calculate the working index: (step * 3) + count_index + 1
- slice the list from the working index for the next 3 elements
@count_index 2

def iterate(list) do
  # The number of triplets is stored at index 2.
  count = Enum.at(list, @count_index)
  range = 0..(count - 1)

  # Prepend each triplet (cheap for lists), then reverse once at the end.
  Enum.reduce(range, [], fn iteration, acc ->
    triplet = find_triplet(list, iteration)
    [triplet | acc]
  end)
  |> Enum.reverse()
end

defp find_triplet(list, 0) do
  Enum.slice(list, @count_index + 1, 3)
end

defp find_triplet(list, iteration_number) do
  # Each iteration starts 3 elements further into the list.
  Enum.slice(list, iteration_number * 3 + (@count_index + 1), 3)
end
Chunk It
This route would be something like:
- drop the first (count_index + 1) elements of the list
- pipe into Enum.chunk_every/2
@count_index 2
def chunk(list) do
  list
  # Drop the two "skip" placeholders and the count itself.
  |> Enum.drop(@count_index + 1)
  |> Enum.chunk_every(3)
end
From a readability standpoint there is a clear winner in my mind: Chunk It. It's easy to read, and I can tell what it is going to do at a glance. But is there a tradeoff in memory usage or time?
A helper function to create some test lists for benchmarking:
def create_list(n) do
  # Build n triplets, prefixed by the two "skip" placeholders and the count.
  elems =
    1..n
    |> Enum.flat_map(fn _ -> ["elem1", "elem2", "elem3"] end)

  ["skip", "skip", n | elems]
end
Benchmarking with Benchee
When looking for the Elixir equivalent of Ruby's Benchmark module I came across Benchee. So, using that, let's see if there are any tradeoffs between Chunk It and Iterate. Since Benchee can accept named inputs, we can compare the two across input sizes.
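If Benchee isn't already in the project, it comes from Hex; a minimal sketch of the mix.exs entry (the version constraint here is illustrative):

```elixir
# mix.exs: add Benchee as a dev-only dependency
defp deps do
  [
    {:benchee, "~> 1.0", only: :dev}
  ]
end
```

With that in place, the benchmark itself: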
Benchee.run(
  %{
    "iterate" => fn input -> iterate(input) end,
    "chunk" => fn input -> chunk(input) end
  },
  inputs: %{
    "small" => create_list(1),
    "Medium" => create_list(24),
    "Biggest" => create_list(48)
  },
  time: 10,
  memory_time: 2
)
$ mix run lib/triplet_finder/benchmark.exs
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: Biggest, Medium, small
Estimated total run time: 1.40 min
Benchmarking chunk with input Biggest...
Benchmarking chunk with input Medium...
Benchmarking chunk with input small...
Benchmarking iterate with input Biggest...
Benchmarking iterate with input Medium...
Benchmarking iterate with input small...
##### With input Biggest #####
Name              ips        average  deviation         median         99th %
chunk         34.66 K       28.85 μs    ±33.28%       27.90 μs       55.90 μs
iterate       16.52 K       60.52 μs    ±46.68%       58.90 μs       96.90 μs

Comparison:
chunk         34.66 K
iterate       16.52 K - 2.10x slower +31.67 μs

Memory usage statistics:

Name       Memory usage
chunk          20.13 KB
iterate         3.15 KB - 0.16x memory usage -16.98438 KB

**All measurements for memory usage were the same**

##### With input Medium #####
Name              ips        average  deviation         median         99th %
chunk         67.45 K       14.83 μs    ±77.46%       13.90 μs       32.90 μs
iterate       53.71 K       18.62 μs    ±78.24%       17.90 μs       32.90 μs

Comparison:
chunk         67.45 K
iterate       53.71 K - 1.26x slower +3.79 μs

Memory usage statistics:

Name       Memory usage
chunk           9.95 KB
iterate         1.76 KB - 0.18x memory usage -8.18750 KB

**All measurements for memory usage were the same**

##### With input small #####
Name              ips        average  deviation         median         99th %
iterate        1.53 M        0.65 μs  ±6120.45%        0.90 μs        0.90 μs
chunk          0.74 M        1.35 μs  ±3069.60%        0.90 μs        1.90 μs

Comparison:
iterate        1.53 M
chunk          0.74 M - 2.07x slower +0.70 μs

Memory usage statistics:

Name       Memory usage
iterate           184 B
chunk             672 B - 3.65x memory usage +488 B

**All measurements for memory usage were the same**
Benchmarking Summary:

| | small | Medium | Biggest |
|---|---|---|---|
| Faster | Iterate | Chunk | Chunk |
| Less memory | Iterate | Iterate | Iterate |

Iterate allocated less memory at every input size; Chunk only pulls ahead on speed once the input grows.
Conclusion
With this comparison in mind, I am in a better position to decide which approach is the most beneficial.
Chunk It is easier to read, and for the medium and large inputs it is also the faster option. Iterate, however, uses less memory across the board and is faster on small inputs, so there is a real trade-off between readability and raw efficiency depending on the workload.
There will always be more than one way to solve a problem, but deciding which is the most efficient may not always be obvious. Benchee makes it simple to compare the tradeoffs between solutions and improves my confidence in whichever one I decide to implement. When creating a pull request, I enjoy being able to demonstrate why I chose a given solution.