Python On Fire — Several Ways For Speeding Up Your Code

8 min readMar 3, 2023

Hello fellow Coders!

Today we will compare several different methods to square the elements of a list in Python, focusing on performance.

For-Loop
List Comprehension
The map() function
numpy — np.power()
numba

Measure The Performance of a Function

First, we have a look at how to measure the performance of a given function. We import the timeit() method from the timeit library. This library allows us to measure the execution time of small bits of code, so it is well suited to comparing the performance of different methods.

from timeit import timeit

def test_function():
  print("21 is only half the truth!")


# measure a function without input
result = timeit(test_function, number=100)
print("Execution Time:", result)

We pass the test function and a number of iterations to be performed to the timeit() method. The return value corresponds to the execution duration for the number of specified iterations of the function. This means in our example the duration of 100 executions of the test function will be obtained. If you are interested in the true average execution time, it can be calculated by dividing the total duration by the number of executions. As we are going to compare different methods this step is not needed here. “Why execute a function multiple times?” is a valid question that arises. The answer is to account for any fluctuations in system performance, it is advisable to test a function multiple times to obtain a more reliable representation of its typical performance. This is due to the fact that a single execution may not accurately represent the function’s performance because of factors such as CPU utilization, memory usage, or other processes running on the system. We will also talk about fluctuations at a later point in this article, so keep that in mind.

In addition to measuring the performance of functions that take no inputs, we can also use the timeit() method to measure the performance of functions that require input parameters. In this example, we will measure the execution time of a function that takes a string input and prints it in all uppercase.

from timeit import timeit

def print_all_caps(my_input_string):
  print(my_input_string.upper())


# measure a function with input
my_string = "hello fellow"

result = timeit(lambda: print_all_caps(my_string), number=100)
print("Execution Time:", result)

To measure the execution time of the print_all_caps() function with the input my_string, we use the timeit() method in connection with a lambda function. The lambda function takes no arguments and simply calls the print_all_caps() function with the input string. We pass the lambda function and a number of iterations to the timeit() method, which again returns the execution duration for the specified number of iterations. This allows us to compare the performance of the function under different conditions and gain insights into how it may perform with different inputs. In this particular case, we may want to check how the performance of the function differs when the input string is 2, 3, or even 1000 times as long as the original string. We put this into practice by using different list sizes when squaring the numbers contained in these lists to obtain more information about the performance measures.

Prepare the Functions and the Data to Be Measured

First, we need to define a list of numbers to get squared outside of the functions. We do not want to add the initialization time to the performance measurement. As just metnioned, we should not limit ourselves to just one list length when testing the performance. Instead, we will loop through multiple list lengths to see if there are any threshold points where a particular method becomes more efficient. This allows us to make more informed comparisons between the different methods.

import numpy as np

list_lengths = [10, 30, 50, 70, 
                100, 300, 500, 700, 
                1000, 3000, 5000, 7000, 
                10000, 30000, 50000, 70000,
                100000, 300000, 500000, 700000,
                1000000, 3000000, 5000000, 7000000
                10000000, 30000000, 50000000, 70000000]

for list_length in list_lengths:
    NUMBERS = np.random.rand(list_length)
 
    # TEST THE PERFORMANCE WITH DIFFERENT LIST_LENGTHS

An alternative way to create the same list as list_lengths with more flexibility using list comprehension:

starting_values = [10, 30, 50, 70]
list_lengths = []

# create a list and extend it with multiples predefined starting values. 
# the multiples are powers of 10.
for exp in range(8):
    temp_list = [x * 10**exp for x in starting_values]
    list_lengths.extend(temp_list)


for list_length in list_lengths:
    NUMBERS = np.random.rand(list_length)
 
    # TEST THE PERFORMANCE WITH DIFFERENT LIST_LENGTHS

Next, we can define the functions that we will use to compare performance:

def square_list_for_loop(numbers):
    numbers_squared = []
    for x in numbers:
        numbers_squared.append(x**2)
    return numbers_squared


def square_list_comprehension(numbers):
    numbers_squared = [x**2 for x in numbers]
    return numbers_squared


def square_list_map(numbers):
    numbers_squared = list(map(lambda x: x**2, numbers))
    return numbers_squared


def square_list_numpy(numbers):
    numbers_squared = np.power(numbers, 2)
    return numbers_squared


@numba.njit
def square_list_numba(numbers):
    numbers_squared = np.copy(numbers)
    for i in range(numbers_squared.size):
        numbers_squared[i] = numbers_squared[i] ** 2
    return numbers_squared

Since we have defined our functions for measuring performance, let’s encapsulate them in a list for easier management and iteration:

methods = [("For Loop", square_list_for_loop),
           ("List Comprehension", square_list_comprehension),
           ("Map Function", square_list_map),
           ("Numpy", square_list_numpy),
           ("Numba", square_list_numba)]

The methods list contains tuples, where each tuple represents a method that we want to compare. The first element of each tuple contains the name of the method as a string, and the second element is the actual function object that implements the method. This allows us to iterate over the list of methods and call each function by its name in a loop.

Then, we initialize a dictionary with empty lists as values using dictionary comprehension. The keys will be the method names and the lists will be containing execution times for each list length.


execution_times_dict = {method_tuple[0]: [] for method_tuple in methods}

Now that we have gathered all the necessary components, we can put everything together and measure the performance of each method on different list lengths.

Our steps so far:

(1) Define functions using various methods to square the elements of a list
(2) Create a list of list lengths
(3) Generate a methods list that contains tuples of method names and corresponding function objects
(4) Initialize a dictionary with empty lists as values to store the execution times for each method and list length

import numba
import numpy as np
from timeit import timeit

# (1)
def square_list_for_loop(numbers):
    numbers_squared = []
    for x in numbers:
        numbers_squared.append(x**2)
    return numbers_squared


def square_list_comprehension(numbers):
    numbers_squared = [x**2 for x in numbers]
    return numbers_squared


def square_list_map(numbers):
    numbers_squared = list(map(lambda x: x**2, numbers))
    return numbers_squared


def square_list_numpy(numbers):
    numbers_squared = np.power(numbers, 2)
    return numbers_squared


@numba.njit
def square_list_numba(numbers):
    numbers_squared = np.copy(numbers)
    for i in range(numbers_squared.size):
        numbers_squared[i] = numbers_squared[i] ** 2
    return numbers_squared



# (2)
list_lengths = [10, 30, 50, 70, 
                100, 300, 500, 700, 
                1000, 3000, 5000, 7000, 
                10000, 30000, 50000, 70000,
                100000, 300000, 500000, 700000,
                1000000, 3000000, 5000000, 7000000
                10000000, 30000000, 50000000, 70000000]

# (3)
methods = [("For Loop", square_list_for_loop),
           ("List Comprehension", square_list_comprehension),
           ("Map Function", square_list_map),
           ("Numpy", square_list_numpy),
           ("Numba", square_list_numba)] 

# (4)
execution_times_dict = {method_tuple[0]: [] for method_tuple in methods}

Finally we can measure the performance of the functions:

(5) Use nested for loops to iterate through the list of list lengths and methods
(6) For each iteration, generate a list of random numbers of the given length
(7) Measure the execution time of the corresponding function using the timeit library
(8) Store the execution times in the dictionary

# (5)
for list_length in list_lengths:
    NUMBERS = np.random.rand(list_length) # (6)

    for method_name, method in methods:
        execution_time = timeit(lambda: method(NUMBERS), number=100) # (7)        
        execution_times_dict[method_name].append(execution_time) #(8)

The results of running the code will save the time it takes to execute each method. From these results, we can see which method is the fastest and which one provides the best performance when squaring the elements of a list in Python. Let us now examine the results.

Results of the Performance Measurement

The results show that Numba outperforms all other methods by a notable advantage, followed by Numpy. List Comprehension, the Map Function and the For Loop come next in given order, with the For Loop and the Map Function performing similarly.

Performance Ranking

Numba
Numpy
List Comprehension
Map Function
For Loop

After determing the function with the worst performance, let’s take a look at how much faster the other functions are for different list lengths in comparison to the For Loop. For this we look at the following table, which shows the length of the respective lists, the execution time of the For Loop, and the corresponding speedup factors of the other functions.

In conclusion we can say that the Map Function doesn’t bring you a notable improvement, while List Comprehension is about 1.18 times faster and Numpy is about 8.43 times faster than the For Loop when squaring a list of numbers. Numba gives us the best improvement by far but it is also strongly fluctuating. We can also see this if we compare the mean values and standard deviations of the speedup factors.

List Comprehension, the Map Function and Numpy have a relatively small standard deviation, so the speedup factor will most probably always be close to the mean. On the other hand, the performance of Numba seems to be highly dependent on the factors I mentioned previously like CPU utilization, memory usage, or other processes running on the system as can be seen from the high standard deviation. However, when these factors are favorable, Numba can offer the best performance boost among the compared methods. It is important to keep track of the overall view and to keep in mind the context in which the developed functions are used.

Important Side Notes

The Map Function is often praised as very performant. This is the case as long as only the map function is used. In our case, however, the map function returns a map object that does not contain the squared values and thus cannot be directly used. Only if the map object is put into the list() function, the actual calculation of the function assigned in the map object is also carried out.
I manually removed the first measurement of the performance data, because the first execution of a Numba function takes longer than the following executions, as the function has to be compiled at first usage.

Conclusion

After conducting our performance tests on different Python optimization options for squaring a list of numbers, we discovered that Numba offers the fastest solution. However, it’s worth noting that implementing Numba requires some additional knowledge and expertise, and it may not be the best option for all problems. For instance, if your code only needs to execute the function once, it may not be worth using Numba due to the overhead of initial compilation. Numpy is not as fast as Numba, but it still outperforms the built-in methods For Loop, the Map Function, and List Comprehension. If you prefer using built-in methods only, then List Comprehension would be the best option, but keep in mind that Simple is better than complex and Readabilty counts.

If you’d like to learn more about how the plots and tables were created, don’t hesitate to reach out to me. I hope this article has provided you with valuable insights into performance measurements and helps you make informed decisions about which method to use for your specific use case. Thanks for reading and feel free to drop a comment!