Flatten vs Ravel functions in Numpy

Date: 2021-04-18 23:48:11, by Pc Ng

Working with multidimensional arrays in Numpy is quite intuitive. An element-wise multiplication, for example, is much easier to write with Numpy than with a classical for loop. Still, understanding how Numpy operates internally helps to improve computational efficiency when dealing with arrays that consist of millions of elements.

In this post, we will look at the differences between the flatten and ravel functions. Both return the same one-dimensional array from a multidimensional input. The key difference is whether memory is copied in the process. Let's say we would like to flatten a 1000 × 1000 array: flatten always returns a copy, whereas ravel returns a view whenever possible. The computation time of both functions is shown as follows:

import numpy as np 

# create a two-dimensional array of shape (1000, 1000)
arr = np.random.rand(1000, 1000)
print(f'Size of arr: {arr.shape}')

Size of arr: (1000, 1000)

%%time
arr_flatten = arr.flatten()

Wall time: 3.95 ms

%%time
arr_ravel = arr.ravel()

Wall time: 0 ns
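
Single-run %%time results are coarse. For a steadier comparison, one option is to average many calls with the standard timeit module. The following is a minimal sketch, assuming the same array shape as above; the repeat count n and the names t_flatten and t_ravel are illustrative choices, not part of the original measurements.

```python
import timeit

import numpy as np

arr = np.random.rand(1000, 1000)

# Average over many calls so that very short calls are still measurable
n = 100
t_flatten = timeit.timeit(arr.flatten, number=n) / n
t_ravel = timeit.timeit(arr.ravel, number=n) / n

print(f'flatten: {t_flatten * 1e6:.1f} us per call')
print(f'ravel:   {t_ravel * 1e6:.3f} us per call')
```

The absolute numbers depend on the machine, so only the relative gap between the two is meaningful.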

Clearly, ravel is much faster than flatten; the reported 0 ns only means that the call finished below the timer's resolution. Such a speedup can be significant when dealing with very large arrays. We can also check that both functions return the same output.

print(np.array_equal(arr_flatten, arr_ravel))

True

The difference is that ravel performs no copy operation here. With flatten, the output is a copy of the original array, whereas with ravel the output is just a view of the original array, so any change made to the raveled array also changes the original array. To understand the difference between a copy and a view, consider the memory addresses of these three arrays.

print(f'Memory address to store arr: {arr.__array_interface__["data"][0]}')
print(f'Memory address to store arr_flatten: {arr_flatten.__array_interface__["data"][0]}')
print(f'Memory address to store arr_ravel: {arr_ravel.__array_interface__["data"][0]}')

Memory address to store arr: 2394595090496
Memory address to store arr_flatten: 2394603155520
Memory address to store arr_ravel: 2394595090496

We can see that flatten copies the array to a new memory block, whereas ravel simply creates a view of the original array: arr_ravel starts at the same address as arr.
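
The practical consequence of copy versus view can be checked by writing to each result. The sketch below uses a small 2 × 3 array (an illustrative stand-in for the large array above) and np.shares_memory to test whether two arrays overlap in memory.

```python
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)

arr_flatten = arr.flatten()  # always a copy
arr_ravel = arr.ravel()      # a view here, because arr is C-contiguous

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(arr, arr_flatten))  # False
print(np.shares_memory(arr, arr_ravel))    # True

# Writing through the copy leaves arr untouched
arr_flatten[0] = -1.0
print(arr[0, 0])  # 0.0

# Writing through the view changes arr
arr_ravel[0] = 99.0
print(arr[0, 0])  # 99.0
```

If the original array must stay unchanged, flatten (or an explicit copy) is the safer choice despite the extra cost.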

When array is not in C-order

Note that ravel also has to copy when the array is not stored in C-order. For example, the transpose arr.T is a view of arr in Fortran-order. Calling ravel on it returns the elements flattened in C-order, which no longer matches the memory layout, so a copy is made.

%%time
arr_ravel2 = arr.ravel()

Wall time: 0 ns

%%time
arr_ravel2T = arr.T.ravel()

Wall time: 7.07 ms
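
The reason for the slowdown can be checked directly from the array flags. The sketch below uses a small 2 × 3 array (illustrative, not the array timed above) to show that the transpose is Fortran-contiguous and that raveling it produces a copy.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
arr_t = arr.T  # the transpose is a view with swapped strides

print(arr.flags['C_CONTIGUOUS'])    # True
print(arr_t.flags['C_CONTIGUOUS'])  # False
print(arr_t.flags['F_CONTIGUOUS'])  # True

# C-order traversal of a Fortran-ordered array does not follow the memory layout
ravel_t = arr_t.ravel()
print(ravel_t)                         # [0 3 1 4 2 5]
print(np.shares_memory(arr, ravel_t))  # False, so a copy was made
```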

When we need the elements in a different order, we can pass the order argument to ravel instead of transposing first.

%%time
arr_ravel3T = arr.ravel(order = 'F')

Wall time: 6.03 ms

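In this run, specifying the order directly in ravel was slightly faster than calling ravel on arr.T, although a single %%time measurement is too noisy to draw a firm conclusion. Both calls return the same values, and both need a copy because the requested order does not match the memory layout of arr.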
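
The equivalence of the two calls can be verified on a small array. The sketch below (the 2 × 3 array and the names a, b and v are illustrative) also shows the opposite case: when the requested order matches the memory layout, ravel can return a view again.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

a = arr.T.ravel()          # C-order read of the transpose
b = arr.ravel(order='F')   # Fortran-order read of the original
print(np.array_equal(a, b))  # True

# Neither result shares memory with arr, so both were copied
print(np.shares_memory(arr, a), np.shares_memory(arr, b))  # False False

# arr.T is Fortran-contiguous, so reading it in Fortran-order follows
# the memory layout and no copy is needed
v = arr.T.ravel(order='F')
print(v)                         # [0 1 2 3 4 5]
print(np.shares_memory(arr, v))  # True
```

Note that v lists the elements in a different order than a and b, so it is not a drop-in replacement; it only illustrates when a view is possible.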