Working with multidimensional arrays in NumPy is quite intuitive.
Element-wise multiplication in NumPy is much easier than writing a classical for loop, but understanding how NumPy works internally helps improve computational efficiency when arrays contain millions of elements.
In this post, we will look at the differences between the flatten and ravel functions.
Both return the same one-dimensional output by collapsing a multidimensional input.
The key difference is whether the underlying memory is copied in the process.
Suppose we would like to flatten a 1000 x 1000 array: flatten always returns a copy, whereas ravel returns a view whenever possible.
The computation time of both functions is shown below:
import numpy as np
# create a two-dimensional array of shape 1000 x 1000
arr = np.random.rand(1000, 1000)
print(f'Size of arr: {arr.shape}')
Size of arr: (1000, 1000)
%%time
arr_flatten = arr.flatten()
Wall time: 3.95 ms
%%time
arr_ravel = arr.ravel()
Wall time: 0 ns
In this run, ravel is much faster than flatten; a wall time of 0 ns means the call finished faster than the timer could resolve, not that it took no time at all. Such a speedup can be significant when dealing with very large arrays.
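Because a single %%time measurement can fall below the timer's resolution, the comparison can also be sketched with the standard-library timeit module, which averages over many calls. This is only a minimal sketch: the repetition count n_calls is an arbitrary choice made for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

arr = np.random.rand(1000, 1000)

# Repeat each call many times so the average per-call time is measurable.
# n_calls is an arbitrary value chosen for this illustration.
n_calls = 100
t_flatten = timeit.timeit(arr.flatten, number=n_calls) / n_calls
t_ravel = timeit.timeit(arr.ravel, number=n_calls) / n_calls

print(f'flatten: {t_flatten * 1e6:.2f} us per call')
print(f'ravel:   {t_ravel * 1e6:.2f} us per call')
```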
We can also check that ravel and flatten return the same output.
print(np.array_equal(arr_flatten, arr_ravel))
True
The difference is that ravel performs no copy operation here. With flatten, the output is a copy of the original array, whereas with ravel the output is just a view of the original array, so any change made to the raveled array also changes the original array.
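The effect of writing through a view can be illustrated with a small example. The 2 x 3 array and the values written below are made up purely for illustration.

```python
import numpy as np

# Small C-ordered array, used only for illustration.
arr = np.arange(6, dtype=float).reshape(2, 3)

arr_flatten = arr.flatten()  # independent copy
arr_ravel = arr.ravel()      # view, because arr is C-contiguous

# Writing to the copy leaves the original untouched.
arr_flatten[0] = -1.0
unchanged_after_flatten_write = bool(arr[0, 0] == 0.0)

# Writing to the view changes the original as well.
arr_ravel[0] = 99.0
changed_after_ravel_write = bool(arr[0, 0] == 99.0)

print(unchanged_after_flatten_write)  # True
print(changed_after_ravel_write)      # True
```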
To understand the difference between a copy and a view, compare the memory addresses at which the data of these three arrays is stored.
print(f'Memory address to store arr: {arr.__array_interface__["data"][0]}')
print(f'Memory address to store arr_flatten: {arr_flatten.__array_interface__["data"][0]}')
print(f'Memory address to store arr_ravel: {arr_ravel.__array_interface__["data"][0]}')
Memory address to store arr: 2394595090496
Memory address to store arr_flatten: 2394603155520
Memory address to store arr_ravel: 2394595090496
We can see that flatten copies the array to a new memory block, whereas ravel simply creates a view of the original array.
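As an alternative to comparing raw addresses, the same conclusion can be reached with np.shares_memory and the owndata flag. This is a sketch of one possible check, not the only way to tell a copy from a view.

```python
import numpy as np

arr = np.random.rand(1000, 1000)
arr_flatten = arr.flatten()
arr_ravel = arr.ravel()

# np.shares_memory reports whether two arrays overlap in memory.
flatten_shares = np.shares_memory(arr, arr_flatten)
ravel_shares = np.shares_memory(arr, arr_ravel)

print(flatten_shares)  # False: flatten allocated a new buffer
print(ravel_shares)    # True: ravel reuses the buffer of arr

# A copy owns its data; a view does not.
print(arr_flatten.flags.owndata)  # True
print(arr_ravel.flags.owndata)    # False
```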
When the array is not in C-order
Note that ravel also performs a copy when the array is not stored in C-order.
For example, the transpose arr.T of a C-ordered array is laid out in Fortran-order, so ravel has to copy its elements to return a flattened array in C-order.
%%time
arr_ravel2 = arr.ravel()
Wall time: 0 ns
%%time
arr_ravel2T = arr.T.ravel()
Wall time: 7.07 ms
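The memory layout behind this slowdown can be inspected through the flags attribute. The sketch below checks that arr.T is itself only a view of arr, while raveling it in the default C-order produces a new buffer.

```python
import numpy as np

arr = np.random.rand(1000, 1000)

# arr is stored row by row (C-order); its transpose is column by column.
print(arr.flags.c_contiguous)    # True
print(arr.T.flags.c_contiguous)  # False
print(arr.T.flags.f_contiguous)  # True

# The transpose is a view of arr ...
transpose_shares = np.shares_memory(arr, arr.T)
# ... but raveling it in C-order forces a copy.
ravel_T_shares = np.shares_memory(arr, arr.T.ravel())

print(transpose_shares)  # True
print(ravel_T_shares)    # False
```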
When we want the elements in a different order, we can instead pass the order argument to ravel.
%%time
arr_ravel3T = arr.ravel(order = 'F')
Wall time: 6.03 ms
In this run, specifying the order directly in the ravel call is slightly faster than calling ravel on arr.T. Both calls still copy the data, because arr itself is stored in C-order.
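To make the relationship between these calls concrete, the sketch below checks that arr.T.ravel() and arr.ravel(order='F') yield the same elements, and that a copy is avoided only when the requested order matches the layout of the array being raveled, as with arr.T.ravel(order='F'). The variable names are chosen for illustration only.

```python
import numpy as np

arr = np.random.rand(1000, 1000)

ravel_of_T = arr.T.ravel()        # C-order traversal of the transpose
ravel_F = arr.ravel(order='F')    # Fortran-order traversal of arr

# Both traverse arr column by column, so the results match.
same_elements = np.array_equal(ravel_of_T, ravel_F)
print(same_elements)  # True

# Both had to copy, since arr is stored in C-order.
print(np.shares_memory(arr, ravel_of_T))  # False
print(np.shares_memory(arr, ravel_F))     # False

# arr.T is Fortran-contiguous, so raveling it in Fortran-order
# matches its memory layout and returns a view instead of a copy.
ravel_T_F = arr.T.ravel(order='F')
view_without_copy = np.shares_memory(arr, ravel_T_F)
print(view_without_copy)  # True
```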