A histogram allows us to visualize the frequency distribution of our data.
It divides the data into a number of bins according to value, and then counts the number of occurrences (i.e., the frequency) in each bin.
We can obtain the frequencies and bin edges for a given dataset using the histogram() function from numpy. Let's consider the following example:
    import numpy as np

    # generate 1000 random numbers
    x = np.random.rand(1000, 1)

    # count the occurrences in each bin in x
    frequency, bins = np.histogram(x, bins=10, range=[0, 1])

    for b, f in zip(bins[1:], frequency):
        print(f'value: {(round(b, 1))} >> frequency: {f}')
Here, we used the numpy.random.rand() function to generate 1000 values uniformly distributed over the interval [0, 1). The array x stores the generated values.
We would like to know how many values fall within 0-0.1, how many within 0.1-0.2, and so on.
These counts can be obtained by calling the numpy.histogram() function, which returns the count for each bin together with the bin edges.
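To make the relationship between the two returned arrays explicit, the sketch below pairs each count with the interval it covers. The seed value and the variable names `left` and `right` are illustrative choices, not part of the example above.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable
x = np.random.rand(1000)

# 10 bins need 11 edges, so `bins` has one more entry than `frequency`
frequency, bins = np.histogram(x, bins=10, range=(0, 1))

# pair each count with the left and right edge of its bin
for left, right, f in zip(bins[:-1], bins[1:], frequency):
    print(f"{left:.1f}-{right:.1f}: {f}")

# every generated value falls into exactly one bin
print(frequency.sum())
```

Each bin is half-open, [left, right), except the last one, which also includes its right edge.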
Matplotlib allows us to plot the histogram with the pyplot.hist() function.
Let's continue with the above example, and use the histogram function in Matplotlib to visualize the data distribution.
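As a minimal sketch of that step, the code below plots the uniformly distributed data with pyplot.hist() and checks that the counts it returns agree with numpy.histogram(). The seed and the one-dimensional shape of `x` are assumptions made here to keep the example small.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # arbitrary seed for repeatability
x = np.random.rand(1000)

# plt.hist draws the bars and also returns the counts and bin edges
counts, edges, patches = plt.hist(x, bins=10, range=(0, 1))
plt.title("Data Distribution")
plt.xlabel("Value")
plt.ylabel("Frequency")

# the counts should match what numpy.histogram reports for the same bins
frequency, bins = np.histogram(x, bins=10, range=(0, 1))
print(np.array_equal(counts, frequency))
```

Outside a notebook, call plt.show() afterwards to display the figure.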
Let's generate another set of random numbers, this time following a normal distribution.
Instead of numpy.random.rand(), we can use numpy.random.randn() to generate a series of values that follow the standard normal distribution, with a mean of zero and a standard deviation of 1.
Notes:
If we would like a normal distribution with a specific mean $\mu$ and standard deviation $\sigma$, we can use the following formula:
$$
\sigma \cdot \texttt{numpy.random.randn()} + \mu
$$
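A quick way to check this formula is to draw a large sample and compare its mean and standard deviation with the targets. The values of `mu` and `sigma` below, the sample size, and the seed are all hypothetical, picked only for illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability
mu, sigma = 5.0, 2.0  # hypothetical target mean and standard deviation

# scale standard-normal samples by sigma, then shift them by mu
x = sigma * np.random.randn(100_000) + mu

print(x.mean())  # should be close to mu
print(x.std())   # should be close to sigma
```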
    import numpy as np
    import matplotlib.pyplot as plt

    # generate 1000 random numbers
    x = np.random.randn(1000, 1)

    plt.hist(x, bins=10)
    plt.title("Data Distribution")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.show()
We can plot multiple histograms on the same axes for easy comparison.
Let's create 3 numpy arrays, each consisting of 1000 normally distributed random numbers with a different mean and standard deviation.
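One possible way to do this is sketched below. The three (mean, standard deviation) pairs, the bin count, and the transparency setting are assumptions chosen for illustration, not values prescribed by the text.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # arbitrary seed for repeatability

# (mean, standard deviation) pairs chosen purely for illustration
params = [(0, 1), (3, 2), (-2, 0.5)]
datasets = [sigma * np.random.randn(1000) + mu for mu, sigma in params]

fig, ax = plt.subplots()
for (mu, sigma), data in zip(params, datasets):
    # alpha < 1 keeps overlapping histograms visible
    ax.hist(data, bins=20, alpha=0.5, label=f"mean={mu}, std={sigma}")

ax.set_title("Data Distribution")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.legend()
```

Each call to hist() here computes its own bin edges from its own data. To compare counts bin by bin, we could instead pass the same explicit `range` or edge array to every call. Outside a notebook, call plt.show() to display the figure.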
We can create visual spacing between adjacent bins in the histogram by changing the plot style with the set_style() function from seaborn.
Note that seaborn is built upon matplotlib, so we can use seaborn and matplotlib together.
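A minimal sketch of combining the two libraries follows. The style name "darkgrid" is just one of seaborn's built-in options and is an assumption here, as is the seed. seaborn's built-in styles typically draw a light edge around each bar, which is what produces the visible gap, though the exact appearance may vary between versions.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# apply one of seaborn's built-in styles before plotting
sns.set_style("darkgrid")

np.random.seed(0)  # arbitrary seed for repeatability
x = np.random.randn(1000)

fig, ax = plt.subplots()
ax.hist(x, bins=10)  # plain matplotlib call, drawn with seaborn's styling
ax.set_title("Data Distribution")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

Because set_style() changes matplotlib's global settings, it affects every figure created afterwards in the same session.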