Introduction to Matplotlib for Data Visualization in Python

Updated: at 04:34 AM

Matplotlib is one of the most popular Python libraries for data visualization. It provides a wide range of plotting functionality for creating interactive visualizations and production-quality figures. Matplotlib can generate histograms, scatter plots, bar charts, pie charts, error bar charts, area charts and many more with just a few lines of code.

This guide introduces Matplotlib by covering installation, basic plots, customization, and saving, showing and embedding plots. We will also explore some of the most common plot types and useful tips for effective data visualization. Examples and sample code are provided throughout to illustrate the concepts. By the end, you will have a solid foundation to start creating elegant, publication-ready visualizations using Matplotlib.

Installing Matplotlib

Matplotlib can be installed using pip, the standard package installer for Python.

pip install matplotlib

To upgrade to the latest version:

pip install -U matplotlib

To install a specific version:

pip install matplotlib==3.5.1

After installing, we need to import Matplotlib’s pyplot module to start creating visualizations:

import matplotlib.pyplot as plt
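To confirm that the installation worked, we can print the installed version and the active rendering backend. This is only a quick sanity check; the values printed depend on your environment.

```python
import matplotlib
import matplotlib.pyplot as plt

# Version string of the installed library, e.g. the one pinned with pip
print(matplotlib.__version__)

# Name of the backend Matplotlib will use to render figures
print(matplotlib.get_backend())
```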

Now let’s look at some basic plotting with Matplotlib.

Basic Matplotlib Plots

We will explore a few fundamental plot types like line plots, scatter plots, bar charts, histograms and pie charts.

Line Plot

Line plots can be used to visualize data over time or observe trends by connecting individual data points. To plot a line chart with Matplotlib, we need to provide the x and y values.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y)
plt.title('Line Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

This generates a simple line chart with the data points connected:

[Figure: Line Plot]

We can plot multiple line plots on the same axes by adding more plt.plot() function calls with the corresponding data.
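As a minimal sketch of drawing several lines on the same axes, the example below calls plt.plot() twice. The second data series (doubles) and the labels are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
doubles = [2, 4, 6, 8, 10]  # illustrative second series

# Each plt.plot() call adds another line to the current axes
plt.plot(x, squares, label='x squared')
plt.plot(x, doubles, linestyle='--', marker='o', label='2x')

plt.title('Two Lines on One Axes')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()  # uses the label= text to identify each line
plt.show()
```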

Scatter Plot

Scatter plots are used to visualize the relationship between two continuous variables. Each data point is plotted as a dot on the graph.

To create a scatter plot in Matplotlib, we pass the x and y arrays to plt.scatter().

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 7, 5]

plt.title('Scatter Plot')
plt.scatter(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.show()

This generates the following scatter plot:

[Figure: Scatter Plot]

We can observe if there are any patterns or correlations between the x and y data points through visual inspection.
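Visual inspection can be backed up with a number. The sketch below computes the Pearson correlation coefficient with NumPy's np.corrcoef() and shows it in the plot title. Using NumPy here is our own addition, not something the scatter plot requires.

```python
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 7, 5]

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(x, y)[0, 1]

plt.scatter(x, y)
plt.title(f'Scatter Plot (r = {r:.2f})')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
```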

Bar Chart

Bar charts are used to compare quantities corresponding to different categories. To plot bars with Matplotlib, we need to provide the height of each bar and its label.

import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
values = [10, 30, 20, 25]

plt.title('Bar Chart')
plt.bar(labels, values)
plt.show()

This code produces a simple vertical bar chart:

[Figure: Bar Chart]

We can also generate horizontal bar charts using the plt.barh() function.

import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
values = [10, 30, 20, 25]

plt.figure(figsize=(10, 6))
plt.title('Horizontal Bar Chart')
plt.barh(labels, values)
plt.xlabel('Values')
plt.ylabel('Labels')
plt.show()

[Figure: Horizontal Bar Chart]

Histogram

Histograms display the frequency distribution of data using bars. To plot a histogram in Matplotlib, we pass the dataset into plt.hist().

import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=1000)

plt.hist(x, bins=25)
plt.title('Histogram')
plt.show()

This generates a histogram with 25 bins:

[Figure: Histogram]

We can adjust the number of bins and other parameters like color to customize the histogram.
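A short sketch of these adjustments follows. The bin count, colors and the fixed random seed are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# A seeded generator makes the example reproducible
rng = np.random.default_rng(seed=42)
x = rng.normal(size=1000)

# hist() returns the bin counts, the bin edges and the drawn bars
counts, bin_edges, patches = plt.hist(
    x,
    bins=40,              # more bins give a finer view of the distribution
    color='steelblue',    # fill color of the bars
    edgecolor='white',    # outline that separates neighboring bars
    alpha=0.8,            # slight transparency
)
plt.title('Histogram with 40 Bins')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```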

Pie Chart

Pie charts visualize data as circular sectors representing the proportion of each quantity. We need to provide the wedge sizes and labels for each sector.

import matplotlib.pyplot as plt

values = [20, 40, 60, 80]
labels = ['A', 'B', 'C', 'D']
explode = [0, 0.2, 0, 0]

plt.pie(values, labels=labels, explode=explode)
plt.title('Pie Chart')
plt.show()

This code creates an exploded pie chart:

[Figure: Pie Chart]

The explode parameter offsets a wedge from the center to highlight it. Each value is a fraction of the radius, so here wedge 'B' is pulled out by 0.2 of the radius. There are many options available to customize pie charts in Matplotlib.
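Two commonly used options are autopct, which prints each wedge's percentage, and startangle, which rotates the starting position of the first wedge. The sketch below reuses the data from the example above; the format string and angle are illustrative choices.

```python
import matplotlib.pyplot as plt

values = [20, 40, 60, 80]
labels = ['A', 'B', 'C', 'D']
explode = [0, 0.2, 0, 0]

plt.pie(
    values,
    labels=labels,
    explode=explode,
    autopct='%1.0f%%',  # print the share of each wedge as a whole percentage
    startangle=90,      # start the first wedge at 12 o'clock
)
plt.title('Pie Chart with Percentages')
plt.show()
```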

These examples demonstrate how easily we can generate common plots like line charts, scatter plots, bar charts, histograms and pie charts using Matplotlib’s pyplot module. Now let’s learn how to customize and stylize these plots.

Customizing Plots

Matplotlib provides various options to customize every element of a figure like colors, labels, limits, legend etc. This allows us to generate exactly the visualization we need.

Figure and Axes

In Matplotlib, the overall graphic is contained in a Figure object, and each plot (line, bar etc) is drawn on an Axes object.

We can create multiple plots per figure by adding more Axes objects using fig.add_axes():

import matplotlib.pyplot as plt

fig = plt.figure()

axes1 = fig.add_axes([0.1, 0.5, 0.8, 0.4]) # left, bottom, width, height (as fractions of the figure)
axes2 = fig.add_axes([0.1, 0.1, 0.8, 0.4])

# Plot on each axes
axes1.plot([1,2,3], [4,5,6])
axes2.scatter([1,3,5], [5,2,4])

plt.show()

This generates a figure with two Axes objects containing different plots:

[Figure: Axes]

We can customize the look and limits of each Axes as needed. The Figure provides overall styling like background color.
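The split between figure-level and axes-level settings can be sketched as follows. The colors, limits and title are arbitrary values chosen for illustration.

```python
import matplotlib.pyplot as plt

# plt.subplots() creates a Figure and a single Axes in one call
fig, ax = plt.subplots(figsize=(6, 4))

# Figure-level styling: the background around the plotting area
fig.set_facecolor('lightgray')

# Axes-level styling: data, limits, title and grid of this one plot
ax.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
ax.set_facecolor('white')
ax.set_xlim(0, 6)
ax.set_ylim(0, 30)
ax.set_title('Customized Axes')
ax.grid(True, linestyle=':')

plt.show()
```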

Labels and Titles

We can add descriptive labels and titles to provide context using:

plt.title('Sales Over Time')
plt.xlabel('Date')
plt.ylabel('Revenue')

Multi-line plots should include a legend to identify each line. We can add a legend using:

plt.plot(x1, y1, label='Line 1')
plt.plot(x2, y2, label='Line 2')
plt.legend()

This will generate a legend box identifying each line.

Limits and Scales

We may want to adjust the range of values visible on the plot or change the scale from linear to logarithmic.

Setting limits restricts the plot to the given range of values:

plt.xlim([0, 10])
plt.ylim([0, 100])

Changing to a log scale:

plt.xscale('log')
plt.yscale('log')

This makes plots more readable for certain use cases like visualizing exponential data.
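To see the effect, the sketch below plots the same exponential data twice, once on a linear y axis and once on a logarithmic one. The data (powers of two) is made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = 2.0 ** x  # exponential growth

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Linear scale: small values are squashed against the x axis
ax_linear.plot(x, y, marker='o')
ax_linear.set_title('Linear y scale')

# Log scale: exponential data appears as a straight line
ax_log.plot(x, y, marker='o')
ax_log.set_yscale('log')
ax_log.set_title('Logarithmic y scale')

plt.show()
```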

Legend

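The plot legend identifies the meaning of different lines, bars or regions on the figure. We can position the legend by passing loc a pair of coordinates, which gives the position of the legend's lower-left corner in axes coordinates (0 to 1 spans the axes):

plt.legend(loc=(1.0, 0.5))

loc also accepts location strings such as 'upper left', 'lower right' and 'center left'. Omitting loc uses the default, 'best', which tries to place the legend where it overlaps the data least.

We can also customize the legend appearance with parameters such as title, fontsize and frameon.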
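A minimal sketch of a customized legend follows. The sine and cosine data and the specific option values are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

legend = ax.legend(
    loc='lower left',   # named location inside the axes
    title='Function',   # heading shown above the entries
    fontsize=10,        # size of the entry text
    frameon=False,      # draw the legend without a surrounding box
)
plt.show()
```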

Annotations

Annotations like arrows, text boxes and shapes can be added to highlight areas or communicate extra details on the plot.

Adding text at a specific data point:

plt.text(2.5, 60, 'Important Point', fontsize=12)

Annotating with an arrow:

plt.annotate('Minimum', xy=(x_min, y_min), xytext=(x_min+5, y_min+5), arrowprops=dict(facecolor='black'))

Shading a rectangular region (xmin and xmax are in data coordinates, while ymin and ymax are fractions of the axes height):

plt.axvspan(xmin, xmax, ymin=0.25, ymax=0.75, alpha=0.5, color='red')

There are many more annotation functions available in Matplotlib.
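The snippets above use placeholder variables such as x_min and y_min. The following sketch puts them together in one runnable example, with a made-up parabola as the data and arbitrary positions for the text.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 101)
y = x**2 + 10

# Locate the minimum of the curve
i_min = np.argmin(y)
x_min, y_min = x[i_min], y[i_min]

fig, ax = plt.subplots()
ax.plot(x, y)

# Shade the band of x values around the minimum (full axes height)
ax.axvspan(-1, 1, alpha=0.2, color='red')

# Arrow pointing from the label position (xytext) to the data point (xy)
ax.annotate(
    'Minimum',
    xy=(x_min, y_min),
    xytext=(x_min + 1.5, y_min + 10),
    arrowprops=dict(facecolor='black', shrink=0.05),
)

# Free-standing text placed at a data coordinate
ax.text(-4.5, 30, 'y = x^2 + 10', fontsize=12)

plt.show()
```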

Style and Color

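We can customize the overall style of the plot using built-in style sheets such as ‘ggplot’ and ‘fivethirtyeight’. The seaborn-based styles were renamed with a ‘seaborn-v0_8’ prefix in Matplotlib 3.6, and the old names such as ‘seaborn’ were later removed, so they fail on recent versions.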

plt.style.use('ggplot')
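The styles available in a given installation can be listed with plt.style.available. The sketch below also uses plt.style.context() to apply a style to one figure only, instead of changing it globally as plt.style.use() does. The plotted data is illustrative.

```python
import matplotlib.pyplot as plt

# Names of all style sheets shipped with the installed version
print(plt.style.available)

# Apply 'ggplot' only inside this block; the global style is unchanged
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
    ax.set_title('ggplot style')

plt.show()
```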

Colors can also be set for individual elements like:

plt.plot(x, y, color='green')
plt.bar(x, y, color='#FF5733')

Supported color formats include hexadecimal strings, RGB or RGBA tuples with components between 0 and 1, and named colors such as the HTML/CSS color names.
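As a quick sketch, the three formats can be used side by side. The matplotlib.colors helper to_hex() converts any accepted color to a hex string, which is handy for checking what a named color resolves to. The data values are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors

x = [1, 2, 3]
plt.plot(x, [1, 2, 3], color='#FF5733', label='hex string')
plt.plot(x, [2, 3, 4], color=(0.2, 0.4, 0.8), label='RGB tuple')
plt.plot(x, [3, 4, 5], color='green', label='named color')
plt.legend()

# Convert a named color to its hex representation
green_hex = mcolors.to_hex('green')
print(green_hex)

plt.show()
```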

This covers the main options for customizing the different components of a Matplotlib figure. Next, we will look at saving plots to file.

Saving, Showing and Embedding Plots

Matplotlib can save a plot to an image file with plt.savefig(). The output format is inferred from the file extension:

plt.savefig('plot.png') # Supported formats: .pdf, .svg, .png, .jpg etc.

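To render a plot on the screen, call plt.show(). When doing both in a script, call savefig() before show(), because the figure may no longer be available once the window has been closed: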

plt.show()

In Jupyter notebooks, figures are normally rendered in the cell output automatically. We can also display the current figure explicitly:

from IPython.display import display

display(plt.gcf())

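savefig() accepts parameters to customize the output, such as dpi (resolution in dots per inch), bbox_inches='tight' (trim excess whitespace around the figure) and transparent=True (transparent background).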
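A minimal sketch of saving with commonly used savefig() options follows. The file name sales_plot.png, the temporary directory and the data are hypothetical choices made so the example can run anywhere.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 15, 30])
ax.set_title('Sales Over Time')

# Hypothetical output location in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), 'sales_plot.png')

fig.savefig(
    out_path,
    dpi=300,              # resolution in dots per inch
    bbox_inches='tight',  # trim excess whitespace around the figure
    transparent=True,     # transparent background instead of white
)
print(out_path)
```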

Now that we have covered the basics, let’s look at some other useful plot types provided by Matplotlib.

Other Types of Plots

Matplotlib supports a wide range of common and specialized plots. Two examples are subplot grids and contour plots.

Creating subplots with shared x and y axes:

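import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 5, 4, 7, 5]

# 2x2 grid of Axes that share their x and y ranges
fig, axs = plt.subplots(2, 2, sharex=True, sharey=True)

axs[0,0].plot(x, y)
axs[0,1].scatter(x, y)
axs[1,0].bar(x, y)
axs[1,1].hist(x, bins=10)
plt.show()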

Contour plot:

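import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-3, 3, 0.1)
y = np.arange(-3, 3, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.sqrt(X**2 + Y**2)  # distance of each grid point from the origin

cp = plt.contourf(X, Y, Z)
plt.colorbar(cp)  # show which fill color maps to which Z value
plt.show()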

The variety of plot types, along with the customization options, makes Matplotlib suitable for a wide range of visualization needs.

Tips for Effective Data Visualization

Here are some tips to create effective, easy-to-understand visualizations with Matplotlib:

Conclusion

In this comprehensive guide, we explored Matplotlib’s key features for data visualization in Python. We covered installation, basic plots like line, scatter, bar, histogram and pie charts. We also learned how to customize every element of the figure and save plots to file. Finally, we looked at tips to create effective, insightful visualizations with Matplotlib.

Matplotlib provides a mature, flexible library to generate publication-quality figures and interactive visuals to gain valuable insights from data. With the basics learned here, you will be ready to start leveraging Matplotlib’s extensive capabilities for your own data science and visualization needs in Python.