Effective data visualization is key to extracting insights from data. Matplotlib is one of the most popular Python libraries for data visualization and plotting. With Matplotlib, you can create a wide variety of graphs, charts, histograms, heatmaps, and more. However, creating visually appealing plots requires following principles of visual design, color theory, and perceptual science. This guide provides best practices and techniques for creating beautiful and engaging Matplotlib visualizations in Python.
Introduction
Matplotlib is a comprehensive 2D plotting library that produces publication-quality figures in Python. It provides an object-oriented API that allows you to build plots piece by piece programmatically. Matplotlib can generate simple plots with just a few commands as well as complex multi-plot grids with complete control over each axis.
To make the most out of Matplotlib, it is important to learn design principles that make visualizations easier to understand. Well-designed plots effectively convey key information, patterns, or insights from the data. Poorly designed visuals can mislead, hide vital data points, or simply bore the audience. By following best practices and techniques outlined in this guide, you can create beautiful, perceptually effective, and engaging Matplotlib visualizations.
Principles of Effective Visualization
Before diving into Matplotlib specifics, let’s review some key principles from visual design, color theory, and perception that form the foundation of good visualization:
Focus on Relevance
Only include plot elements that highlight something meaningful in the data. Decorative elements or unnecessary details distract from the core story and insights.
Eliminate Chartjunk
Avoid visual clutter like excessive grid lines, borders, legends, or redundant labels that don’t add value. Decluttered plots are easier to interpret.
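As a minimal sketch of decluttering, the snippet below hides the top and right spines and keeps only faint horizontal grid lines. The data values are made up for illustration, and which elements count as clutter depends on the chart.

```python
import matplotlib.pyplot as plt

# Illustrative values only (not from a real dataset)
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 6, 8]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, linewidth=2)

# Remove the box around the plot; keep the left and bottom axes
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Keep light horizontal grid lines only, drawn behind the data
ax.grid(axis='y', color='lightgrey', linewidth=0.8)
ax.set_axisbelow(True)
```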
Choose Appropriate Visual Encodings
Pick visual encodings like position, size, shape, and color that match the type of data being displayed. This makes patterns readily perceptible.
Facilitate Comparisons
Use consistent scales, common baselines, and alignment to make it easy to compare plotted quantities.
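One way to get a consistent scale across panels is to share the y-axis between subplots. The sketch below assumes two hypothetical regions with invented quarterly values; with sharey=True, bar heights can be compared directly across the two panels.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly values for two regions
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
north = [12, 15, 14, 18]
south = [30, 28, 35, 40]

# sharey=True forces both panels onto the same y-scale
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4), sharey=True)
ax1.bar(quarters, north)
ax1.set_title('North')
ax2.bar(quarters, south)
ax2.set_title('South')
ax1.set_ylabel('Units sold')
```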
Use Judicious Contrast
Leverage contrasting colors, sizes, positions, etc. to distinctly highlight key elements, differences, or trends.
Direct Attention
Draw focus to important plot areas with techniques like reduced opacity in unimportant regions.
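A rough sketch of this idea: draw the context series in faint grey with reduced opacity and the series of interest in a strong color. The data is randomly generated, and the choice of the last series as the focus is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(12)
# Five made-up series; the last one is the series we want to emphasize
series = rng.normal(0, 1, (5, 12)).cumsum(axis=1)

fig, ax = plt.subplots(figsize=(8, 4))
for row in series[:-1]:
    # Context lines: thin, grey, mostly transparent
    ax.plot(x, row, color='grey', alpha=0.3, linewidth=1)
# Focus line: opaque, thicker, saturated color
highlight, = ax.plot(x, series[-1], color='crimson', linewidth=2.5)
```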
Balance Aesthetics and Clarity
Make sensible aesthetic choices that enhance clarity rather than detract from it. Customize Matplotlib’s default parameters only when suitable.
Choose Color Palettes Wisely
Use appropriate categorical, sequential, or diverging color schemes based on data characteristics and visualization goals.
Accommodate Color Vision Deficiency
Around 4% of the population has color vision deficiencies. Ensure visuals remain interpretable when viewed in grayscale.
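A quick way to approximate the grayscale check is to estimate the brightness of a few colormap samples and see whether it changes in one direction only. The helper names below (approx_luminance, brightness_is_monotonic) are hypothetical, and the Rec. 601 luma weights are only a rough stand-in for perceived brightness, so treat this as a sanity check rather than a full accessibility test.

```python
import numpy as np
import matplotlib.pyplot as plt

def approx_luminance(cmap_name, n=5):
    """Approximate grayscale brightness of n evenly spaced colormap samples."""
    rgba = plt.get_cmap(cmap_name)(np.linspace(0, 1, n))
    # Rec. 601 luma weights: a rough approximation of perceived brightness
    return rgba[:, :3] @ np.array([0.299, 0.587, 0.114])

def brightness_is_monotonic(cmap_name, n=5):
    """True if brightness only increases or only decreases along the map."""
    steps = np.diff(approx_luminance(cmap_name, n))
    return bool(np.all(steps > 0) or np.all(steps < 0))

viridis_ok = brightness_is_monotonic('viridis')  # brightness rises steadily
jet_ok = brightness_is_monotonic('jet')          # brightness rises, then falls
```

A map whose brightness rises and then falls will show different data values as the same grey when printed or viewed in grayscale.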
By keeping these core principles in mind when working with Matplotlib, you can make informative plots that engage audiences. Now let’s look at how to put some of these ideas into practice.
Matplotlib Plot Customization
Matplotlib provides extensive control over all aspects of a plot through its configurations and parameters. Customizing the default Matplotlib styles and components appropriately can greatly improve clarity and aesthetics.
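Defaults can also be adjusted in one place through Matplotlib's rcParams. The sketch below uses rc_context so the overrides apply only inside the with-block; the specific values are illustrative choices, not recommendations from this guide.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Example overrides; the specific values are illustrative choices
custom_style = {
    'axes.spines.top': False,
    'axes.spines.right': False,
    'axes.titlesize': 16,
    'axes.labelsize': 12,
    'lines.linewidth': 2,
}

default_linewidth = mpl.rcParams['lines.linewidth']

# Settings apply only inside the with-block and are restored afterwards
with mpl.rc_context(custom_style):
    fig, ax = plt.subplots(figsize=(6, 4))
    line, = ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('Styled with rc_context')
```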
Set Figure and Axes Parameters
The Figure and Axes objects control overall plot area and coordinate space respectively. We can set properties like background color, size, resolution and layout:
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8, 5), dpi=120, facecolor='lightgrey')
ax = fig.add_axes([0.15, 0.15, 0.8, 0.8],
                  facecolor='#eafff5',
                  xlim=(0, 100),
                  ylim=(-5, 105))
- figsize sets width and height in inches
- dpi controls dots-per-inch resolution
- facecolor sets background color
- add_axes positions and sizes the axes rectangle inside the figure
- xlim and ylim define x and y axis limits
Customize Axis Ticks and Labels
Ticks and labels should be legible without cluttering the plot. We can configure them as follows:
import numpy as np
ax.tick_params(axis='both', direction='in', length=6, width=1.5,
               labelsize=14, pad=8)
# Twelve evenly spaced tick positions, one for each month label below
ax.set_xticks(np.linspace(0, 100, 12))
ax.set_yticks(np.arange(-5, 110, 10))
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
                   rotation=30,
                   horizontalalignment='right')
- tick_params sets tick style and label font size
- set_xticks and set_yticks set custom tick positions
- set_xticklabels changes x-axis labels and rotates them
Add Informative Plot Titles and Axis Labels
Descriptive titles and labels provide context for the visualization:
ax.set_title('Average Monthly Temperature in Boston',
             fontsize=18, pad=20)
ax.set_xlabel('Month', labelpad=20)
ax.set_ylabel('Temperature (°F)', labelpad=20)
- set_title, set_xlabel and set_ylabel set plot title, x-axis label and y-axis label respectively
- fontsize and labelpad adjust text size and padding
Set Legend Parameters
Legends can be positioned outside plots and configured with:
legend = ax.legend(loc='upper center',
                   bbox_to_anchor=(0.5, 1.15),
                   ncol=3,
                   fontsize=12,
                   framealpha=1,
                   facecolor='w',
                   frameon=True)
- loc and bbox_to_anchor position the legend
- ncol sets number of columns
- fontsize adjusts entry text size
- framealpha sets frame transparency
- facecolor sets legend background color
Save High-Resolution Figure
Use:
fig.savefig('plot.png', bbox_inches='tight', dpi=300)
to export the figure to a pixel-dense PNG for publications or presentations.
By tweaking these and other figure elements methodically, you can customize Matplotlib’s defaults into more effective visualizations.
Choosing Color Schemes
Color is a key visual encoding in plots for distinguishing groups, highlighting trends, or attracting attention. Matplotlib has several built-in color palettes and tools for choosing colors.
Categorical Color Maps
For qualitative data like categories, use categorical color schemes with distinct hues:
import numpy as np
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
colors = plt.get_cmap('Set1')(np.linspace(0, 1, len(categories)))
- get_cmap looks up a colormap by name
- Set1 is a qualitative map with easily distinguished colors
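An alternative sketch for qualitative maps: instead of sampling the map at evenly spaced positions, read its discrete color list and take one entry per category. This assumes the map is a ListedColormap (as Set1 is), which exposes a colors attribute; the category_colors dictionary is a name introduced here for illustration.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']

# Qualitative maps such as Set1 are ListedColormaps with a .colors sequence
set1_colors = plt.get_cmap('Set1').colors

# zip stops at the shorter input, so this takes the first four colors
category_colors = dict(zip(categories, set1_colors))
```

Taking the first entries in order keeps the palette's intended sequence, which helps when the same categories appear in several charts.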
Sequential Color Maps
Sequential schemes are useful for numeric data with a natural ordering like magnitudes. We can use the full palette or only a subset of it:
values = [0.2, 0.5, 0.3, 0.8, 1.0]
viridis = plt.get_cmap('viridis')
colors = viridis(values)
# show part of colormap
viridis(np.linspace(0, 0.7, 8))
- viridis is a perceptually uniform sequential map
- Brightness indicates magnitude
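In a plot, the sequential map is usually passed to the plotting call rather than applied by hand. The sketch below colors scatter points by a made-up magnitude value and adds a colorbar so readers can decode the colors; the data is random and purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 40)
y = rng.uniform(0, 10, 40)
magnitude = x + y  # made-up quantity to encode with color

fig, ax = plt.subplots(figsize=(6, 5))
# c= takes the data values; the colormap turns them into colors
points = ax.scatter(x, y, c=magnitude, cmap='viridis', s=60)
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Magnitude')
```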
Diverging Color Maps
For data with a meaningful center point, like deviations, use diverging color schemes:
div_cmap = plt.get_cmap('RdBu', 11)
values = [-1.0, -0.6, -0.3, -0.1,
          0.0,
          0.1, 0.3, 0.6, 1.0]
# Colormaps expect inputs in [0, 1], so map [-1, 1] onto that range first
norm = plt.Normalize(vmin=-1, vmax=1)
colors = div_cmap(norm(values))
- RdBu diverges from red to blue through neutral white
- Data is spread on both sides of the center value 0.0
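When the data range is not symmetric around the center, a plain linear normalization would shift the neutral color away from zero. The sketch below uses TwoSlopeNorm from matplotlib.colors to pin the center; the deviation values are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Made-up deviations with an asymmetric range around zero
deviations = np.array([-2.0, -1.0, 0.0, 1.5, 6.0])

# vcenter pins 0.0 to the middle of the colormap despite the lopsided range
norm = TwoSlopeNorm(vmin=deviations.min(), vcenter=0.0, vmax=deviations.max())
cmap = plt.get_cmap('RdBu')
colors = cmap(norm(deviations))

# Position of 0.0 on the colormap's 0-to-1 scale
center_position = float(norm(0.0))
```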
Avoiding Poor Color Choices
Don’t use:
- Unnecessary colors that don’t encode data
- Bright, saturated colors for large backgrounds
- Red-green combinations, which many viewers with color vision deficiency cannot tell apart
- Rainbow color maps, whose uneven brightness changes can suggest boundaries that are not in the data
Instead, pick color palettes consciously based on data characteristics and visual goals. Resources like ColorBrewer can help select suitable schemes.
Visualizing Different Data Types
Now let’s look at how to build various Matplotlib plot types for specific data characteristics using best practices covered so far:
Line Plots
Ideal for visualizing relationships and trends over a continuous variable. For example, stock prices over time:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange('2020-01', '2020-12', dtype='datetime64[M]')
y = np.random.randn(len(x)).cumsum() + 15
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, linestyle='-', marker='o', linewidth=2, markersize=5)
ax.set(title="Stock Price Over Time",
       xlabel="Date",
       ylabel="Price ($)")
- Line shows overall trend
- Markers highlight individual data points
- Legend not required for single line
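For a date axis like this one, tick placement and label format can be controlled with matplotlib.dates. The sketch below places one tick per month and labels it with the abbreviated month name; the data is random, as in the example above.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

x = np.arange('2020-01', '2020-12', dtype='datetime64[M]')
y = np.random.default_rng(0).normal(size=len(x)).cumsum() + 15

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, marker='o')

# One tick per month, labeled with the abbreviated month name (Jan, Feb, ...)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
```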
Bar Charts
Bars effectively compare categorical data. For instance, sales by product category:
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
sales = [16000, 14000, 17500, 19500]
fig, ax = plt.subplots(figsize=(6, 5))
ax.bar(categories, sales, width=0.5, edgecolor="grey", linewidth=0.7)
ax.set(xlabel="Product Category",
       ylabel="Quarterly Sales",
       title="Sales by Product Category")
- Bars are positioned side-by-side for easy comparison
- Outlining bars in subtle gray helps distinguish them
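When exact values matter, labeling the bars directly can spare readers from estimating heights against the axis. This sketch assumes Matplotlib 3.4 or newer, where Axes.bar_label is available, and reuses the sales figures from the example above.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']
sales = [16000, 14000, 17500, 19500]

fig, ax = plt.subplots(figsize=(6, 5))
bars = ax.bar(categories, sales, width=0.5)

# Write each value above its bar (Axes.bar_label needs Matplotlib 3.4+)
labels = ax.bar_label(bars, labels=[f'{v:,}' for v in sales], padding=3)
```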
Scatter Plots
Scatterplots reveal relationships and patterns between two numeric variables. We can visualize the correlation between quiz scores and exam grades as follows:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.normal(70, 15, 50)
# Derive exam grades from quiz scores plus noise so the two are correlated
y = 0.8 * x + np.random.normal(20, 10, 50)
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(x, y, s=75, c='steelblue',
           edgecolor='white', linewidth=1.5)
ax.set(xlabel='Quiz Scores', ylabel='Exam Grades',
       xlim=(0, 100), ylim=(0, 100))
- Larger blue dots clearly mark each data point
- White edges help separate overlapping points
- Correlation is evident from angled scatter pattern
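A visual impression of correlation can be backed with a number. The sketch below generates hypothetical quiz and exam scores with a built-in linear relationship (an assumption made purely for illustration), computes the Pearson coefficient with np.corrcoef, and shows it in the plot title.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
quiz = rng.normal(70, 15, 200)
# Hypothetical relationship: exam grades follow quiz scores plus noise
exam = 0.8 * quiz + rng.normal(20, 8, 200)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(quiz, exam)[0, 1]

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(quiz, exam, s=40, c='steelblue', edgecolor='white')
ax.set(xlabel='Quiz Scores', ylabel='Exam Grades',
       title=f'Quiz vs. exam scores (r = {r:.2f})')
```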
Histograms
Histograms show frequency distributions of numeric data. We can plot an age distribution as:
import numpy as np
import matplotlib.pyplot as plt
ages = np.random.normal(45, 15, 500)
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(ages, bins=20, edgecolor='black', linewidth=1.2)
ax.set(xlabel='Age', ylabel='Frequency',
       title='Age Distribution')
- Bars visualize how values cluster
- Black outlines improve legibility
- The bell shape reflects the normal distribution the sample was drawn from
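To make the comparison with a normal distribution explicit, the histogram can be drawn as a density and overlaid with the matching normal curve. This is a sketch on simulated ages; the mean and standard deviation are the values used to generate the sample, not estimates from real data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mean, std = 45, 15
ages = rng.normal(mean, std, 500)

fig, ax = plt.subplots(figsize=(8, 5))
# density=True rescales bar heights so the total bar area equals 1
counts, edges, _ = ax.hist(ages, bins=20, density=True,
                           edgecolor='black', alpha=0.6)

# Normal probability density with the same mean and standard deviation
grid = np.linspace(ages.min(), ages.max(), 200)
pdf = np.exp(-0.5 * ((grid - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))
ax.plot(grid, pdf, color='crimson', linewidth=2)
ax.set(xlabel='Age', ylabel='Density')
```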
Heatmaps
Heatmaps convey magnitude variations through color intensity. We can plot employee performance on a 5-point scale as:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randint(1, 6, size=(10, 5))
norm = plt.Normalize(1,5)
fig, ax = plt.subplots(figsize=(6, 5))
cmap = plt.get_cmap('Reds')
heat_map = ax.imshow(data, cmap=cmap, norm=norm)
cbar = fig.colorbar(heat_map)
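# Rows (10) run along the y-axis and columns (5) along the x-axis
ax.set_xticks(np.arange(5))
ax.set_xticklabels(range(1, 6))
ax.set_yticks(np.arange(10))
ax.set_yticklabels(range(1, 11))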
- imshow plots matrix as image
- Red shade darkness indicates rating
- Colorbar legend facilitates interpretation
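For a small matrix like this one, writing the value inside each cell lets readers get exact ratings without decoding colors. The sketch below uses random ratings, and the threshold for switching to white text (ratings of 4 and above) is an arbitrary choice for legibility on dark cells.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(0).integers(1, 6, size=(10, 5))

fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(data, cmap='Reds', vmin=1, vmax=5)

# Write each rating inside its cell; use white text on the darkest cells
for row in range(data.shape[0]):
    for col in range(data.shape[1]):
        value = data[row, col]
        ax.text(col, row, str(value), ha='center', va='center',
                color='white' if value >= 4 else 'black')
```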
By applying the principles covered so far to the plot types your data calls for, you can build insightful and aesthetically pleasing visualizations.
Real-World Example
Let’s take an example of visualizing hotel booking data to illustrate several best practices in action.
We will create a line chart comparing daily bookings for hotels A, B, and C over 2 months:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
hotels = ['A', 'B', 'C']
# Random daily booking counts: 60 days (rows) for 3 hotels (columns)
bookings = np.random.randint(20, 100, (60, 3))
# Jan 1 through Feb 29, 2020: 60 days
dates = np.arange('2020-01', '2020-03', dtype='datetime64[D]')
fig, ax = plt.subplots(figsize=(10, 5))
colors = plt.get_cmap('Dark2')(np.linspace(0, 1, 3))
for i in range(3):
    ax.plot(dates, bookings[:, i], marker='o',
            linestyle='-', linewidth=2,
            color=colors[i], label=hotels[i])
ax.set(title="Daily Hotel Bookings",
       xlabel="Date",
       ylabel="Bookings")
ax.legend(loc="upper left", ncol=1, frameon=True,
          facecolor='lightgrey', framealpha=0.7,
          fontsize=12)
fig.savefig("hotel_bookings.png", dpi=300,
            bbox_inches='tight')
This code applies several principles:
- Plots display only relevant data highlighting booking trends
- Matplotlib's automatic date ticks keep x-axis labels sparse, minimizing clutter
- A consistent line and marker style keeps the three series comparable
- Distinct colors from a categorical scheme distinguish each hotel
- Legend positioned in a corner, out of the way
- Semi-transparent light grey legend frame keeps entries legible
- File exported at high resolution for sharp image
The resulting visualization summarizes daily bookings trends for the hotels efficiently and aesthetically.
Conclusion
Creating insightful and engaging data visualizations requires both Matplotlib coding skills and design expertise. By leveraging principles from visual perception, color theory, and design, you can build beautiful and effective Matplotlib plots that clearly convey key information.
The best practices outlined in this guide form a starting point for developing good visualization habits. As you gain more Matplotlib experience, also study real-world examples and learn from experts to further refine your approach. Aesthetics and functionality work together in great visualizations. Leveraging them thoughtfully will enable you to transform raw data into impactful data stories using Python’s powerful Matplotlib library.