Data visualization is an integral part of data analysis and machine learning. Being able to create meaningful plots and charts from data allows analysts to easily interpret trends, patterns, and relationships in the data. Python has emerged as one of the most popular programming languages for data analysis due to its extensive ecosystem of data science libraries.
When it comes to data visualization and plotting in Python, Matplotlib is undoubtedly the most widely used library. Created by John Hunter in the early 2000s, Matplotlib provides a MATLAB-style plotting framework that enables users to generate publication-quality figures and plots with just a few lines of code. However, over the past decade, several new specialized plotting libraries have been developed as alternatives to Matplotlib, each with its own strengths and weaknesses.
In this comprehensive guide, we will contrast Matplotlib with three of the most popular alternative Python plotting libraries - Seaborn, Plotly, and Bokeh. We will examine the key differences between Matplotlib and these libraries in terms of usage, syntax, features, performance, and use cases. By the end of this guide, you will have a clear understanding of the capabilities of each library and when you may want to use one over the others for your data visualization needs.
Matplotlib Overview
Matplotlib is the grandfather of Python plotting libraries. It provides a comprehensive API for generating a wide variety of 2D plots, charts, and graphs that can be tweaked endlessly to customize the visual output.
Some of the major features of Matplotlib include:
- Support for a wide range of plot types: line plots, scatter plots, bar charts, histograms, box plots, contour plots, heatmaps, polar plots, etc.
- Highly customizable plots with control over colors, linestyles, legends, limits, ticks, labels, etc.
- An object-oriented API alongside the pyplot interface for procedural plotting.
- Support for NumPy arrays as input data.
- Extensive styling options through themes, style sheets, and rcParams.
- High-quality output figures suitable for publication.
- Export plots to various file formats: PNG, PDF, SVG, etc.
- Embed plots into GUI frameworks like Tkinter, Qt, and wxPython.
- Animation and interactive plotting capabilities.
- Broad functionality through add-on toolkits like mplot3d and axes_grid1.
Here is a simple example of creating a line plot with Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.1)
y = np.sin(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Simple Line Plot')
plt.grid()
plt.show()
This produces a labeled sine wave plot with a grid in just a few lines of code.
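The pyplot calls above act on an implicit "current" figure. The object-oriented API from the feature list keeps explicit references to the figure and axes instead, which tends to scale better once a figure needs more customization. The following is a minimal sketch; the second curve, the colors, and the title are illustrative choices rather than part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)

# plt.subplots() returns the Figure and Axes objects explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), color="tab:blue", linestyle="-", label="sin(x)")
ax.plot(x, np.cos(x), color="tab:orange", linestyle="--", label="cos(x)")

# Labels, limits, legend, and grid are set through methods on the Axes
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Object-Oriented Line Plot")
ax.set_xlim(0, 10)
ax.legend(loc="upper right")
ax.grid(True)

# plt.show() would display the figure in an interactive session
```

Because every change goes through `ax`, the same code works unchanged when the axes are one panel of a larger grid of subplots.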
Seaborn
Seaborn is a statistical data visualization library built on top of Matplotlib. Created by Michael Waskom in 2012, Seaborn provides a high-level API for creating attractive statistical graphics with Python. Some major features of Seaborn include:
- Specialized plot types for statistical data: displots, jointplots, pairplots, catplots, boxplots, violinplots, etc.
- Visual theme options like darkgrid, whitegrid, dark, white, and ticks.
- Tools to visualize univariate, bivariate, and multivariate data relationships.
- Tight integration with pandas DataFrames, so plots can be specified by column name.
- Color palette options like color_palette() and mpl_palette().
- Options to control figure aesthetics like despine() and set_style().
- Utilities to visualize linear regression models.
Here is an example of a distplot created with Seaborn, showcasing its styling:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
sns.distplot(tips['total_bill'], kde=False, bins=20)
plt.xlabel('Total Bill')
plt.ylabel('Frequency')
plt.title('Distribution of Total Bill')
plt.show()
Update (05/17/22): The distplot function has been deprecated in newer versions of seaborn, which recommend using either the displot or histplot function instead. Both have similar functionality, but there are slight differences between them. See the seaborn documentation for more details.
The displot function is a figure-level function that supersedes distplot, providing access to several different approaches for visualizing the univariate or bivariate distribution of data. The histplot function is an axes-level function for plotting histograms; because it draws onto a single Matplotlib Axes, it is easier to customize the plot to your needs.
Let’s update our Python code using the histplot function:
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
sns.histplot(tips['total_bill'], kde=False, bins=20)
plt.xlabel('Total Bill')
plt.ylabel('Frequency')
plt.title('Distribution of Total Bill')
plt.show()
The Seaborn plotting API is designed to work intuitively with pandas and NumPy data structures like DataFrames and arrays. It allows users to quickly visualize statistical relationships in data.
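As a rough illustration of the DataFrame-oriented API and the regression utilities listed earlier, the sketch below fits and draws a linear trend with `regplot`. It uses synthetic data in place of a real dataset, so the column names and the relationship between them are assumptions made purely for demonstration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data standing in for a real dataset (assumption for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame({"total_bill": rng.uniform(5, 50, size=200)})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, size=200)

# Apply one of the built-in themes
sns.set_theme(style="whitegrid")

# regplot draws a scatter plot plus a fitted regression line with a confidence band;
# columns are referenced by name through the data= argument
ax = sns.regplot(data=df, x="total_bill", y="tip", scatter_kws={"alpha": 0.5})
ax.set_title("Tip vs. Total Bill")
```

The axis labels are taken from the column names automatically, which is part of what makes Seaborn convenient for quick exploration.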
Plotly
Plotly is an interactive, browser-based charting library for Python. Some of its major features include:
- Support for a wide range of interactive and customizable chart types.
- Options for charts like line, scatter, bar, pie, sunburst, heatmap, histogram, etc.
- Interactive tools like zoom, pan, and hover tooltips.
- Linked brushing and view synchronization between plots.
- 3D and geo charts like surface, mesh, and mapbox plots.
- Render charts in Jupyter notebooks or export as HTML/static images.
- Broad language support including Python, R, MATLAB, and Julia.
- Graph objects with attributes like data, layout, and frames for manipulating plots.
- Large dataset handling and web deployment capabilities.
- Options to add shapes, images, animations, and more to plots.
Here is an example of an interactive scatter plot matrix created with Plotly Express:
import plotly.express as px
df = px.data.iris()
fig = px.scatter_matrix(df, dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], color="species")
fig.show()
This generates a matrix of scatter plots visualizing the multivariate iris flower dataset.
Plotly’s interactive charts allow deeper exploration of data relationships.
Bokeh
Bokeh is a Python library for creating interactive data visualizations and dashboards in web browsers. Its key features include:
- Versatile plotting interface for building interactive plots, maps, time series, and widgets.
- Support for large datasets and streaming data.
- Flexible glyph system for custom graphics.
- Tools for adding hover, selection, or zoom interactions.
- Linked panning and brushing across plots.
- Customizable toolbar and widgets for filtering and selection.
- Options for adding animations to plots.
- Export plots as static images or interactive web apps.
- Bind plots to real-time data sources.
- Integrate and embed plots into Flask and Django web apps.
Here is a simple example of an interactive sine wave plot created with Bokeh:
from bokeh.plotting import figure, output_file, show
import numpy as np
# Write the interactive plot to a standalone HTML file
output_file("sine.html")
x = np.arange(-2 * np.pi, 2 * np.pi, 0.1)
y = np.sin(x)  # vectorized, so no Python loop is needed
p = figure(title="Sine Wave Example")
p.line(x, y)
# Open the generated HTML page in a browser
show(p)
This generates an interactive plot that can be panned, zoomed, and saved.
Bokeh allows building rich interactive data apps and dashboards for the web.
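Two of the features listed above, hover tools and linked panning, can be sketched as follows. The example assumes a shared `ColumnDataSource` and two stacked plots; the column names and tooltip formats are illustrative.

```python
import numpy as np
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

x = np.linspace(-2 * np.pi, 2 * np.pi, 200)

# A ColumnDataSource holds named columns that glyphs refer to by name
source = ColumnDataSource(data={"x": x, "sin": np.sin(x), "cos": np.cos(x)})

top = figure(title="sin(x)", tools="pan,wheel_zoom,reset")
top.line("x", "sin", source=source)
# Tooltips reference columns with the @name syntax
top.add_tools(HoverTool(tooltips=[("x", "@x{0.00}"), ("sin", "@sin{0.00}")]))

# Sharing the x_range object links panning and zooming between the two plots
bottom = figure(title="cos(x)", tools="pan,wheel_zoom,reset", x_range=top.x_range)
bottom.line("x", "cos", source=source)

layout = gridplot([[top], [bottom]])

# show(layout) from bokeh.plotting would render both plots in a browser
```

Panning or zooming either plot moves the other along the x-axis, since both refer to the same range object.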
Key Differences
Now that we have looked at some examples of using Matplotlib, Seaborn, Plotly, and Bokeh, let us examine some of the key differences between these Python data visualization libraries:
1. Syntax and Ease of Use
- Matplotlib has a MATLAB-style syntax that can be verbose, with a steep learning curve for detailed customization. The pyplot interface manages figures and axes implicitly, while the object-oriented interface creates them explicitly.
- Seaborn and Plotly use a simpler, more intuitive syntax. Seaborn's functions take a DataFrame and column names, while Plotly describes figures declaratively as data plus layout.
- Bokeh has a straightforward Pythonic API but requires understanding its own glyphs and objects.
2. Level of Control
- Matplotlib offers the most flexibility and customizability for all aspects of a plot.
- Seaborn trades some flexibility for better default aesthetics. Its functions expose fewer options directly, although the underlying Matplotlib axes can still be modified afterwards.
- Plotly and Bokeh have customizable options but are more constrained compared to Matplotlib.
3. Plot Customization
- Matplotlib exposes properties of all plot elements, allowing meticulous customization via rcParams, style sheets, etc.
- Seaborn, Plotly, and Bokeh offer preset themes and templates. Fine-grained styling is possible but generally less granular than in Matplotlib.
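To make the rcParams and style sheet mechanisms concrete, here is a small sketch that applies settings temporarily through context managers. The specific parameter values and the choice of the `ggplot` style are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# rc_context applies rcParams temporarily and restores the previous values on exit
with plt.rc_context({"lines.linewidth": 3.0, "axes.grid": True, "font.size": 12}):
    fig, ax = plt.subplots()
    (line,) = ax.plot(x, np.sin(x))

# Style sheets bundle many rcParams; plt.style.available lists the installed ones
with plt.style.context("ggplot"):
    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x))
```

For a permanent change within a script, `plt.rcParams.update(...)` or `plt.style.use(...)` apply the same settings globally instead of within a `with` block.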
4. Data Structures
- Matplotlib is built around NumPy arrays. It can plot DataFrame columns directly, but it has no high-level awareness of DataFrame structure such as grouping by a categorical column.
- Seaborn is designed for pandas DataFrames and Series, which makes plotting code more concise.
- Plotly and Bokeh integrate nicely with both NumPy and pandas data structures.
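For reference, plotting DataFrame columns with Matplotlib can look like the sketch below. It uses the `data=` keyword, which lets string arguments be looked up as column names, and pandas' own plotting wrapper. The DataFrame contents are synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (assumption for illustration)
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 100),
    "weight": rng.normal(70, 8, 100),
})

fig, ax = plt.subplots()

# With data=..., string arguments are resolved as column names
ax.scatter("height", "weight", data=df, alpha=0.6)
ax.set_xlabel("height")
ax.set_ylabel("weight")

# pandas also wraps Matplotlib: DataFrame.plot returns a Matplotlib Axes
ax2 = df.plot.scatter(x="height", y="weight")
```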
5. Visual Aesthetics
- Seaborn has the best default visual styles and color palettes for statistical plots.
- Matplotlib’s default plots are more basic but extremely customizable through configurations.
- Plotly and Bokeh have good visuals but limited styling compared to Seaborn and Matplotlib.
6. Plot Types and Functionality
- Matplotlib supports a very wide range of 2D plot types, plus 3D plots and other extensions through toolkits such as mplot3d.
- Seaborn specializes in statistical plots like heatmaps, cluster maps, time series, distributions, etc.
- Plotly has a wide selection of 2D and 3D charts including statistical, scientific, geographic, and financial plots.
- Bokeh excels at interactive plotting and dashboards with selections, hovers, and linked panning.
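As one example of functionality provided by a Matplotlib toolkit, the sketch below draws a 3D surface using the mplot3d projection. The surface function and colormap are arbitrary illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# Evaluate a function on a regular 2D grid
X, Y = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(6, 5))

# projection="3d" creates axes from the mplot3d toolkit
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6, label="z")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```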
7. Interactivity
- Matplotlib has basic built-in interactivity, such as pan, zoom, and event handling in GUI backends, but fewer tools than the browser-based libraries.
- Seaborn adds no interactive capabilities of its own; it produces static Matplotlib plots.
- Plotly and Bokeh are designed for interactivity like zooming, panning, hover tooltips, etc.
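Matplotlib's interactivity is based on event callbacks attached to the figure canvas. The following sketch registers a click handler; the handler name and the `clicks` list are hypothetical names introduced for illustration, and the callback only fires when the figure is shown with an interactive backend.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.arange(10), np.arange(10) ** 2)

clicks = []  # collected (x, y) positions in data coordinates


def on_click(event):
    """Record the data coordinates of clicks that land inside the axes."""
    if event.inaxes is ax:
        clicks.append((event.xdata, event.ydata))


# mpl_connect registers the callback and returns a connection id,
# which can later be passed to fig.canvas.mpl_disconnect
cid = fig.canvas.mpl_connect("button_press_event", on_click)

# plt.show() would open the window and start delivering click events
```

In Plotly and Bokeh, comparable behavior such as hover tooltips is available without writing callbacks by hand.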
8. Large Datasets and Performance
- Seaborn and Matplotlib are not optimized for large datasets. Rendering time can grow noticeably as the number of plotted points increases.
- Plotly and Bokeh work well with large datasets through WebGL or server-side rendering and optimization.
- Bokeh streams data efficiently for real-time dashboarding.
9. Environment and Sharing
- Matplotlib and Seaborn primarily produce static images, so sharing usually means saving files (PNG, PDF, SVG) and distributing them.
- Plotly and Bokeh generate interactive HTML/JavaScript charts that can be shared online or embedded into sites and apps.
- Bokeh and Plotly enable creating web-based dashboards and apps for broader usage.
When to Use Each Library
Based on their various capabilities, here are some recommendations on when you may want to use Matplotlib, Seaborn, Plotly or Bokeh for your visualization needs:
- Use Matplotlib when you need full control over customizing plot appearance and behavior for publication quality figures. The functionality and flexibility of Matplotlib makes it great for scientific computing and visualization.
- Use Seaborn when you want to quickly explore statistical relationships in data and create attractive statistical graphics and plots with a high-level API. Great for statistical analysis.
- Use Plotly when you need to create interactive plots, especially when working with multivariate or geospatial data. Useful for exploring large datasets.
- Use Bokeh when you want to build interactive dashboards and data apps for sharing online. Great for streaming data visualization.
Of course, many times you can use these libraries together to take advantage of their complementary strengths. For example, you may use Matplotlib for initial analysis and prototyping, then switch to Bokeh or Plotly for building interactive web-based visualizations. Or use Seaborn on top of Matplotlib to improve the styling of statistical plots.
Knowing the key features and differences between these libraries will allow you to pick the right tool for your data visualization needs. The Python data science ecosystem provides a wealth of options for both exploratory analysis and production-quality graphics.
Example Usage Scenarios
To further illustrate when you may want to use each library, let’s look at some real-world examples and usage scenarios:
Matplotlib Usage
Desmond is a research scientist studying wind turbine data to build predictive maintenance models. For his publications, he needs to create high-quality line and scatter plots showing turbine sensor metrics over time. Matplotlib is the best choice here as Desmond can fully customize colors, labels, legend, and styling to prepare publication-ready figures.
Seaborn Usage
Gladilyn is a data analyst exploring the statistical relationship between different attributes in a housing dataset to determine pricing trends. She wants to quickly generate some attractive histograms, heatmaps, and regression plots to understand the distributions and correlations. Seaborn allows Gladilyn to easily create these statistical visualizations with good defaults.
Plotly Usage
Junell is a data scientist who needs to analyze results from an A/B test conducted on a website. He has to present his findings to the product team in an interactive demo. Plotly helps Junell create linked bar charts showing conversion rates, funnel charts showing customer drop off rates, and other web analytics plots to showcase in an interactive dashboard.
Bokeh Usage
Argi is developing a real-time data monitoring system for an IoT fleet management application. She needs to build a dynamic dashboard that shows streaming telemetry data like vehicle sensors, geolocation, diagnostics, etc. Bokeh is the perfect fit allowing Argi to stream data to interactive plots and maps updated in real-time.
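The core of such a streaming setup is Bokeh's `ColumnDataSource.stream`, which appends new rows and can cap the number of points kept. The sketch below simulates readings in a loop; the column names, the `push_reading` helper, and the simulated values are hypothetical, and a real dashboard would call the helper from a periodic callback in a Bokeh server application.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Start with empty columns; the line glyph refers to them by name
source = ColumnDataSource(data={"t": [], "speed": []})
p = figure(title="Vehicle speed (simulated)", x_axis_label="t", y_axis_label="speed")
p.line("t", "speed", source=source)


def push_reading(t, speed, window=100):
    """Append one reading, keeping only the most recent `window` points."""
    source.stream({"t": [t], "speed": [speed]}, rollover=window)


# Simulated telemetry standing in for a real data feed
for t in range(250):
    push_reading(t, 60 + (t % 10))
```

The `rollover` argument keeps the plot bounded to a sliding window, which avoids unbounded memory growth in long-running dashboards.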
As you can see, each library is better suited for certain use cases based on their capabilities and the context of the problem at hand.
Conclusion
In this comprehensive guide, we explored Matplotlib and contrasted it with alternative Python plotting libraries - Seaborn, Plotly, and Bokeh. We looked at the key features of each library along with example usage. We also examined differences in syntax, functionality, customization options, interactivity, and performance. Finally, we covered real-world scenarios to help illustrate when you may want to select a particular visualization library.
Matplotlib remains an extremely versatile plotting package for Python and provides a solid foundation for other libraries like Seaborn to build upon. However, for statistical plotting, interactive visualization, and building data apps and dashboards, Seaborn, Plotly and Bokeh are compelling choices with their own strengths. As a data scientist, knowing how and when to use each of these Python data visualization tools will enable you to create meaningful graphics and derive the most value out of your data.