    # Uncomment following line to install on colab
    #! pip install qeds
One of the most important outputs of your analysis will be the visualizations that you choose to communicate what you’ve discovered.
Here is what some people, who we think have earned the right to an opinion on this material, have said about data visualization.
Above all else, show the data – Edward Tufte
By visualizing information, we turn it into a landscape that you can explore with your eyes. A sort of information map. And when you’re lost in information, an information map is kind of useful – David McCandless
I spend hours thinking about how to get the story across in my visualizations. I don’t mind taking that long because it’s that five minutes of presenting it or someone getting it that can make or break a deal – Goldman Sachs executive
We won’t have time to cover “how to make a compelling data visualization” in this lecture.
Instead, we will focus on the basics of creating visualizations in Python.
This will be a fast introduction, but this material appears in almost every lecture going forward, which will help the concepts sink in.
In almost any profession that you pursue, much of what you do involves communicating ideas to others.
Data visualization can help you communicate these ideas effectively, and we encourage you to learn more about what makes a useful visualization.
We include some references that we have found useful below.
- The Functional Art: An introduction to information graphics and visualization by Alberto Cairo
- The Visual Display of Quantitative Information by Edward Tufte
- The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures by Dona M Wong
- Introduction to Data Visualization
The most widely used plotting package in Python is matplotlib.
The standard import alias is
    import matplotlib.pyplot as plt
    import numpy as np
Note above that we are using matplotlib.pyplot rather than just matplotlib.
pyplot is a sub-module of matplotlib. Sub-modules are found in some large packages to further organize functions and types. We are able to give the plt alias to this sub-module.
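As a quick check of what the alias means, the sketch below confirms that plt and matplotlib.pyplot are two names for the same module object. The variable name same_module is ours, chosen only for illustration.

```python
import matplotlib
import matplotlib.pyplot as plt

# The alias and the fully qualified name point at the same module object
same_module = plt is matplotlib.pyplot

print(same_module)
print(plt.__name__)
```

Because the alias is just another name, plt.subplots and matplotlib.pyplot.subplots are the same function.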
Additionally, when we are working in the notebook, we need to tell matplotlib to display our images inside of the notebook itself instead of creating new windows with the image.
This is done by
    %matplotlib inline
    # activate plot theme
    import qeds
    qeds.themes.mpl_style();
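The %matplotlib inline magic is only available inside IPython or Jupyter. If you run the same code as a plain script, one option is to write the figure to a file instead. The sketch below assumes a non-interactive backend ("Agg") and a hypothetical output file name, first_plot.png, in the system's temporary directory. Do not run it in the notebook, since selecting "Agg" there would override inline display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: no window or notebook is available
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
ax.plot(x, x**2)

# Hypothetical output location; any writable path would do
out_path = os.path.join(tempfile.gettempdir(), "first_plot.png")
fig.savefig(out_path)
plt.close(fig)  # release the figure once it has been written
```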
Let’s create our first plot!
After creating it, we will walk through the steps one-by-one to understand what they do.
    # Step 1
    fig, ax = plt.subplots()
    # Step 2
    x = np.linspace(0, 2*np.pi, 100)
    y = np.sin(x)
    # Step 3
    ax.plot(x, y)
[<matplotlib.lines.Line2D at 0x7f34aea85e10>]
- Step 1: Create figure and axis objects, which store the information from our graph.
- Step 2: Generate the data that we will plot.
- Step 3: Use the x and y data, and make a line plot on our axis, ax, by calling the plot method.
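The list printed beneath the code is the return value of ax.plot: one Line2D object per line drawn. The sketch below repeats the three steps and keeps a handle on that object so we can confirm it holds the data we passed in. The axis labels are our own addition, not part of the lecture's example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Step 1: figure and axis
fig, ax = plt.subplots()

# Step 2: data
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Step 3: ax.plot returns a list with one Line2D per plotted line
lines = ax.plot(x, y)
line = lines[0]

# Optional labels (our own choice of text)
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")

print(len(lines))
print(np.allclose(line.get_ydata(), y))
```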
Difference between Figure and Axis
We’ve found that the easiest way for us to distinguish between the figure and axis objects is to think about them as a framed painting.
The axis is the canvas; it is where we “draw” our plots.
The figure is the entire framed painting (which includes the axis itself!).
We can also see this by setting certain elements of the figure to different colors.
    fig, ax = plt.subplots()
    fig.set_facecolor("red")
    ax.set_facecolor("blue")
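The framed-painting relationship is also visible in the objects themselves. As a minimal sketch, the figure keeps a list of the axes placed on it, and each axis holds a reference back to its figure.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
fig.set_facecolor("red")
ax.set_facecolor("blue")

# The figure (the frame) knows which axes (canvases) it contains ...
print(len(fig.axes))
print(fig.axes[0] is ax)

# ... and each axis knows which figure it belongs to
print(ax.figure is fig)
```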
This difference also means that you can place more than one axis on a figure.
    # We specified the shape of the axes -- It means we will have two rows and three columns
    # of axes on our figure
    fig, axes = plt.subplots(2, 3)
    fig.set_facecolor("gray")
    # Can choose hex colors
    colors = ["#065535", "#89ecda", "#ffd1dc", "#ff0000", "#6897bb", "#9400d3"]
    # axes is a numpy array and we want to iterate over a flat version of it
    for (ax, c) in zip(axes.flat, colors):
        ax.set_facecolor(c)
    fig.tight_layout()
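When you ask for more than one axis, plt.subplots returns a numpy array of axes rather than a single axis, so plotting methods must be called on an element of that array. The sketch below shows the array's shape for a 2 by 3 grid and for a call with a single argument, and picks out one axis by index. The names top_right and stacked are ours.

```python
import matplotlib.pyplot as plt

# A 2 x 3 grid gives a two-dimensional array of axes
fig, axes = plt.subplots(2, 3)
print(axes.shape)

# Index a single axis by [row, column] and call methods on it
top_right = axes[0, 2]
top_right.set_title("row 0, col 2")
fig.tight_layout()

# A single argument gives a one-dimensional array: index it with one number
fig2, stacked = plt.subplots(2)
print(stacked.shape)
stacked[0].set_title("top")
stacked[1].set_title("bottom")
fig2.tight_layout()
```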
    countries = ["CAN", "MEX", "USA"]
    populations = [36.7, 129.2, 325.700]
    land_area = [3.850, 0.761, 3.790]
    # plt.subplots(2) returns an array of two axes, so index into it
    fig, ax = plt.subplots(2)
    ax[0].bar(countries, populations, align="center")
    ax[0].set_title("Populations (in millions)")
    ax[1].bar(countries, land_area, align="center")
    ax[1].set_title("Land area (in millions of square miles)")
    fig.tight_layout()
Scatter and annotation
    N = 50
    np.random.seed(42)
    x = np.random.rand(N)
    y = np.random.rand(N)
    colors = np.random.rand(N)
    area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii
    fig, ax = plt.subplots()
    ax.scatter(x, y, s=area, c=colors, alpha=0.5)
    # Annotate the first point only: xy must be a single (x, y) pair
    ax.annotate(
        "First point", xy=(x[0], y[0]), xycoords="data",
        xytext=(25, -25), textcoords="offset points",
        arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.6")
    )
Text(25, -25, 'First point')
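The two coordinate arguments to annotate play different roles: xy is the point being annotated, given here in data coordinates, while xytext places the label, given here as an offset in points from that location. A minimal sketch with three made-up points (the values are ours, chosen only for illustration):

```python
import matplotlib.pyplot as plt

# Made-up data for illustration
x = [0.2, 0.5, 0.8]
y = [0.3, 0.9, 0.4]

fig, ax = plt.subplots()
ax.scatter(x, y)

note = ax.annotate(
    "First point",
    xy=(x[0], y[0]), xycoords="data",              # the point the arrow targets
    xytext=(25, -25), textcoords="offset points",  # label offset from that point
    arrowprops=dict(arrowstyle="->"),
)

print(note.get_text())
print(note.xy)
```

The returned object is the text label itself, which is why the notebook prints a Text(...) line after the annotate call.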
    x = np.linspace(0, 1, 500)
    y = np.sin(4 * np.pi * x) * np.exp(-5 * x)
    fig, ax = plt.subplots()
    ax.grid(True)
    ax.fill(x, y)
[<matplotlib.patches.Polygon at 0x7f34ae3e4198>]