Plotting

Prerequisites

Outcomes

  • Understand components of matplotlib plots

  • Make basic plots

# Uncomment following line to install on colab
#! pip install 

Visualization

One of the most important outputs of your analysis will be the visualizations that you choose to communicate what you’ve discovered.

Here is what some people, whom we think have earned the right to an opinion on this material, have said about data visualizations.

I spend hours thinking about how to get the story across in my visualizations. I don’t mind taking that long because it’s that five minutes of presenting it or someone getting it that can make or break a deal – Goldman Sachs executive

We won’t have time to cover “how to make a compelling data visualization” in this lecture.

Instead, we will focus on the basics of creating visualizations in Python.

This will be a fast introduction, but this material appears in almost every lecture going forward, which will help the concepts sink in.

In almost any profession that you pursue, much of what you do involves communicating ideas to others.

Data visualization can help you communicate these ideas effectively, and we encourage you to learn more about what makes a useful visualization.

We include some references that we have found useful below.

matplotlib

The most widely used plotting package in Python is matplotlib.

The standard import alias is

import matplotlib.pyplot as plt
import numpy as np

Note above that we are using matplotlib.pyplot rather than just matplotlib.

pyplot is a sub-module of matplotlib. Large packages often use sub-modules like this to further organize their functions and types, and we can give the plt alias directly to the sub-module.
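The alias is only a local name for the sub-module. As a quick sanity check, the sketch below confirms that both common import styles give back the very same module object. The matplotlib.use("Agg") line is our addition so the snippet runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: lets this run without a display

import matplotlib.pyplot as plt
from matplotlib import pyplot

# Both import styles bind the same module object
same_module = plt is pyplot
# Importing the sub-module also makes it reachable as an attribute of the package
via_parent = plt is matplotlib.pyplot

print(same_module, via_parent)  # True True
```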

Additionally, when we are working in the notebook, we need to tell matplotlib to display our images inside the notebook itself instead of creating new windows with the image.

This is done by

%matplotlib inline

Commands that begin with % are called magics. They are understood by IPython and Jupyter, not by plain Python.
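Because magics only work inside IPython and Jupyter, the same plotting code run as a plain Python script needs another way to show its output. One common approach, which is our assumption about your workflow rather than something this lecture requires, is to select the non-interactive Agg backend and write the figure to a file with savefig. The file name sine.png below is purely illustrative.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only; no window is opened
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 100)
ax.plot(x, np.sin(x))

# Save into a fresh temporary directory; "sine.png" is an illustrative name
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "sine.png")
fig.savefig(out_path)
plt.close(fig)  # release the figure once it has been written

print(os.path.exists(out_path))  # True
```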

First Plot

Let’s create our first plot!

After creating it, we will walk through the steps one-by-one to understand what they do.

# Step 1
fig, ax = plt.subplots()

# Step 2
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

# Step 3
ax.plot(x, y)
  1. Create a figure and axis object which stores the information from our graph.

  2. Generate data that we will plot.

  3. Use the x and y data, and make a line plot on our axis, ax, by calling the plot method.
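The objects produced by these three steps can be inspected directly. The sketch below (using the Agg backend so it runs anywhere) checks their types. Note that matplotlib names the class of ax Axes, and that the plot method returns a list of Line2D objects, one per line drawn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.axes import Axes
from matplotlib.figure import Figure
from matplotlib.lines import Line2D

# Step 1: figure and axis
fig, ax = plt.subplots()

# Step 2: data
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Step 3: plot returns a list containing one Line2D per line drawn
lines = ax.plot(x, y)

print(isinstance(fig, Figure), isinstance(ax, Axes), isinstance(lines[0], Line2D))
# True True True
```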

Difference between Figure and Axis

We’ve found that the easiest way for us to distinguish between the figure and axis objects is to think about them as a framed painting.

The axis (in matplotlib's own terminology, an Axes object) is the canvas; it is where we “draw” our plots.

The figure is the entire framed painting (which includes the axis itself!).

We can also see this by setting certain elements of the figure to different colors.

fig, ax = plt.subplots()

fig.set_facecolor("red")
ax.set_facecolor("blue")
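The framed-painting picture has a direct counterpart in code: each axis keeps a reference to the figure that contains it, the figure keeps a list of its axes, and each object stores its own background color. A small sketch checking this, again with the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig, ax = plt.subplots()
fig.set_facecolor("red")
ax.set_facecolor("blue")

# The axis knows its figure, and the figure lists its axes
print(ax.figure is fig, ax in fig.axes)  # True True

# Each object keeps its own background color (compared as RGBA tuples)
print(to_rgba(fig.get_facecolor()) == to_rgba("red"))   # True
print(to_rgba(ax.get_facecolor()) == to_rgba("blue"))   # True
```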

This difference also means that you can place more than one axis on a figure.

# The arguments (2, 3) set the shape of the grid of axes: two rows and three columns
# of axes on our figure
fig, axes = plt.subplots(2, 3)

fig.set_facecolor("gray")

# Can choose hex colors
colors = ["#065535", "#89ecda", "#ffd1dc", "#ff0000", "#6897bb", "#9400d3"]

# axes is a numpy array and we want to iterate over a flat version of it
for (ax, c) in zip(axes.flat, colors):
    ax.set_facecolor(c)

fig.tight_layout()
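The shape of what plt.subplots returns depends on the grid you ask for, which is why the examples index it differently (axes.flat above, ax[0] in the bar chart below). By default, matplotlib squeezes out dimensions of length one; passing squeeze=False always gives a 2-D array. A quick sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

# A 2 x 3 grid comes back as a 2-D NumPy array of axes
fig1, grid = plt.subplots(2, 3)
# A single column is squeezed to a 1-D array
fig2, column = plt.subplots(2)
# The default 1 x 1 case returns a lone axis, not an array
fig3, single = plt.subplots()
# squeeze=False keeps the 2-D shape even for a single axis
fig4, always_2d = plt.subplots(squeeze=False)

print(grid.shape, column.shape, isinstance(single, np.ndarray), always_2d.shape)
# (2, 3) (2,) False (1, 1)
```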

Functionality

The matplotlib library is versatile and very flexible.

You can see various examples of what it can do on the matplotlib example gallery.

We work through a few examples to quickly introduce some possibilities.

Bar

countries = ["CAN", "MEX", "USA"]
populations = [36.7, 129.2, 325.700]
land_area = [3.850, 0.761, 3.790]

fig, ax = plt.subplots(2)

ax[0].bar(countries, populations, align="center")
ax[0].set_title("Populations (in millions)")

ax[1].bar(countries, land_area, align="center")
ax[1].set_title("Land area (in millions of square miles)")

fig.tight_layout()
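Because the two lists share the same ordering of countries, they can be combined element by element. As an illustrative extension that is not part of the original example, dividing population by land area gives population density. The "millions" cancel, so the result is in people per square mile, and by this measure Mexico is the most densely populated of the three.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

countries = ["CAN", "MEX", "USA"]
populations = np.array([36.7, 129.2, 325.700])  # millions of people
land_area = np.array([3.850, 0.761, 3.790])     # millions of square miles

# millions / millions cancels, leaving people per square mile
density = populations / land_area

fig, ax = plt.subplots()
bars = ax.bar(countries, density)
ax.set_title("Population density")
ax.set_ylabel("People per square mile")

# ax.bar returns a container of rectangles whose heights are our values
heights = [b.get_height() for b in bars]
print(countries[int(np.argmax(density))])  # MEX
```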

Scatter and annotation

N = 50

np.random.seed(42)

x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii

fig, ax = plt.subplots()

ax.scatter(x, y, s=area, c=colors, alpha=0.5)

ax.annotate(
    "First point", xy=(x[0], y[0]), xycoords="data",
    xytext=(25, -25), textcoords="offset points",
    arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.6")
)
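In the annotate call, xy is the point the arrow tip lands on, measured in data coordinates (xycoords="data"), while xytext positions the label relative to that point in typographic points (textcoords="offset points"), so (25, -25) places it 25 points to the right and 25 points below. The sketch below repeats the same random draws but labels the largest marker instead of the first; that choice is ours, for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

N = 50
np.random.seed(42)
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # same draws as the example above

fig, ax = plt.subplots()
ax.scatter(x, y, s=area, c=colors, alpha=0.5)

# Index of the largest marker
i = int(np.argmax(area))
ann = ax.annotate(
    "Largest point",
    xy=(x[i], y[i]), xycoords="data",              # arrow tip, in data units
    xytext=(25, -25), textcoords="offset points",  # label offset, in points
    arrowprops=dict(arrowstyle="->"),
)
```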

Fill between

x = np.linspace(0, 1, 500)
y = np.sin(4 * np.pi * x) * np.exp(-5 * x)

fig, ax = plt.subplots()

ax.grid(True)
ax.fill_between(x, y)
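fill_between(x, y1, y2) shades the region between two curves, with y2 defaulting to 0. Its where argument restricts shading to the x values where a condition holds, and interpolate=True extends each shaded region to the exact crossing point instead of clipping it at the nearest sample. A sketch that shades the positive and negative lobes of the same damped sine separately (the two colors are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 500)
y = np.sin(4 * np.pi * x) * np.exp(-5 * x)

fig, ax = plt.subplots()
ax.grid(True)
ax.plot(x, y, color="black")

# Shade between y and 0 only where the condition holds;
# interpolate=True closes each region exactly at the zero crossings
above = ax.fill_between(x, y, 0, where=(y >= 0), interpolate=True,
                        color="tab:blue", alpha=0.5)
below = ax.fill_between(x, y, 0, where=(y < 0), interpolate=True,
                        color="tab:red", alpha=0.5)
```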