QuantEcon DataScience

Introduction to Economic Modeling and Data Science

Mapping in Python

Co-author

Prerequisites

Outcomes

  • Use geopandas to create maps
In [1]:
# Uncomment following line to install on colab
#! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
In [2]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

from shapely.geometry import Point

%matplotlib inline
# activate plot theme
import qeds
qeds.themes.mpl_style();


In this lecture, we will use a new package, geopandas, to create maps.

Maps are more complicated than they may seem: we are trying to project a spherical surface onto a flat figure, which is an inherently complicated endeavor.
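To see why, consider the Mercator projection, one common way of flattening the globe. It is used here only as an illustration; the maps in this lecture simply plot longitude and latitude directly as x and y. A minimal sketch, assuming a perfectly spherical Earth, of the standard Mercator formulas x = R·λ and y = R·ln(tan(π/4 + φ/2)), where λ is longitude and φ is latitude in radians. The local scale factor 1/cos(φ) shows how distances are stretched more and more as you move away from the equator.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a perfect sphere


def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Project (longitude, latitude) in degrees to Mercator (x, y) in km."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


def mercator_scale(lat_deg):
    """Local stretch factor of the Mercator projection at a given latitude."""
    return 1 / math.cos(math.radians(lat_deg))


# Buenos Aires, using the coordinates from the DataFrame below
print(mercator(-58.66, -34.58))

# At 60 degrees latitude, lengths are drawn about twice as long as at the equator
for lat in (0, 30, 60):
    print(lat, round(mercator_scale(lat), 3))
```

Because lengths at 60° are doubled in both directions, areas there appear roughly four times larger than equally sized areas at the equator. This is one reason different projections suit different purposes.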

Luckily, geopandas will do most of the heavy lifting for us.

Let’s start with a DataFrame that has the latitude and longitude coordinates of various South American cities.

Our goal is to turn them into something we can plot – in this case, a GeoDataFrame.

In [3]:
df = pd.DataFrame({
    'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
    'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
    'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
    'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]
})

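To map the cities, we first need a tuple of coordinates for each city.

We generate these by zipping the Longitude and Latitude columns together (longitude first, because Shapely treats the first coordinate as x and the second as y) and storing the tuples in a new column named Coordinates.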

In [4]:
df["Coordinates"] = list(zip(df.Longitude, df.Latitude))
df.head()
Out[4]:
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 (-58.66, -34.58)
1 Brasilia Brazil -15.78 -47.91 (-47.91, -15.78)
2 Santiago Chile -33.45 -70.66 (-70.66, -33.45)
3 Bogota Colombia 4.60 -74.08 (-74.08, 4.6)
4 Caracas Venezuela 10.48 -66.86 (-66.86, 10.48)
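The order inside each tuple matters. A small standard-library sketch, using two of the cities above, shows what zip produces and how swapping its arguments silently flips the axes. No error is raised in either case, so the mistake only shows up as points in the wrong place on the map.

```python
# Two cities taken from the DataFrame above (Buenos Aires and Bogota)
longitudes = [-58.66, -74.08]
latitudes = [-34.58, 4.60]

# Correct order: (x, y) == (longitude, latitude)
coords = list(zip(longitudes, latitudes))
print(coords)  # [(-58.66, -34.58), (-74.08, 4.6)]

# Swapped order: (latitude, longitude). These are still valid tuples,
# but every point would be plotted in the wrong place.
swapped = list(zip(latitudes, longitudes))
print(swapped)  # [(-34.58, -58.66), (4.6, -74.08)]
```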

Our next step is to turn the tuple into a Shapely Point object.

We will do this by applying Shapely’s Point constructor to each tuple in the Coordinates column.

In [5]:
df["Coordinates"] = df["Coordinates"].apply(Point)
df.head()
Out[5]:
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 POINT (-58.66 -34.58)
1 Brasilia Brazil -15.78 -47.91 POINT (-47.91 -15.78)
2 Santiago Chile -33.45 -70.66 POINT (-70.66 -33.45)
3 Bogota Colombia 4.60 -74.08 POINT (-74.08 4.6)
4 Caracas Venezuela 10.48 -66.86 POINT (-66.86 10.48)
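Each entry of Coordinates is now a Shapely Point. Point accepts a single (x, y) tuple, which is why `.apply(Point)` works directly on the column. A short sketch showing the attributes used later in this lecture (`.x` and `.y` appear in the city-labeling loop):

```python
from shapely.geometry import Point

# Buenos Aires as (longitude, latitude), matching the Coordinates column
p = Point((-58.66, -34.58))

print(p.x, p.y)      # -58.66 -34.58
print(p.geom_type)   # Point
print(p.wkt)         # well-known-text form, as shown in the table above
```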

Finally, we will convert our DataFrame into a GeoDataFrame by calling the geopandas.GeoDataFrame constructor.

A GeoDataFrame has the convenience of a normal DataFrame but also understands geometries and knows how to plot maps.

In the code below, we must specify the column that contains the geometry data.

See this excerpt from the geopandas documentation:

The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame’s “geometry”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), these commands will always act on the “geometry” column.

In [6]:
gdf = gpd.GeoDataFrame(df, geometry="Coordinates")
gdf.head()
Out[6]:
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 POINT (-58.66 -34.58)
1 Brasilia Brazil -15.78 -47.91 POINT (-47.91 -15.78)
2 Santiago Chile -33.45 -70.66 POINT (-70.66 -33.45)
3 Bogota Colombia 4.60 -74.08 POINT (-74.08 4.6)
4 Caracas Venezuela 10.48 -66.86 POINT (-66.86 10.48)
In [7]:
# Doesn't look different than a vanilla DataFrame...let's make sure we have what we want
print('gdf is of type:', type(gdf))

# And how can we tell which column is the geometry column?
print('\nThe geometry column is:', gdf.geometry.name)
gdf is of type: <class 'geopandas.geodataframe.GeoDataFrame'>

The geometry column is: Coordinates
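As an alternative to the tuple-then-Point route above, geopandas provides `gpd.points_from_xy`, which builds the Point geometries straight from the longitude and latitude columns. The sketch below also records a coordinate reference system (CRS). It assumes the coordinates are plain WGS 84 longitude/latitude (EPSG:4326), which the code above never states explicitly. Recording the CRS lets geopandas reproject the data later, for example with `to_crs`.

```python
import geopandas as gpd
import pandas as pd

df = pd.DataFrame({
    "City": ["Buenos Aires", "Brasilia", "Santiago", "Bogota", "Caracas"],
    "Latitude": [-34.58, -15.78, -33.45, 4.60, 10.48],
    "Longitude": [-58.66, -47.91, -70.66, -74.08, -66.86],
})

# points_from_xy takes x (longitude) first, then y (latitude)
gdf_xy = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",  # assumption: WGS 84 longitude/latitude
)

print(gdf_xy.geometry.name)  # geometry
print(gdf_xy.crs)
```

When the geometry is passed as an array like this, geopandas stores it in a column named geometry rather than Coordinates.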

Plotting a Map

Great, now we have our points in the GeoDataFrame.

Let’s plot the locations on a map.

This requires three steps:

  1. Get the map
  2. Plot the map
  3. Plot the points (our cities) on the map

1. Get the map

An organization called Natural Earth compiled the map data that we use here.

The file provides the outlines of countries, over which we’ll plot the city locations from our GeoDataFrame.

Luckily, older versions of geopandas (before 1.0) come bundled with a low-resolution copy of this data, so we don’t have to hunt it down!

In [8]:
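# Grab low resolution world file
# Note: gpd.datasets was deprecated in geopandas 0.14 and no longer works as of 1.0.
# With newer versions, download Natural Earth's 1:110m "Admin 0 - Countries" file and
# pass its path to gpd.read_file instead (its column names differ from the ones used here).
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))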
world = world.set_index("iso_a3")

world.head()
Out[8]:
          pop_est      continent                      name  gdp_md_est  geometry
iso_a3
FJI        920938        Oceania                      Fiji      8374.0  (POLYGON ((180 -16.06713266364245, 180 -16.555...
TZA      53950935         Africa                  Tanzania    150600.0  POLYGON ((33.90371119710453 -0.950000000000000...
ESH        603253         Africa                 W. Sahara       906.5  POLYGON ((-8.665589565454809 27.65642588959236...
CAN      35623680  North America                    Canada   1674000.0  (POLYGON ((-122.84 49.00000000000011, -122.974...
USA     326625791  North America  United States of America  18560000.0  (POLYGON ((-122.84 49.00000000000011, -120 49....

world is a GeoDataFrame with the following columns:

  • pop_est: Contains a population estimate for the country
  • continent: The country’s continent
  • name: The country’s name
  • iso_a3: The country’s three-letter ISO code (we made this the index)
  • gdp_md_est: An estimate of the country’s GDP, in millions of US dollars
  • geometry: A POLYGON for each country, or a MULTIPOLYGON for countries made of several pieces, such as Fiji (we will learn more about these soon)
In [9]:
world.geometry.name
Out[9]:
'geometry'

Notice that the geometry for this GeoDataFrame is stored in the geometry column.

A quick note about polygons

Instead of points (as our cities are), the geometry objects are now polygons.

A polygon is probably what you already think it is: a collection of ordered points connected by straight lines.

The shorter the distance between consecutive points, the more closely the polygon can approximate curved shapes.
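This can be checked numerically. The standard-library sketch below is only an illustration, not how geopandas computes anything internally. It inscribes regular polygons with more and more vertices in a unit circle and measures their areas with the shoelace formula. As the edges get shorter, the areas approach the circle’s area, π.

```python
import math


def shoelace_area(points):
    """Area of a simple polygon given its vertices in order (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def regular_polygon(n, radius=1.0):
    """n vertices evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# More vertices (shorter edges) => area closer to the circle's area, pi
for n in (4, 8, 24, 100):
    print(n, round(shoelace_area(regular_polygon(n)), 4))
```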

Let’s see an example of a polygon.

In [10]:
world.loc["ALB", 'geometry']
Out[10]:

Notice that it displayed the country of Albania.

In [11]:
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc["ALB", "geometry"].exterior.coords.xy

# How many points?
print('Points in the exterior of Albania:', len(x))
Points in the exterior of Albania: 24

Let’s look at another country.

In [12]:
world.loc["AFG", "geometry"]
Out[12]:
In [13]:
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc["AFG", 'geometry'].exterior.coords.xy

# How many points?
print('Points in the exterior of Afghanistan:', len(x))
Points in the exterior of Afghanistan: 69

Notice that we’ve now displayed Afghanistan.

This is a more complex shape than Albania and thus required more points.
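One detail to keep in mind when counting points this way is that Shapely stores a polygon’s exterior as a closed ring. The first vertex is repeated at the end, so counts like the 24 for Albania include one duplicated point. A minimal sketch with a unit square:

```python
from shapely.geometry import Polygon

# A unit square described by its four corners
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

x, y = square.exterior.coords.xy
print(len(x))  # 5: the ring repeats the first vertex to close itself

ring = list(square.exterior.coords)
print(ring[0] == ring[-1])  # True
print(square.area)          # 1.0
```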

2. Plot the map

In [14]:
fig, gax = plt.subplots(figsize=(10,10))

# By only plotting rows in which the continent is 'South America' we only plot SA.
world.query("continent == 'South America'").plot(ax=gax, edgecolor='black',color='white')

# By the way, if you haven't read the book 'Longitude' by Dava Sobel, you should...
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')

gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()

Creating this map may have been easier than you expected!

In reality, a lot of heavy lifting is going on behind the scenes.

Entire university classes (and even majors!) focus on the theory and thought that goes into creating maps, but, for now, we are happy to rely on the work done by the experts behind geopandas and its related libraries.

3. Plot the cities

In the code below, we run the same commands as before to plot the South American countries, but now we also plot the data in gdf, which contains the locations of the South American cities.

In [15]:
# Step 3: Plot the cities onto the map
# We mostly use the code from before --- we still want the country borders plotted --- and we
# add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))

# By only plotting rows in which the continent is 'South America' we only plot, well,
# South America.
world.query("continent == 'South America'").plot(ax = gax, edgecolor='black', color='white')

# This plots the cities. It's the same syntax, but we are plotting from a different GeoDataFrame.
# I want the cities as pale red dots.
gdf.plot(ax=gax, color='red', alpha = 0.5)

gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')

gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()

Adding labels to points

Finally, we can annotate the points so that it is clear which city is which.

In [16]:
# Plot the map and the cities as before, then label each city
# We mostly reuse the code from before --- we still want the country borders and the cities plotted
fig, gax = plt.subplots(figsize=(10,10))

# By only plotting rows in which the continent is 'South America' we only plot, well, South America.
world.query("continent == 'South America'").plot(ax = gax, edgecolor='black', color='white')

# This plots the cities. It's the same syntax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gdf.plot(ax=gax, color='red', alpha = 0.5)

gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')

# Kill the spines...
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

# ...or get rid of all the axes. Is it important to know the lat and long?
# plt.axis('off')


# Label the cities
for x, y, label in zip(gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']):
    gax.annotate(label, xy=(x,y), xytext=(4,4), textcoords='offset points')

plt.show()

Case Study: Voting in Wisconsin

In the example that follows, we will demonstrate how each county in Wisconsin voted during the 2016 Presidential Election.

Along the way, we will learn a couple of valuable lessons:

  1. Where to find shape files for US states and counties
  2. How to match census style data to shape files

Find and Plot State Border

Our first step will be to find the border for the state of interest. The US Census Bureau publishes state borders as part of its cartographic boundary files.

You can download the cb_2016_us_state_5m.zip by hand, or simply allow geopandas to extract the relevant information from the zip file online.

In [17]:
state_df = gpd.read_file("https://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_state_5m.zip")
state_df.head()
In [18]:
print(state_df.columns)

There are several columns, but the most useful one here is NAME: we can select Wisconsin’s geometry by filtering on it.

In [19]:
fig, gax = plt.subplots(figsize=(10, 10))
state_df.query("NAME == 'Wisconsin'").plot(ax=gax, edgecolor="black", color="white")
plt.show()

Find and Plot County Borders

Next, we will add the county borders to our map.

The county shape files (for the entire US) can be found on the Census site.

Once again, we will use the 5m (1:5,000,000 scale) resolution.
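The two Census URLs in this section share a naming pattern, with the year, the layer (state or county), and the resolution (5m) embedded in the file name. The hypothetical helper below makes that convention explicit. It assumes the pattern, inferred only from the 2016 URLs used here, holds for the inputs you pass. It only builds the string and does not check that the file exists.

```python
# Hypothetical helper: the pattern is inferred from the two 2016 URLs in this
# lecture (served over HTTPS) and may not hold for other years or layers.
BASE = "https://www2.census.gov/geo/tiger/GENZ{year}/shp/cb_{year}_us_{layer}_{res}.zip"


def census_cb_url(layer, year=2016, res="5m"):
    """Build a Census cartographic boundary shapefile URL (assumed pattern)."""
    return BASE.format(year=year, layer=layer, res=res)


print(census_cb_url("state"))
print(census_cb_url("county"))
```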

In [20]:
county_df = gpd.read_file("https://www2.census.gov/geo/tiger/GENZ2016/shp/cb_2016_us_county_5m.zip")
county_df.head()
In [21]:
print(county_df.columns)

Wisconsin’s state FIPS code is 55, so we keep only the counties whose STATEFP is '55' (the codes are stored as strings).
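Why compare against the string '55' rather than the number 55? FIPS codes are zero-padded identifiers, so they are stored as text. Comparing them to an integer does not raise an error; it simply matches nothing. A pandas sketch with a few illustrative rows that mimic the STATEFP, COUNTYFP, and NAME columns of the county file:

```python
import pandas as pd

# Illustrative rows mimicking the county file's STATEFP/COUNTYFP/NAME columns
counties = pd.DataFrame({
    "STATEFP": ["55", "55", "17"],
    "COUNTYFP": ["025", "079", "031"],
    "NAME": ["Dane", "Milwaukee", "Cook"],
})

# String comparison keeps the Wisconsin rows
wi = counties.query("STATEFP == '55'")
print(wi["NAME"].tolist())  # ['Dane', 'Milwaukee']

# Comparing strings to the integer 55 silently matches nothing
print((counties["STATEFP"] == 55).sum())  # 0

# A 5-digit county FIPS code is the state code followed by the county code
print((wi["STATEFP"] + wi["COUNTYFP"]).tolist())  # ['55025', '55079']
```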

In [22]:
county_df = county_df.query("STATEFP == '55'")

Now we can plot all counties in Wisconsin.

In [23]:
fig, gax = plt.subplots(figsize=(10, 10))

state_df.query("NAME == 'Wisconsin'").plot(ax=gax, edgecolor="black", color="white")
county_df.plot(ax=gax, edgecolor="black", color="white")

plt.show()

Get Vote Data

The final step is to get the vote data, which can be found online on this site.

Our friend Kim says,

Go ahead and open up the file. It’s a mess! I saved a cleaned up version of the file to results.csv which we can use to save the hassle with cleaning the data. For fun, you should load the raw data and try beating it into shape. That’s what you normally would have to do… and it’s fun.

We’d like to add that such an exercise is also “good for you” (similar to how vegetables are good for you).

But, for the example in class, we’ll simply start with his cleaned data.

In [24]:
results = pd.read_csv("https://datascience.quantecon.org/assets/data/ruhl_cleaned_results.csv", thousands=",")
results.head()
Out[24]:
county total trump clinton
0 ADAMS 10130 5966 3745
1 ASHLAND 8032 3303 4226
2 BARRON 22671 13614 7889
3 BAYFIELD 9612 4124 4953
4 BROWN 129011 67210 53382

Notice that this is NOT a GeoDataFrame; it has no geographical information.

But it does have the names of each county.

We will be able to use this to match to the counties from county_df.

First, we need to finish up the data cleaning.
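Why normalize both name columns? merge only pairs rows whose keys match exactly, so differences in capitalization or stray spaces silently drop counties from an inner join. The pandas sketch below uses made-up spellings chosen to mimic an upper-case vote file. It shows the problem, and how `indicator=True` can confirm that every county found a partner.

```python
import pandas as pd

# Made-up spellings: upper case (with a trailing space) vs. mixed case
votes = pd.DataFrame({"county": ["ADAMS ", "FOND DU LAC", "ST. CROIX"]})
shapes = pd.DataFrame({"NAME": ["Adams", "Fond du Lac", "St. Croix"]})

# Without cleaning, no names match exactly
raw = shapes.merge(votes, left_on="NAME", right_on="county", how="inner")
print(len(raw))  # 0

# Normalize both sides the same way: title case, then strip whitespace
votes["county"] = votes["county"].str.title().str.strip()
shapes["NAME"] = shapes["NAME"].str.title().str.strip()

# indicator=True adds a _merge column recording which side(s) each row came from
check = shapes.merge(votes, left_on="NAME", right_on="county", how="outer", indicator=True)
print(check[["NAME", "county", "_merge"]])
```

Title-casing both sides matters for names such as "Fond du Lac": "FOND DU LAC" and "Fond du Lac" both become "Fond Du Lac".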

In [25]:
results["county"] = results["county"].str.title()
results["county"] = results["county"].str.strip()
county_df["NAME"] = county_df["NAME"].str.title()
county_df["NAME"] = county_df["NAME"].str.strip()

Then, we can merge election results with the county data.

In [26]:
res_w_states = county_df.merge(results, left_on="NAME", right_on="county", how="inner")

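Next, we’ll create two new variables: trump_share, the fraction of all votes in the county that went to Donald Trump, and rel_trump_share, Trump’s share of the two-party vote (his votes divided by the combined votes for Trump and Clinton).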
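The two measures differ because total presumably also counts votes for other candidates; for Adams County, total exceeds trump + clinton by 419. A quick check using Adams County’s numbers from the results table above:

```python
# Adams County figures from the results table above
total, trump, clinton = 10130, 5966, 3745

trump_share = trump / total                  # share of all votes cast
rel_trump_share = trump / (trump + clinton)  # share of the two-party vote

print(round(trump_share, 3))      # 0.589
print(round(rel_trump_share, 3))  # 0.614
print(total - (trump + clinton))  # 419
```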

In [27]:
res_w_states["trump_share"] = res_w_states["trump"] / (res_w_states["total"])
res_w_states["rel_trump_share"] = res_w_states["trump"] / (res_w_states["trump"]+res_w_states["clinton"])
res_w_states.head()

Finally, we can create our map.

In [28]:
fig, gax = plt.subplots(figsize = (10,8))

# Plot the state
state_df[state_df['NAME'] == 'Wisconsin'].plot(ax = gax, edgecolor='black',color='white')

# Plot the counties and pass 'rel_trump_share' as the data to color
res_w_states.plot(
    ax=gax, edgecolor='black', column='rel_trump_share', legend=True, cmap='RdBu_r',
    vmin=0.2, vmax=0.8
)

# Add text to let people know what we are plotting
gax.annotate('Republican vote share',xy=(0.76, 0.06),  xycoords='figure fraction')

# I don't want the axis with long and lat
plt.axis('off')

plt.show()

What do you see from this map?

How many counties did Trump win? How many did Clinton win?

In [29]:
res_w_states.eval("trump > clinton").sum()
In [30]:
res_w_states.eval("clinton > trump").sum()

Who had more votes? Do you think a comparison in counties won or votes won is more reasonable? Why do you think they diverge?
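The two tallies weight counties differently. Counting counties won treats every county equally, while vote totals weight each county by how many people voted there. A sketch on just the five rows shown in the results table above illustrates the mechanics; with so few rows, the numbers say nothing about the state as a whole.

```python
import pandas as pd

# The five rows shown in the results table above (not the whole state)
sample = pd.DataFrame({
    "county": ["ADAMS", "ASHLAND", "BARRON", "BAYFIELD", "BROWN"],
    "trump": [5966, 3303, 13614, 4124, 67210],
    "clinton": [3745, 4226, 7889, 4953, 53382],
})

# Counties won: one "point" per county, regardless of its size
print("Counties won by Trump:", sample.eval("trump > clinton").sum())    # 3
print("Counties won by Clinton:", sample.eval("clinton > trump").sum())  # 2

# Votes won: a large county such as Brown counts for far more
print("Trump votes:", sample["trump"].sum())      # 94217
print("Clinton votes:", sample["clinton"].sum())  # 74195
```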

In [31]:
res_w_states["trump"].sum()
In [32]:
res_w_states["clinton"].sum()

What story could you tell about this divergence?

Interactivity

Multiple Python libraries can help create interactive figures.

Here, we will see an example using bokeh.

In another lecture, we will see an example with folium.
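Bokeh’s GeoJSONDataSource takes a GeoJSON string, which is what GeoDataFrame.to_json() produces. Roughly, that string is a FeatureCollection in which each row becomes a Feature, and the Feature’s properties hold the non-geometry columns. Tooltip fields such as '@county' then refer to those property names. The standard-library sketch below builds that structure for a single invented square "county"; the property names mirror columns of res_w_states. The real to_json() output also carries extra keys, such as an id for each feature.

```python
import json

# A hand-written, heavily simplified FeatureCollection with one made-up feature
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"county": "Example", "rel_trump_share": 0.5},
            "geometry": {
                "type": "Polygon",
                # One ring of [lon, lat] pairs; the first point is repeated to close it
                "coordinates": [[[-90.0, 44.0], [-89.0, 44.0], [-89.0, 45.0],
                                 [-90.0, 45.0], [-90.0, 44.0]]],
            },
        }
    ],
}

geojson_str = json.dumps(feature_collection)

# Round-trip to confirm it is valid JSON and inspect the properties bokeh would expose
parsed = json.loads(geojson_str)
props = parsed["features"][0]["properties"]
print(sorted(props))  # ['county', 'rel_trump_share']
```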

In [33]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer

output_notebook()

res_w_states["clinton_share"] = res_w_states["clinton"] / res_w_states["total"]

# Convert data to GeoJSON for bokeh
wi_geojson = GeoJSONDataSource(geojson=res_w_states.to_json())
In [34]:
color_mapper = LinearColorMapper(palette = brewer['RdBu'][10], low = 0, high = 1)
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
                     border_line_color=None,location = (0,0), orientation = 'horizontal')
hover = HoverTool(tooltips = [ ('County','@county'),('Portion Trump', '@trump_share'),
                               ('Portion Clinton','@clinton_share'),
                               ('Total','@total')])
p = figure(title="Wisconsin Voting in 2016 Presidential Election", tools=[hover])
p.patches("xs","ys",source=wi_geojson,
          fill_color = {'field' :'rel_trump_share', 'transform' : color_mapper})
p.add_layout(color_bar, 'below')
show(p)
