QuantEcon DataScience

Introduction to Economic Modeling and Data Science

Collections

Prerequisites

Outcomes

  • Ordered Collections

    • Know what a list is and a tuple is
    • Know how to tell a list from a tuple
    • Understand the range, zip and enumerate functions
    • Be able to use common list methods like append, sort, and reverse
  • Associative Collections

    • Understand what a dict is
    • Know the distinction between a dict's keys and values
    • Understand when dicts are useful
    • Be familiar with common dict methods
  • Sets (optional)

    • Know what a set is
    • Understand how a set differs from a list and a tuple
    • Know when to use a set vs a list or a tuple

Ordered Collections

Lists

A Python list is an ordered collection of items.

We can create lists using the following syntax

[item1, item2, ...,  itemN]

where the ... represents any number of additional items.

Each item can be of any type.

Let’s create some lists.

In [1]:
# created, but not assigned to a variable
[2.0, 9.1, 12.5]
Out[1]:
[2.0, 9.1, 12.5]
In [2]:
# stored as the variable `x`
x = [2.0, 9.1, 12.5]
print("x has type", type(x))
x
x has type <class 'list'>
Out[2]:
[2.0, 9.1, 12.5]

What Can We Do with Lists?

We can access items in a list called mylist using mylist[N] where N is an integer.

Note: Anytime that we use the syntax x[i] we are doing what is called indexing – it means that we are selecting a particular element of a collection x.

In [3]:
x[1]
Out[3]:
9.1

Wait, why did x[1] return 9.1 when the first element in x is actually 2.0?

This happened because Python starts counting at zero!

Let's repeat that one more time for emphasis: Python starts counting at zero!

To access the first element of x we must use x[0]:

In [4]:
x[0]
Out[4]:
2.0

We can also determine how many items are in a list using the len function.

In [5]:
len(x)
Out[5]:
3

What happens if we try to index with a number higher than the number of items in a list?

In [6]:
# uncomment the line below and run
# x[4]
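
For reference, here is a minimal sketch of what the commented-out line does. It assumes x is still the three-element list from above; the try/except construct is used only so the error message can be displayed without stopping the notebook.

```python
x = [2.0, 9.1, 12.5]

# valid indices run from 0 to len(x) - 1, so 4 is out of range
try:
    x[4]
    message = "no error"
except IndexError as err:
    message = str(err)

print(message)  # list index out of range
```

The largest valid index for this list is len(x) - 1, which is 2.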

We can check if a list contains an element using the in keyword.

In [7]:
2.0 in x
Out[7]:
True
In [8]:
1.5 in x
Out[8]:
False

For our list x, other common operations we might want to do are…

In [9]:
x.reverse()
x
Out[9]:
[12.5, 9.1, 2.0]
In [10]:
number_list = [10, 25, 42, 1.0]
print(number_list)
number_list.sort()
print(number_list)
[10, 25, 42, 1.0]
[1.0, 10, 25, 42]

Note that in order to sort, all elements in our list had to be numbers (int and float). More on this below.

We could actually do the same with a list of strings. In this case, sort will put the items in alphabetical order.

In [11]:
str_list = ["NY", "AZ", "TX"]
print(str_list)
str_list.sort()
print(str_list)
['NY', 'AZ', 'TX']
['AZ', 'NY', 'TX']

The append method adds an element to the end of an existing list.

In [12]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.append(10)
print(num_list)
[10, 25, 42, 8]
[10, 25, 42, 8, 10]

However, if you call append with a list, it adds a list to the end, rather than the numbers in that list.

In [13]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.append([20, 4])
print(num_list)
[10, 25, 42, 8]
[10, 25, 42, 8, [20, 4]]

To combine the lists instead…

In [14]:
num_list = [10, 25, 42, 8]
print(num_list)
num_list.extend([20, 4])
print(num_list)
[10, 25, 42, 8]
[10, 25, 42, 8, 20, 4]

See exercise 1 in the exercise list

Lists of Different Types

While most of the examples above used lists whose elements all have the same type, this is not required.

Let’s carefully make a small change to the first example: replace 2.0 with 2.

In [15]:
x = [2, 9.1, 12.5]

A list that mixes int and float values behaves the same as before for many operations you might apply to it.

In [16]:
import numpy as np
x = [2, 9.1, 12.5]
np.mean(x) == sum(x)/len(x)
Out[16]:
True

Here we have also introduced a new module, NumPy, which provides many functions for working with numeric data.

Taking this further, we can put completely different types of elements inside of a list.

In [17]:
# stored as the variable `x`
x = [2, "hello", 3.0]
print("x has type", type(x))
x
x has type <class 'list'>
Out[17]:
[2, 'hello', 3.0]

To see the types of individual elements in the list:

In [18]:
print(f"type(x[0]) = {type(x[0])}, type(x[1]) = {type(x[1])}, type(x[2]) = {type(x[2])}")
type(x[0]) = <class 'int'>, type(x[1]) = <class 'str'>, type(x[2]) = <class 'float'>

While no programming limitations prevent this, you should be careful if you write code with different numeric and non-numeric types in the same list.

For example, if the types within the list cannot be compared, then how could you sort the elements of the list? (i.e. How do you determine whether the string “hello” is less than the integer 2, “hello” < 2?)

In [19]:
x = [2, "hello", 3.0]
# uncomment the line below and see what happens!
# x.sort()

A few key exceptions to this general rule are:

  • Lists with both integers and floating points are less error-prone (since mathematical code using the list would work with both types).
  • When working with lists and data, you may want to represent missing values with a different type than the existing values.
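
As a rough sketch of the sorting problem and of the first exception listed above, the snippet below tries to sort a list that mixes numbers with a string, then sorts a list that mixes only int and float. The variable names `outcome` and `nums` are made up for this illustration.

```python
x = [2, "hello", 3.0]

# sorting has to compare elements, and a str cannot be compared with an int
try:
    x.sort()
    outcome = "sorted"
except TypeError as err:
    outcome = "TypeError"
    print(err)

print(outcome)  # TypeError

# a list holding only ints and floats sorts without trouble
nums = [2, 3.0, 1]
nums.sort()
print(nums)  # [1, 2, 3.0]
```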

The range Function

One function you will see often in Python is the range function.

It has three versions:

  1. range(N): goes from 0 to N-1
  2. range(a, N): goes from a to N-1
  3. range(a, N, d): goes from a up to (but not including) N, counting by d

When we call the range function, we get back something that has type range:

In [20]:
r = range(5)
print("type(r)", type(r))
type(r) <class 'range'>

To turn the range into a list:

In [21]:
list(r)
Out[21]:
[0, 1, 2, 3, 4]
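
The three versions listed above can be compared side by side. This is only a small illustration; the particular values of a, N, and d are arbitrary choices, and each range stops before reaching N.

```python
first_form = list(range(5))         # range(N)
second_form = list(range(2, 5))     # range(a, N)
third_form = list(range(1, 10, 3))  # range(a, N, d)

print(first_form)   # [0, 1, 2, 3, 4]
print(second_form)  # [2, 3, 4]
print(third_form)   # [1, 4, 7]
```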

See exercise 2 in the exercise list

What are Tuples?

Tuples are very similar to lists and hold ordered collections of items.

However, tuples and lists have three main differences:

  1. Tuples are created using parentheses — ( and ) — instead of square brackets — [ and ].
  2. Tuples are immutable, which is a fancy computer science word meaning that they can’t be changed or altered after they are created.
  3. Tuples and multiple return values from functions are tightly connected, as we will see in functions.
In [22]:
t = (1, "hello", 3.0)
print("t is a", type(t))
t
t is a <class 'tuple'>
Out[22]:
(1, 'hello', 3.0)

We can convert a list to a tuple by calling the tuple function on a list.

In [23]:
print("x is a", type(x))
print("tuple(x) is a", type(tuple(x)))
tuple(x)
x is a <class 'list'>
tuple(x) is a <class 'tuple'>
Out[23]:
(2, 'hello', 3.0)

We can also convert a tuple to a list using the list function.

In [24]:
list(t)
Out[24]:
[1, 'hello', 3.0]

As with a list, we access items in a tuple t using t[N] where N is an int.

In [25]:
t[0]  # still start counting at 0
Out[25]:
1
In [26]:
t[2]
Out[26]:
3.0

See exercise 3 in the exercise list

Tuples (and lists) can be unpacked directly into variables.

In [27]:
x, y = (1, "test")
print(f"x = {x}, y = {y}")
x = 1, y = test

This will be a convenient way to work with functions returning multiple values, as well as within comprehensions and loops.
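
As a preview of the function case, here is a minimal sketch. The helper `min_and_max` is hypothetical and written only for this illustration; the point is that a function returning two values hands back a tuple, which can be unpacked in one line.

```python
def min_and_max(values):
    """Return the smallest and largest element (hypothetical helper)."""
    return min(values), max(values)

values = [9.607, 10.48, 11.06]

result = min_and_max(values)
print(type(result))  # <class 'tuple'>

# unpack the returned tuple directly into two variables
lowest, highest = min_and_max(values)
print(f"lowest = {lowest}, highest = {highest}")  # lowest = 9.607, highest = 11.06
```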

List vs Tuple: Which to Use?

Should you use a list or tuple?

This depends on what you are storing, whether you might need to reorder the elements, or whether you’d add new elements without a complete reinterpretation of the underlying data.

For example, take data representing the GDP (in trillions) and population (in billions) for China in 2015.

In [28]:
china_data_2015 = ("China", 2015, 11.06, 1.371)

print(china_data_2015)
('China', 2015, 11.06, 1.371)

In this case, we have used a tuple since: (a) reordering the elements would be meaningless; and (b) adding more data would require a reinterpretation of the whole data structure.

On the other hand, consider a list of GDP in China between 2013 and 2015.

In [29]:
gdp_data = [9.607, 10.48, 11.06]
print(gdp_data)
[9.607, 10.48, 11.06]

In this case, we have used a list, since adding on a new element to the end of the list for GDP in 2016 would make complete sense.

Along these lines, collecting data on China for different years may make sense as a list of tuples (e.g. year, GDP, and population – although we will see better ways to store this sort of data in the Pandas section).

In [30]:
china_data = [(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]
print(china_data)
[(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]
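
To pull values out of a list of tuples like this one, index into the list first and then into the tuple, or unpack a whole tuple at once. The sketch below reuses the data above; the names `first_row`, `gdp_2015`, `year`, `gdp`, and `population` are chosen just for this example.

```python
china_data = [(2015, 11.06, 1.371), (2014, 10.48, 1.364), (2013, 9.607, 1.357)]

first_row = china_data[0]    # the tuple for 2015
gdp_2015 = china_data[0][1]  # second item of that tuple
print(first_row)  # (2015, 11.06, 1.371)
print(gdp_2015)   # 11.06

# unpack the last tuple into three variables
year, gdp, population = china_data[2]
print(f"year = {year}, GDP = {gdp}, population = {population}")
# year = 2013, GDP = 9.607, population = 1.357
```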

In general, a rule of thumb is to use a list unless you need to use a tuple.

Key criteria for tuple use are when you want to:

  • ensure the order of elements can’t change
  • ensure the actual values of the elements can’t change
  • use the collection as a key in a dict (we will learn what this means soon)

zip and enumerate

Two functions that can be extremely useful are zip and enumerate.

Both of these functions are best understood by example, so let’s see them in action and then talk about what they do.

In [31]:
gdp_data = [9.607, 10.48, 11.06]
years = [2013, 2014, 2015]
z = zip(years, gdp_data)
print("type(z)", type(z))
type(z) <class 'zip'>

To see what is inside z, let’s convert it to a list.

In [32]:
list(z)
Out[32]:
[(2013, 9.607), (2014, 10.48), (2015, 11.06)]

Notice that we now have a list where each item is a tuple.

Within each tuple, we have one item from each of the collections we passed to the zip function.

In particular, the first item in z contains the first item from [2013, 2014, 2015] and the first item from [9.607, 10.48, 11.06].

The second item in z contains the second item from each collection and so on.

We can access an element in this and then unpack the resulting tuple directly into variables.

In [33]:
l = list(zip(years, gdp_data))
x, y = l[0]
print(f"year = {x}, GDP = {y}")
year = 2013, GDP = 9.607

Now let’s experiment with enumerate.

In [34]:
e = enumerate(["a", "b", "c"])
print("type(e)", type(e))
e
type(e) <class 'enumerate'>
Out[34]:
<enumerate at 0x7f48457a3090>

Again, we call list(e) to see what is inside.

In [35]:
list(e)
Out[35]:
[(0, 'a'), (1, 'b'), (2, 'c')]

We again have a list of tuples, but this time, the first element in each tuple is the index of the second tuple element in the initial collection.

Notice that the third item is (2, 'c') because ["a", "b", "c"][2] is 'c'

See exercise 4 in the exercise list

An important quirk of some iterable types that are not lists (such as the above zip) is that you cannot convert the same object to a list twice.

This is because zip and enumerate produce what is called an iterator. (A range is an exception: it can be converted to a list as many times as you like.)

An iterator will only produce each of its elements a single time, so if you call list on the same iterator a second time, it will not have any elements left to iterate over.

For more information, refer to the Python documentation.

In [36]:
gdp_data = [9.607, 10.48, 11.06]
years = [2013, 2014, 2015]
z = zip(years, gdp_data)
l = list(z)
print(l)
m = list(z)
print(m)
[(2013, 9.607), (2014, 10.48), (2015, 11.06)]
[]
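
The same single-use behavior shows up with enumerate. The sketch below also shows two simple ways around it: keep the resulting list, or build a fresh object each time. The variable names are made up for this example.

```python
letters = ["a", "b", "c"]

e = enumerate(letters)
first_pass = list(e)   # consumes every element of e
second_pass = list(e)  # nothing is left to produce
print(first_pass)   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(second_pass)  # []

# option 1: convert once and reuse the list as often as needed
pairs = list(enumerate(letters))
print(pairs == first_pass)  # True

# option 2: create a new enumerate object whenever one is needed
fresh = list(enumerate(letters))
print(fresh == pairs)  # True
```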

Associative Collections

Dictionaries

A dictionary (or dict) associates keys with values.

It will feel similar to a dictionary for words, where the keys are words and the values are the associated definitions.

The most common way to create a dict is to use curly braces — { and } — like this:

{"key1": value1, "key2": value2, ..., "keyN": valueN}

where the ... indicates that we can have any number of additional terms.

The crucial part of the syntax is that each key-value pair is written key: value and that these pairs are separated by commas — ,.

Let’s see an example using our aggregate data on China in 2015.

In [37]:
china_data = {"country": "China", "year": 2015, "GDP" : 11.06, "population": 1.371}
print(china_data)
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371}

Unlike our above example using a tuple, a dict allows us to associate a name with each field, rather than having to remember the order within the tuple.

Often, code that makes a dict is easier to read if we put each key: value pair on its own line. (Recall our earlier comment on using whitespace effectively to improve readability!)

The code below is equivalent to what we saw above.

In [38]:
china_data = {
    "country": "China",
    "year": 2015,
    "GDP" : 11.06,
    "population": 1.371
}

Most often, the keys (e.g. “country”, “year”, “GDP”, and “population”) will be strings, but we could also use numbers (int, or float) or even tuples (or, rarely, a combination of types).

The values can be any type and different from each other.
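
As a small sketch of tuple keys, the dict below uses (country, year) pairs as keys. The name `gdp_by_country_year` is hypothetical, and the GDP figures are the ones used earlier in this lecture. The second part shows that a list is not accepted as a key, since keys must not be changeable after creation.

```python
# keys are (country, year) tuples; values are GDP in trillions
gdp_by_country_year = {("China", 2014): 10.48, ("China", 2015): 11.06}
print(gdp_by_country_year)

# a list is mutable, so Python refuses to use it as a key
try:
    bad = {["China", 2015]: 11.06}
    outcome = "ok"
except TypeError as err:
    outcome = str(err)

print(outcome)
```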

See exercise 5 in the exercise list

This next example is meant to emphasize how values can be anything – including another dictionary.

In [39]:
companies = {"AAPL": {"bid": 175.96, "ask": 175.98},
             "GE": {"bid": 1047.03, "ask": 1048.40},
             "TVIX": {"bid": 8.38, "ask": 8.40}}
print(companies)
{'AAPL': {'bid': 175.96, 'ask': 175.98}, 'GE': {'bid': 1047.03, 'ask': 1048.4}, 'TVIX': {'bid': 8.38, 'ask': 8.4}}

Getting, Setting, and Updating dict Items

We can now ask Python to tell us the value for a particular key by using the syntax d[k], where d is our dict and k is the key for which we want to find the value.

For example,

In [40]:
print(china_data["year"])
print(f"country = {china_data['country']}, population = {china_data['population']}")
2015
country = China, population = 1.371

Note: when inside of a formatting string, you can use ' instead of " as above to ensure the formatting still works with the embedded code.
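
The same syntax works for the nested companies dict from above: the first lookup returns the inner dict, and a second lookup picks a value out of it. This is a minimal sketch; `aapl` and `aapl_bid` are names chosen for the example.

```python
companies = {"AAPL": {"bid": 175.96, "ask": 175.98},
             "GE": {"bid": 1047.03, "ask": 1048.40},
             "TVIX": {"bid": 8.38, "ask": 8.40}}

aapl = companies["AAPL"]             # the inner dict
aapl_bid = companies["AAPL"]["bid"]  # index twice to reach a value
print(aapl)                      # {'bid': 175.96, 'ask': 175.98}
print(f"AAPL bid = {aapl_bid}")  # AAPL bid = 175.96
```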

If we ask for the value of a key that is not in the dict, we will get an error.

In [41]:
# uncomment the line below to see the error
# china_data["inflation"]
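
For reference, the error raised here is a KeyError. The sketch below catches it so the message can be printed, and also shows that the in keyword, used earlier with lists, checks whether a key is present in a dict.

```python
china_data = {"country": "China", "year": 2015, "GDP": 11.06, "population": 1.371}

try:
    china_data["inflation"]
    outcome = "found"
except KeyError as err:
    outcome = f"KeyError: {err}"

print(outcome)  # KeyError: 'inflation'

# `in` tests the keys of a dict
has_inflation = "inflation" in china_data
has_gdp = "GDP" in china_data
print(has_inflation, has_gdp)  # False True
```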

We can also add new items to a dict using the syntax d[new_key] = new_value.

Let’s see some examples.

In [42]:
print(china_data)
china_data["unemployment"] = "4.05%"
print(china_data)
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371}
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371, 'unemployment': '4.05%'}

To update the value, we use assignment in the same way (which will create the key and value as required).

In [43]:
print(china_data)
china_data["unemployment"] = "4.051%"
print(china_data)
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371, 'unemployment': '4.05%'}
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371, 'unemployment': '4.051%'}

Or we could change the type.

In [44]:
china_data["unemployment"] = 4.051
print(china_data)
{'country': 'China', 'year': 2015, 'GDP': 11.06, 'population': 1.371, 'unemployment': 4.051}

See exercise 6 in the exercise list

Common dict Functionality

We can do some common things with dicts.

We will demonstrate them with examples below.

In [45]:
# number of key-value pairs in a dict
len(china_data)
Out[45]:
5
In [46]:
# get a list of all the keys
list(china_data.keys())
Out[46]:
['country', 'year', 'GDP', 'population', 'unemployment']
In [47]:
# get a list of all the values
list(china_data.values())
Out[47]:
['China', 2015, 11.06, 1.371, 4.051]
In [48]:
more_china_data = {"irrigated_land": 690_070, "top_religions": {"buddhist": 18.2, "christian" : 5.1, "muslim": 1.8}}

# Add all key-value pairs in more_china_data to china_data.
# If a key already appears in china_data, overwrite its
# value with the value in more_china_data
china_data.update(more_china_data)
china_data
Out[48]:
{'country': 'China',
 'year': 2015,
 'GDP': 11.06,
 'population': 1.371,
 'unemployment': 4.051,
 'irrigated_land': 690070,
 'top_religions': {'buddhist': 18.2, 'christian': 5.1, 'muslim': 1.8}}
In [49]:
# Get the value associated with a key or return a default value
# use this to avoid the KeyError we saw above if you have a reasonable
# default value
china_data.get("irrigated_land", "Data Not Available")
Out[49]:
690070
In [50]:
china_data.get("death_rate", "Data Not Available")
Out[50]:
'Data Not Available'

See exercise 7 in the exercise list

See exercise 8 in the exercise list

Sets (Optional)

Python has an additional way to represent collections of items: sets.

Sets come up infrequently, but you should be aware of them.

If you are familiar with the mathematical concept of sets, then you will understand the majority of Python sets already.

If you don’t know the math behind sets, don’t worry: we’ll cover the basics of Python’s sets here.

A set is an unordered collection of unique elements.

The syntax for creating a set uses curly braces, { and }:

{item1, item2, ..., itemN}

Here is an example.

In [51]:
s = {1, "hello", 3.0}
print("s has type", type(s))
s
s has type <class 'set'>
Out[51]:
{1, 3.0, 'hello'}

See exercise 9 in the exercise list

As with lists and tuples, we can check if something is in the set and check the set’s length:

In [52]:
print("len(s) =", len(s))
"hello" in s
len(s) = 3
Out[52]:
True

Unlike lists and tuples, we can’t extract elements of a set s using s[N] where N is a number.

# Uncomment the line below to see what happens
# s[1]

This is because sets are not ordered, so the notion of getting the second element (s[1]) is not well defined.
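
A minimal sketch of what the commented-out line does, assuming s is the set created above; the try/except is there only so the message can be printed without stopping the notebook.

```python
s = {1, "hello", 3.0}

# sets have no positions, so indexing is not supported
try:
    s[1]
    outcome = "no error"
except TypeError as err:
    outcome = "TypeError"
    print(err)

print(outcome)  # TypeError
```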

We add elements to a set s using s.add.

In [53]:
s.add(100)
s
Out[53]:
{1, 100, 3.0, 'hello'}
In [54]:
s.add("hello") # nothing happens, why?
s
Out[54]:
{1, 100, 3.0, 'hello'}

We can also do set operations.

Consider the set s from above and the set s2 = {"hello", "world"}.

  • s.union(s2): returns a set with all elements in either s or s2
  • s.intersection(s2): returns a set with all elements in both s and s2
  • s.difference(s2): returns a set with all elements in s that aren’t in s2
  • s.symmetric_difference(s2): returns a set with all elements in only one of s and s2
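
To make the four operations concrete, here is a short sketch on two small sets of integers. The sets `a` and `b` are hypothetical and chosen only for illustration.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

either = a.union(b)                  # contains 1, 2, 3, 4, 5
both = a.intersection(b)             # contains 3, 4
only_a = a.difference(b)             # contains 1, 2
only_one = a.symmetric_difference(b) # contains 1, 2, 5

print(either)
print(both)
print(only_a)
print(only_one)
```

Note that difference is not symmetric: b.difference(a) contains only 5.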

See exercise 10 in the exercise list

As with tuples and lists, the set function can convert other collections to sets.

In [55]:
x = [1, 2, 3, 1]
set(x)
Out[55]:
{1, 2, 3}
In [56]:
t = (1, 2, 3, 1)
set(t)
Out[56]:
{1, 2, 3}

Likewise, we can convert sets to lists and tuples.

In [57]:
list(s)
Out[57]:
['hello', 1, 3.0, 100]
In [58]:
tuple(s)
Out[58]:
('hello', 1, 3.0, 100)

Exercises

Exercise 1

In the first cell, try y.append(z).

In the second cell try y.extend(z).

Explain the behavior.

HINT: When you are trying to explain, use y.append? and y.extend? to see a description of what these methods are supposed to do.

In [59]:
y = ["a", "b", "c"]
z = [1, 2, 3]
# your code here
print(y)
In [60]:
y = ["a", "b", "c"]
z = [1, 2, 3]
# your code here
print(y)

(back to text)

Exercise 2

Experiment with the other two versions of the range function.

In [61]:
# try list(range(a, N)) -- you pick `a` and `N`
In [62]:
# try list(range(a, N, d)) -- you pick `a`, `N`, and `d`

(back to text)

Exercise 3

Verify that tuples are indeed immutable by attempting the following:

  • Changing the first element of t to be 100
  • Appending a new element "!!" to the end of t (remember with a list x we would use x.append("!!") to do this)
  • Sorting t
  • Reversing t
In [63]:
# change first element of t
In [64]:
# appending to t
In [65]:
# sorting t
In [66]:
# reversing t

(back to text)

Exercise 4

Challenging: For the tuple foo below, use a combination of zip, range, and len to mimic enumerate(foo).

Verify that your proposed solution is correct by converting each to a list and checking equality with ==.

HINT: You can see what the answer should look like by starting with list(enumerate(foo)).

In [67]:
foo = ("good", "luck!")

(back to text)

Exercise 5

Create a new dict which associates each stock ticker with its stock price.

Here are some tickers and their prices.

  • AAPL: 175.96
  • GOOGL: 1047.43
  • TVIX: 8.38
In [68]:
# your code here

(back to text)

Exercise 6

Look at the World Factbook for Australia and create a dictionary with data containing the following types: float, string, integer, list, and dict. Choose any data you wish.

To confirm, one of the values in your dictionary should itself be a dictionary, which you access via its key.

In [69]:
# your code here

(back to text)

Exercise 7

Use Jupyter's help facilities to learn how to use the pop method to remove the key "irrigated_land" (and its value) from the dict.

In [70]:
# uncomment and use the Inspector or ?
#china_data.pop(

(back to text)

Exercise 8

Explain what happens to the value you popped.

Experiment with calling pop twice.

In [71]:
# your code here

(back to text)

Exercise 9

Try creating a set with repeated elements (e.g. {1, 2, 1, 2, 1, 2}).

What happens?

Why?

In [72]:
# your code here

(back to text)

Exercise 10

Test out two of the operations described above using the original set we created, s, and the set created below s2.

In [73]:
s2 = {"hello", "world"}
In [74]:
# Operation 1
In [75]:
# Operation 2
