QuantEcon DataScience

Introduction to Economic Modeling and Data Science

Time series

Prerequisites

Outcomes

  • Know how pandas handles dates
  • Understand how to parse strings into datetime objects
  • Know how to write dates as custom formatted strings
  • Be able to access day, month, year, etc. for a DateTimeIndex and a column with dtype datetime
  • Understand both rolling and re-sampling operations and the difference between the two

Data

  • Bitcoin to USD exchange rates from March 2014 to the present
In [1]:
# Uncomment following line to install on colab
#! pip install qeds
In [2]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import quandl
import qeds

# see section on API keys at end of lecture!
quandl.ApiConfig.api_key = os.environ.get("QUANDL_AUTH", "Dn6BtVoBhzuKTuyo6hbp")
start_date = "2014-05-01"

%matplotlib inline
# activate plot theme
import qeds
qeds.themes.mpl_style();

Intro

pandas has extensive support for handling dates and times.

We will loosely refer to data with date or time information as time series data.

In this lecture, we will cover the most useful parts of pandas’ time series functionality.

Among these topics are:

  • Parsing strings as dates
  • Writing datetime objects as (inverse operation of previous point)
  • Extracting data from a DataFrame or Series with date information in the index
  • Shifting data through time (taking leads or lags)
  • Re-sampling data to a different frequency and rolling operations

However, even more than with previous topics, we will skip a lot of the functionality pandas offers, and we urge you to refer to the official documentation for more information.

Parsing Strings as Dates

When working with time series data, we almost always receive the data with dates encoded as strings.

Hopefully, the date strings follow a structured format or pattern.

One common pattern is YYYY-MM-DD: 4 numbers for the year, 2 for the month, and 2 for the day with each section separated by a -.

For example, we write Christmas day 2017 in this format as

In [3]:
christmas_str = "2017-12-25"

To convert a string into a time-aware object, we use the pd.to_datetime function.

In [4]:
christmas = pd.to_datetime(christmas_str)
print("The type of christmas is", type(christmas))
christmas
The type of christmas is <class 'pandas._libs.tslibs.timestamps.Timestamp'>
Out[4]:
Timestamp('2017-12-25 00:00:00')

The pd.to_datetime function is pretty smart at guessing the format of the date…

In [5]:
for date in ["December 25, 2017", "Dec. 25, 2017",
             "Monday, Dec. 25, 2017", "25 Dec. 2017", "25th Dec. 2017"]:
    print("pandas interprets {} as {}".format(date, pd.to_datetime(date)))
pandas interprets December 25, 2017 as 2017-12-25 00:00:00
pandas interprets Dec. 25, 2017 as 2017-12-25 00:00:00
pandas interprets Monday, Dec. 25, 2017 as 2017-12-25 00:00:00
pandas interprets 25 Dec. 2017 as 2017-12-25 00:00:00
pandas interprets 25th Dec. 2017 as 2017-12-25 00:00:00

However, sometimes we will need to give pandas a hint.

For example, that same time (midnight on Christmas) would be reported on an Amazon transaction report as

In [6]:
christmas_amzn = "2017-12-25T00:00:00+ 00 :00"

If we try to pass this to pd.to_datetime, it will fail.

pd.to_datetime(christmas_amzn)

To parse a date with this format, we need to specify the format argument for pd.to_datetime.

In [7]:
amzn_strftime = "%Y-%m-%dT%H:%M:%S+ 00 :00"
pd.to_datetime(christmas_amzn, format=amzn_strftime)
Out[7]:
Timestamp('2017-12-25 00:00:00')

Can you guess what amzn_strftime represents?

Let’s take a closer look at amzn_strftime and christmas_amzn.

In [8]:
print(amzn_strftime)
print(christmas_amzn)
%Y-%m-%dT%H:%M:%S+ 00 :00
2017-12-25T00:00:00+ 00 :00

Notice that both of the strings have a similar form, but that instead of actual numerical values, amzn_strftime has placeholders.

Specifically, anywhere the % shows up is a signal to the pd.to_datetime function that it is where relevant information is stored.

For example, the %Y is a stand-in for a four digit year, %m is for 2 a digit month, and so on…

The official Python documentation contains a complete list of possible %something patterns that are accepted in the format argument.

See exercise 1 in the exercise list

Multiple Dates

If we have dates in a Series (e.g. column of DataFrame) or a list, we can pass the entire collection to pd.to_datetime and get a collection of dates back.

We’ll just show an example of that here as the mechanics are the same as a single date.

In [9]:
pd.to_datetime(["2017-12-25", "2017-12-31"])
Out[9]:
DatetimeIndex(['2017-12-25', '2017-12-31'], dtype='datetime64[ns]', freq=None)

Date Formatting

We can use the %pattern format to have pandas write datetime objects as specially formatted strings using the strftime (string format time) method.

For example,

In [10]:
christmas.strftime("We love %A %B %d (also written %c)")
Out[10]:
'We love Monday December 25 (also written Mon Dec 25 00:00:00 2017)'

See exercise 2 in the exercise list

Extracting Data

When the index of a DataFrame has date information and pandas recognizes the values as datetime values, we can leverage some convenient indexing features for extracting data.

The flexibility of these features is best understood through example, so let’s load up some data and take a look.

In [11]:
btc_usd = quandl.get("BCHARTS/BITSTAMPUSD", start_date=start_date)
btc_usd.info()
btc_usd.head()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2092 entries, 2014-05-01 to 2020-01-21
Data columns (total 7 columns):
Open                 2092 non-null float64
High                 2092 non-null float64
Low                  2092 non-null float64
Close                2092 non-null float64
Volume (BTC)         2092 non-null float64
Volume (Currency)    2092 non-null float64
Weighted Price       2092 non-null float64
dtypes: float64(7)
memory usage: 130.8 KB
Out[11]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
2014-05-02 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018
2014-05-03 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
2014-05-04 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483
2014-05-05 435.88 445.00 425.00 433.00 8188.082795 3.547855e+06 433.294968

Here, we have the Bitcoin (BTC) to US dollar (USD) exchange rate from March 2014 until today.

Notice that the type of index is DateTimeIndex.

This is the key that enables things like…

Extracting all data for the year 2015 by passing "2015" to .loc.

In [12]:
btc_usd.loc["2015"]
Out[12]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2015-01-01 321.00 321.00 312.60 313.81 3087.436554 9.745096e+05 315.637119
2015-01-02 313.82 317.01 311.96 315.42 3468.281375 1.092446e+06 314.981849
2015-01-03 315.42 316.58 280.00 282.00 21752.719146 6.475952e+06 297.707695
2015-01-04 280.00 289.39 255.00 264.00 41441.278553 1.126676e+07 271.872950
2015-01-05 264.55 280.00 264.07 276.80 9528.271002 2.596898e+06 272.546601
2015-01-06 0.00 0.00 0.00 0.00 0.000000 0.000000e+00 0.000000
2015-01-07 0.00 0.00 0.00 0.00 0.000000 0.000000e+00 0.000000
2015-01-08 0.00 0.00 0.00 0.00 0.000000 0.000000e+00 0.000000
2015-01-09 272.45 305.00 272.45 293.97 9914.249594 2.898085e+06 292.315153
2015-01-10 293.89 295.00 271.03 275.59 29448.777845 8.367179e+06 284.126537
2015-01-11 275.60 280.94 262.08 266.39 12758.949044 3.480737e+06 272.807499
2015-01-12 266.34 272.43 263.54 267.42 10478.250214 2.823903e+06 269.501401
2015-01-13 267.10 268.15 216.00 227.00 60556.162717 1.428937e+07 235.968829
2015-01-14 227.01 230.89 152.40 171.41 124188.885083 2.357627e+07 189.841991
2015-01-15 172.00 233.90 168.50 210.98 83488.416185 1.731111e+07 207.347429
2015-01-16 210.69 223.15 198.04 208.03 34723.348352 7.316819e+06 210.717544
2015-01-17 208.02 212.84 192.17 199.69 19121.889479 3.838117e+06 200.718520
2015-01-18 199.69 222.00 195.02 211.85 18205.180362 3.818886e+06 209.769209
2015-01-19 211.80 220.94 207.70 216.34 13037.399109 2.781607e+06 213.355949
2015-01-20 216.01 216.81 201.98 210.65 14129.240186 2.974516e+06 210.522039
2015-01-21 211.94 228.90 205.00 227.16 23160.656911 5.023355e+06 216.891751
2015-01-22 227.75 239.00 226.01 234.08 25405.329613 5.911625e+06 232.692312
2015-01-23 234.08 240.00 225.55 233.64 15268.268486 3.544720e+06 232.162559
2015-01-24 233.42 249.99 230.03 249.19 13793.776479 3.338886e+06 242.057429
2015-01-25 249.33 257.00 241.33 255.31 28760.928428 7.179902e+06 249.640836
2015-01-26 254.67 309.90 254.67 274.80 55829.613594 1.575592e+07 282.214360
2015-01-27 274.76 278.51 243.10 263.98 21159.826276 5.464232e+06 258.236160
2015-01-28 263.97 266.45 225.91 236.07 28092.334733 6.890539e+06 245.281804
2015-01-29 236.07 242.18 220.00 235.79 26012.169411 6.035005e+06 232.007002
2015-01-30 235.12 244.98 225.52 228.98 20614.836944 4.831205e+06 234.355737
... ... ... ... ... ... ... ...
2015-12-02 362.70 362.84 348.64 358.89 12343.407768 4.390063e+06 355.660512
2015-12-03 359.11 370.53 356.21 361.42 12190.883936 4.423951e+06 362.890080
2015-12-04 361.48 364.30 355.92 363.62 9989.280217 3.596927e+06 360.078716
2015-12-05 363.62 389.89 363.00 388.00 16784.126668 6.338715e+06 377.661318
2015-12-06 388.00 399.85 383.00 388.24 14679.436032 5.754540e+06 392.013731
2015-12-07 388.54 399.91 381.01 395.57 14949.586075 5.868357e+06 392.543124
2015-12-08 395.57 418.87 388.33 417.89 11455.141092 4.567280e+06 398.710062
2015-12-09 416.67 424.95 400.00 416.98 21039.908507 8.726382e+06 414.753795
2015-12-10 416.69 419.79 408.91 415.57 8463.884739 3.514699e+06 415.258402
2015-12-11 415.57 454.48 415.13 453.04 29020.198394 1.271557e+07 438.162704
2015-12-12 453.12 467.80 403.00 436.19 29770.283798 1.287552e+07 432.495694
2015-12-13 436.08 442.97 422.03 433.07 11725.667240 5.101509e+06 435.071986
2015-12-14 433.13 448.12 428.42 443.36 13386.744157 5.908605e+06 441.377271
2015-12-15 443.30 464.99 442.94 464.53 18096.388283 8.237603e+06 455.206996
2015-12-16 464.53 465.00 436.89 455.87 23377.122573 1.062749e+07 454.610502
2015-12-17 455.61 458.84 448.75 456.17 6598.640949 3.000514e+06 454.717031
2015-12-18 455.37 466.64 453.50 462.46 12914.736803 5.937708e+06 459.762242
2015-12-19 463.70 465.90 452.03 461.80 6839.178687 3.156242e+06 461.494358
2015-12-20 461.89 462.00 433.83 442.54 16626.488355 7.430073e+06 446.881661
2015-12-21 442.47 450.00 425.52 438.07 18930.863186 8.257541e+06 436.194626
2015-12-22 438.14 443.58 433.06 435.48 8146.017388 3.563201e+06 437.416358
2015-12-23 435.65 445.52 434.43 442.56 7100.228287 3.125137e+06 440.146049
2015-12-24 443.01 459.54 443.01 455.00 10138.315550 4.598656e+06 453.591713
2015-12-25 455.78 458.33 448.00 455.51 2877.167351 1.304948e+06 453.552898
2015-12-26 455.50 457.45 407.25 417.39 26672.121731 1.138669e+07 426.913653
2015-12-27 417.11 425.42 410.50 421.76 6968.573659 2.921475e+06 419.235752
2015-12-28 421.78 429.86 417.01 421.46 7560.562992 3.204102e+06 423.791428
2015-12-29 420.81 433.33 418.55 431.82 10419.585366 4.444308e+06 426.534057
2015-12-30 431.70 434.97 420.75 425.84 7717.510263 3.315825e+06 429.649630
2015-12-31 426.09 433.89 419.99 430.89 6634.863167 2.833320e+06 427.035137

365 rows × 7 columns

We can also narrow down to specific months.

In [13]:
# By month's name
btc_usd.loc["August 2017"]
Out[13]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2017-08-01 2855.81 2929.17 2615.00 2731.00 12525.076691 3.432280e+07 2740.326259
2017-08-02 2732.00 2760.00 2650.00 2703.51 9486.625526 2.570111e+07 2709.193699
2017-08-03 2703.51 2807.44 2698.83 2793.37 7963.697999 2.193830e+07 2754.788542
2017-08-04 2793.34 2877.52 2765.91 2855.00 7635.821672 2.165009e+07 2835.331752
2017-08-05 2851.01 3339.66 2848.32 3263.62 16996.273101 5.386193e+07 3169.043337
2017-08-06 3263.51 3296.51 3146.10 3222.75 5998.735789 1.941266e+07 3236.124519
2017-08-07 3216.78 3430.00 3186.00 3387.55 12046.117265 4.022585e+07 3339.320971
2017-08-08 3387.54 3490.00 3300.00 3412.41 15835.370208 5.405482e+07 3413.549765
2017-08-09 3408.46 3423.10 3178.72 3342.99 14286.844138 4.735646e+07 3314.690350
2017-08-10 3342.99 3448.00 3311.17 3413.03 9031.121280 3.065844e+07 3394.754600
2017-08-11 3410.00 3705.00 3390.67 3645.06 11927.373334 4.188802e+07 3511.923462
2017-08-12 3651.74 3934.00 3586.95 3855.10 12351.074661 4.679980e+07 3789.127641
2017-08-13 3855.04 4190.00 3841.71 4053.87 15889.829788 6.367210e+07 4007.097892
2017-08-14 4053.87 4329.43 3964.96 4306.23 14212.304207 5.983600e+07 4210.154943
2017-08-15 4320.95 4400.00 3800.00 4155.67 25515.718014 1.046514e+08 4101.447155
2017-08-16 4154.99 4379.78 3926.06 4378.84 12923.637280 5.419433e+07 4193.426713
2017-08-17 4361.99 4480.00 4167.21 4276.50 14573.185929 6.322860e+07 4338.694675
2017-08-18 4260.47 4368.00 3964.96 4100.00 17516.993361 7.322403e+07 4180.171091
2017-08-19 4100.00 4188.00 3900.00 4099.55 15036.184051 6.060491e+07 4030.604133
2017-08-20 4091.99 4125.95 4000.00 4058.68 6237.972896 2.528964e+07 4054.143713
2017-08-21 4058.64 4080.00 3949.78 3987.52 9782.056594 3.920391e+07 4007.736878
2017-08-22 3987.51 4139.31 3600.00 4085.00 23522.758166 9.160033e+07 3894.115013
2017-08-23 4078.00 4248.97 4051.94 4108.12 14979.403481 6.211729e+07 4146.846811
2017-08-24 4121.78 4350.00 4082.57 4300.34 10782.694367 4.517407e+07 4189.497275
2017-08-25 4308.80 4449.98 4270.00 4355.98 9699.610034 4.227406e+07 4358.326082
2017-08-26 4348.17 4369.78 4232.43 4333.38 6559.668604 2.822792e+07 4303.253256
2017-08-27 4333.38 4393.30 4290.32 4337.68 3979.124453 1.726397e+07 4338.634650
2017-08-28 4329.91 4399.72 4169.01 4379.99 8641.002446 3.707720e+07 4290.845145
2017-08-29 4385.00 4649.78 4336.26 4578.82 11879.642831 5.349117e+07 4502.758977
2017-08-30 4578.82 4642.22 4479.00 4573.20 8720.035040 3.980054e+07 4564.263673
2017-08-31 4573.15 4765.21 4566.66 4734.26 8911.412459 4.172621e+07 4682.334106
In [14]:
# By month's number
btc_usd.loc["08/2017"]
Out[14]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2017-08-01 2855.81 2929.17 2615.00 2731.00 12525.076691 3.432280e+07 2740.326259
2017-08-02 2732.00 2760.00 2650.00 2703.51 9486.625526 2.570111e+07 2709.193699
2017-08-03 2703.51 2807.44 2698.83 2793.37 7963.697999 2.193830e+07 2754.788542
2017-08-04 2793.34 2877.52 2765.91 2855.00 7635.821672 2.165009e+07 2835.331752
2017-08-05 2851.01 3339.66 2848.32 3263.62 16996.273101 5.386193e+07 3169.043337
2017-08-06 3263.51 3296.51 3146.10 3222.75 5998.735789 1.941266e+07 3236.124519
2017-08-07 3216.78 3430.00 3186.00 3387.55 12046.117265 4.022585e+07 3339.320971
2017-08-08 3387.54 3490.00 3300.00 3412.41 15835.370208 5.405482e+07 3413.549765
2017-08-09 3408.46 3423.10 3178.72 3342.99 14286.844138 4.735646e+07 3314.690350
2017-08-10 3342.99 3448.00 3311.17 3413.03 9031.121280 3.065844e+07 3394.754600
2017-08-11 3410.00 3705.00 3390.67 3645.06 11927.373334 4.188802e+07 3511.923462
2017-08-12 3651.74 3934.00 3586.95 3855.10 12351.074661 4.679980e+07 3789.127641
2017-08-13 3855.04 4190.00 3841.71 4053.87 15889.829788 6.367210e+07 4007.097892
2017-08-14 4053.87 4329.43 3964.96 4306.23 14212.304207 5.983600e+07 4210.154943
2017-08-15 4320.95 4400.00 3800.00 4155.67 25515.718014 1.046514e+08 4101.447155
2017-08-16 4154.99 4379.78 3926.06 4378.84 12923.637280 5.419433e+07 4193.426713
2017-08-17 4361.99 4480.00 4167.21 4276.50 14573.185929 6.322860e+07 4338.694675
2017-08-18 4260.47 4368.00 3964.96 4100.00 17516.993361 7.322403e+07 4180.171091
2017-08-19 4100.00 4188.00 3900.00 4099.55 15036.184051 6.060491e+07 4030.604133
2017-08-20 4091.99 4125.95 4000.00 4058.68 6237.972896 2.528964e+07 4054.143713
2017-08-21 4058.64 4080.00 3949.78 3987.52 9782.056594 3.920391e+07 4007.736878
2017-08-22 3987.51 4139.31 3600.00 4085.00 23522.758166 9.160033e+07 3894.115013
2017-08-23 4078.00 4248.97 4051.94 4108.12 14979.403481 6.211729e+07 4146.846811
2017-08-24 4121.78 4350.00 4082.57 4300.34 10782.694367 4.517407e+07 4189.497275
2017-08-25 4308.80 4449.98 4270.00 4355.98 9699.610034 4.227406e+07 4358.326082
2017-08-26 4348.17 4369.78 4232.43 4333.38 6559.668604 2.822792e+07 4303.253256
2017-08-27 4333.38 4393.30 4290.32 4337.68 3979.124453 1.726397e+07 4338.634650
2017-08-28 4329.91 4399.72 4169.01 4379.99 8641.002446 3.707720e+07 4290.845145
2017-08-29 4385.00 4649.78 4336.26 4578.82 11879.642831 5.349117e+07 4502.758977
2017-08-30 4578.82 4642.22 4479.00 4573.20 8720.035040 3.980054e+07 4564.263673
2017-08-31 4573.15 4765.21 4566.66 4734.26 8911.412459 4.172621e+07 4682.334106

Or even a day…

In [15]:
# By date name
btc_usd.loc["August 1, 2017"]
Out[15]:
Open                 2.855810e+03
High                 2.929170e+03
Low                  2.615000e+03
Close                2.731000e+03
Volume (BTC)         1.252508e+04
Volume (Currency)    3.432280e+07
Weighted Price       2.740326e+03
Name: 2017-08-01 00:00:00, dtype: float64
In [16]:
# By date number
btc_usd.loc["08-01-2017"]
Out[16]:
Open                 2.855810e+03
High                 2.929170e+03
Low                  2.615000e+03
Close                2.731000e+03
Volume (BTC)         1.252508e+04
Volume (Currency)    3.432280e+07
Weighted Price       2.740326e+03
Name: 2017-08-01 00:00:00, dtype: float64

What can we pass as the .loc argument when we have a DateTimeIndex?

Anything that can be converted to a datetime using pd.to_datetime, without having to specify the format argument.

When that condition holds, pandas will return all rows whose date in the index “belong” to that date or period.

We can also use the range shorthand notation to give a start and end date for selection.

In [17]:
btc_usd.loc["April 1, 2015":"April 10, 2015"]
Out[17]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2015-04-01 243.93 246.83 239.32 246.69 6226.016464 1.513601e+06 243.109050
2015-04-02 246.68 256.96 244.52 253.28 9806.822203 2.453664e+06 250.199655
2015-04-03 253.22 256.67 251.23 254.19 5048.577376 1.283171e+06 254.164902
2015-04-04 254.19 255.85 250.76 253.70 2769.281658 7.002845e+05 252.875870
2015-04-05 253.60 261.00 251.65 260.54 5759.360160 1.479483e+06 256.883285
2015-04-06 260.57 262.98 254.00 255.58 5960.677633 1.535495e+06 257.604180
2015-04-07 255.54 256.62 251.50 253.72 6010.267582 1.528034e+06 254.237260
2015-04-08 253.71 254.96 243.06 244.58 11663.656155 2.878712e+06 246.810407
2015-04-09 244.84 246.30 238.47 243.43 7943.710541 1.932558e+06 243.281478
2015-04-10 243.75 243.94 231.00 235.99 11549.630656 2.728444e+06 236.236497

See exercise 3 in the exercise list

Accessing Date Properties

Sometimes, we would like to directly access a part of the date/time.

If our date/time information is in the index, we can to df.index.XX where XX is replaced by year, month, or whatever we would like to access.

In [18]:
btc_usd.index.year
Out[18]:
Int64Index([2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
            ...
            2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020],
           dtype='int64', name='Date', length=2092)
In [19]:
btc_usd.index.day
Out[19]:
Int64Index([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10,
            ...
            12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
           dtype='int64', name='Date', length=2092)

We can also do the same if the date/time information is stored in a column, but we have to use a slightly different syntax.

df["column_name"].dt.XX
In [20]:
btc_date_column = btc_usd.reset_index()
btc_date_column.head()
Out[20]:
Date Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
0 2014-05-01 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
1 2014-05-02 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018
2 2014-05-03 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
3 2014-05-04 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483
4 2014-05-05 435.88 445.00 425.00 433.00 8188.082795 3.547855e+06 433.294968
In [21]:
btc_date_column["Date"].dt.year.head()
Out[21]:
0    2014
1    2014
2    2014
3    2014
4    2014
Name: Date, dtype: int64
In [22]:
btc_date_column["Date"].dt.month.head()
Out[22]:
0    5
1    5
2    5
3    5
4    5
Name: Date, dtype: int64

Leads and Lags: df.shift

When doing time series analysis, we often want to compare data at one date against data at another date.

pandas can help us with this if we leverage the shift method.

Without any additional arguments, shift() will move all data forward one period, filling the first row with missing data.

In [23]:
# so we can see the result of shift clearly
btc_usd.head()
Out[23]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
2014-05-02 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018
2014-05-03 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
2014-05-04 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483
2014-05-05 435.88 445.00 425.00 433.00 8188.082795 3.547855e+06 433.294968
In [24]:
btc_usd.shift().head()
Out[24]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 NaN NaN NaN NaN NaN NaN NaN
2014-05-02 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
2014-05-03 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018
2014-05-04 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
2014-05-05 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483

We can use this to compute the percent change from one day to the next. (Quiz: Why does that work? Remember how pandas uses the index to align data.)

In [25]:
((btc_usd - btc_usd.shift()) / btc_usd.shift()).head()
Out[25]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 NaN NaN NaN NaN NaN NaN NaN
2014-05-02 0.026659 -0.006203 -0.007724 -0.014036 -0.134590 -0.148542 -0.016122
2014-05-03 -0.019459 -0.018337 -0.030393 -0.034103 -0.101363 -0.123282 -0.024391
2014-05-04 -0.028761 -0.025677 -0.003364 -0.002187 -0.306423 -0.313594 -0.010340
2014-05-05 -0.007107 0.004900 -0.010592 -0.011506 0.588559 0.580130 -0.005306

Setting the first argument to n tells pandas to shift the data down n rows (apply an n period lag).

In [26]:
btc_usd.shift(3).head()
Out[26]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 NaN NaN NaN NaN NaN NaN NaN
2014-05-02 NaN NaN NaN NaN NaN NaN NaN
2014-05-03 NaN NaN NaN NaN NaN NaN NaN
2014-05-04 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
2014-05-05 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018

A negative value will shift the data up or apply a lead.

In [27]:
btc_usd.shift(-2).head()
Out[27]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
2014-05-02 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483
2014-05-03 435.88 445.00 425.00 433.00 8188.082795 3.547855e+06 433.294968
2014-05-04 431.64 434.00 420.27 428.01 8041.198415 3.439331e+06 427.713734
2014-05-05 429.00 452.00 425.67 440.00 13248.349023 5.842712e+06 441.014383
In [28]:
btc_usd.shift(-2).tail()
Out[28]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2020-01-17 8901.02 9188.10 8461.38 8696.60 8785.449927 7.715605e+07 8782.253644
2020-01-18 8698.97 8740.54 8507.93 8625.17 4420.198252 3.817641e+07 8636.810250
2020-01-19 8629.66 8637.71 8623.73 8634.42 0.369343 3.187356e+03 8629.799384
2020-01-20 NaN NaN NaN NaN NaN NaN NaN
2020-01-21 NaN NaN NaN NaN NaN NaN NaN

See exercise 4 in the exercise list

Rolling Computations: .rolling

pandas has facilities that enable easy computation of rolling statistics.

These are best understood by example, so we will dive right in.

In [29]:
# first take only the first 6 rows so we can easily see what is going on
btc_small = btc_usd.head(6)
btc_small
Out[29]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 449.00 465.88 447.97 460.97 9556.037886 4.381969e+06 458.554960
2014-05-02 460.97 462.99 444.51 454.50 8269.891417 3.731061e+06 451.162018
2014-05-03 452.00 454.50 431.00 439.00 7431.626480 3.271086e+06 440.157544
2014-05-04 439.00 442.83 429.55 438.04 5154.407794 2.245293e+06 435.606483
2014-05-05 435.88 445.00 425.00 433.00 8188.082795 3.547855e+06 433.294968
2014-05-06 431.64 434.00 420.27 428.01 8041.198415 3.439331e+06 427.713734

Below, we compute the 2 day moving average (for all columns).

In [30]:
btc_small.rolling("2d").mean()
Out[30]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 449.000 465.880 447.970 460.970 9556.037886 4.381969e+06 458.554960
2014-05-02 454.985 464.435 446.240 457.735 8912.964652 4.056515e+06 454.858489
2014-05-03 456.485 458.745 437.755 446.750 7850.758948 3.501074e+06 445.659781
2014-05-04 445.500 448.665 430.275 438.520 6293.017137 2.758190e+06 437.882013
2014-05-05 437.440 443.915 427.275 435.520 6671.245295 2.896574e+06 434.450725
2014-05-06 433.760 439.500 422.635 430.505 8114.640605 3.493593e+06 430.504351

To do this operation, pandas starts at each row (date) then looks backwards the specified number of periods (here 2 days) and then applies some aggregation function (mean) on all the data in that window.

If pandas cannot look back the full length of the window (e.g. when working on the first row), it fills as much of the window as possible and then does the operation. Notice that the value at 2014-05-01 is the same in both DataFrames.

Below, we see a visual depiction of the rolling maximum on a 21 day window for the whole dataset.

In [31]:
fig, ax = plt.subplots(figsize=(10, 4))
btc_usd["Open"].plot(ax=ax, linestyle="--", alpha=0.8)
btc_usd.rolling("21d").max()["Open"].plot(ax=ax, alpha=0.8, linewidth=3)
ax.legend(["Original", "21 day max"])
Out[31]:
<matplotlib.legend.Legend at 0x7f0740d2c588>

We can also ask pandas to apply custom functions, similar to what we saw when studying GroupBy.

In [32]:
def is_volatile(x):
    "Returns a 1 if the variance is greater than 1, otherwise returns 0"
    if x.var() > 1.0:
        return 1.0
    else:
        return 0.0
In [33]:
btc_small.rolling("2d").apply(is_volatile)
/home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: FutureWarning: Currently, 'apply' passes the values as ndarrays to the applied function. In the future, this will change to passing it as Series objects. You need to specify 'raw=True' to keep the current behaviour, and you can pass 'raw=False' to silence this warning
  """Entry point for launching an IPython kernel.
Out[33]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-05-01 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2014-05-02 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2014-05-03 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2014-05-04 1.0 1.0 0.0 0.0 1.0 1.0 1.0
2014-05-05 1.0 1.0 1.0 1.0 1.0 1.0 1.0
2014-05-06 1.0 1.0 1.0 1.0 1.0 1.0 1.0

See exercise 5 in the exercise list

To make the optimal decision, we need to know the maximum difference between the close price at the end of the window and the open price at the start of the window.

See exercise 6 in the exercise list

Changing Frequencies: .resample

In addition to computing rolling statistics, we can also change the frequency of the data.

For example, instead of a monthly moving average, suppose that we wanted to compute the average within each calendar month.

We will use the resample method to do this.

Below are some examples.

In [34]:
# business quarter
btc_usd.resample("BQ").mean()
Out[34]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
Date
2014-06-30 546.296557 560.478689 534.375574 549.274426 9692.850513 5.418439e+06 547.779109
2014-09-30 536.939130 544.464239 523.571087 533.646087 8921.203260 4.473120e+06 533.775317
2014-12-31 358.113261 367.361413 347.553261 357.277935 14487.630863 5.182557e+06 357.025437
2015-03-31 242.648111 250.373889 233.306667 241.823889 15871.493002 3.827028e+06 242.062305
2015-06-30 236.023956 239.404396 232.456593 236.209451 7568.650463 1.787362e+06 236.050265
2015-09-30 255.205435 259.369348 250.738478 254.964565 15057.220448 3.752239e+06 255.216907
2015-12-31 343.968043 354.843587 335.019783 346.029674 21987.668969 7.534306e+06 345.209689
2016-03-31 409.723956 415.345165 402.351319 409.552747 7276.401365 2.957477e+06 409.345434
2016-06-30 508.830549 519.757143 491.237582 511.820440 5981.137422 3.332013e+06 509.388134
2016-09-30 614.719891 621.399783 604.309022 614.092500 3888.241125 2.364365e+06 613.211596
2016-12-30 722.921538 732.714176 714.873846 726.758462 4658.746976 3.501319e+06 724.799640
2017-03-31 1031.095275 1058.529451 998.373077 1032.510000 9574.254814 9.767086e+06 1029.350854
2017-06-30 1886.373956 1961.381868 1823.321319 1902.486484 11808.708093 2.482040e+07 1897.089126
2017-09-29 3452.060000 3578.992637 3312.851758 3471.192088 14591.656328 5.001818e+07 3446.717092
2017-12-29 9132.594945 9602.524725 8655.917473 9245.341978 15339.474034 1.498917e+08 9150.932048
2018-03-30 10654.949560 11107.304725 9981.257363 10571.343077 16971.607663 1.711173e+08 10536.375918
2018-06-29 7760.852637 7950.978132 7543.325275 7754.252637 10412.011976 8.196025e+07 7744.048061
2018-09-28 6792.599011 6950.023956 6639.782857 6797.711209 7098.911451 4.863160e+07 6790.558832
2018-12-31 5199.605319 5297.541809 5063.396915 5169.268191 8805.848653 3.969870e+07 5178.241610
2019-03-29 3737.812500 3799.089886 3676.291136 3742.059432 6446.776373 2.414045e+07 3737.221237
2019-06-28 7026.032308 7308.565604 6829.843407 7117.250110 10796.082022 8.051510e+07 7077.940876
2019-09-30 10429.423617 10693.213511 10038.982872 10384.737660 9112.727636 9.513540e+07 10360.841564
2019-12-31 7997.030978 8178.661087 7803.425109 7983.524457 6393.955934 5.178886e+07 7981.938611
2020-03-31 8050.675714 8255.725714 7908.213810 8120.666667 6348.487676 5.205611e+07 8093.397718

Note that unlike with rolling, a single number is returned for each column for each quarter.

The resample method will alter the frequency of the data and the number of rows in the result will be different from the number of rows in the input.

On the other hand, with rolling, the size and frequency of the result are the same as the input.

We can sample at other frequencies and aggregate with multiple aggregations function at once.

In [35]:
# multiple functions at 2 start-of-quarter frequency
btc_usd.resample("2BQS").agg(["min", "max"])
Out[35]:
Open High Low Close Volume (BTC) Volume (Currency) Weighted Price
min max min max min max min max min max min max min max
Date
2014-04-01 374.17 668.90 386.03 683.26 365.20 651.70 374.20 670.14 1467.591402 29732.720362 9.159133e+05 1.561239e+07 376.976877 667.690345
2014-10-01 0.00 426.64 0.00 453.92 0.00 390.48 0.00 426.63 0.000000 124188.885083 0.000000e+00 2.357627e+07 0.000000 420.127183
2015-04-01 209.76 310.55 222.88 317.99 198.12 292.19 209.72 310.55 1946.293030 42308.005630 4.732609e+05 9.091325e+06 214.884260 306.748292
2015-10-01 235.87 464.53 239.06 502.00 235.00 453.50 237.15 464.53 1253.006376 105959.259141 5.210775e+05 4.719959e+07 237.116083 461.494358
2016-04-01 414.66 767.37 416.99 778.85 1.50 740.11 416.31 766.62 719.159825 33056.289644 4.709121e+05 2.225764e+07 415.569853 754.723539
2016-10-03 607.19 1287.38 610.50 1350.00 604.99 1255.00 607.19 1285.33 888.660021 36018.861120 5.460154e+05 3.883046e+07 607.560859 1275.581651
2017-04-03 1076.59 4921.71 1145.00 4979.90 1076.19 4671.09 1134.58 4921.70 1804.450797 60278.946542 2.128068e+06 2.031684e+08 1127.151197 4808.168193
2017-10-02 4219.74 19187.78 4343.00 19666.00 4137.96 18465.00 4219.53 19187.78 4646.405621 70961.369658 2.032007e+07 7.721430e+08 4233.863791 19110.244062
2018-04-02 5845.20 9827.04 6165.49 9948.98 5774.72 9670.68 5848.33 9823.28 1098.628060 33035.904045 7.093171e+06 3.032200e+08 5936.398196 9827.536792
2018-10-01 3180.84 6604.76 3230.00 6756.00 3122.28 6553.13 3179.54 6604.75 839.297665 39775.389439 5.373482e+06 1.773528e+08 3171.722851 6593.879882
2019-04-01 4092.02 12927.44 4150.00 13880.00 4052.56 12030.43 4136.32 12920.54 1572.155427 37487.802426 1.506857e+07 4.769830e+08 4121.008519 12723.686028
2019-10-01 6619.53 9547.32 6938.77 10350.00 6425.00 9254.68 6612.30 9557.08 0.369343 38751.800255 3.187356e+03 3.644311e+08 6738.933350 9504.401543

As with groupby and rolling, you can also provide custom functions to .resample(...).agg and .resample(...).apply

See exercise 7 in the exercise list

To make the optimal decision we need to, for each month, compute the maximum value of the close price on any day minus the open price on the first day of the month.

See exercise 8 in the exercise list

Optional: API keys

Recall above that we had the line of code:

quandl.ApiConfig.api_key = "Dn6BtVoBhzuKTuyo6hbp"

This line told the quandl library that when obtaining making requests for data, it should use the API key Dn6BtVoBhzuKTuyo6hbp.

An API key is a sort of password that web services (like the Quandl API) require you to provide when you make requests.

Using this password, we were able to make a request to Quandl to obtain data directly from them.

The API key used here is one that we requested on behalf of this course.

If you plan to use Quandl more extensively, you should obtain your own personal API key from their website and re-run the quandl.ApiConfig.api_key... line of code with your new API key on the right-hand side.

Exercises

Exercise 1

By referring to table found at the link above, figure out the correct argument to pass as format in order to parse the dates in the next three cells below.

Test your work by passing your format string to pd.to_datetime.

In [36]:
christmas_str2 = "2017:12:25"
In [37]:
dbacks_win = "M:11 D:4 Y:2001 9:15 PM"
In [38]:
america_bday = "America was born on July 4, 1776"

(back to text)

Exercise 2

Use pd.to_datetime to express the birthday of one of your friends or family members as a datetime object.

Then use the strftime method to write a message of the format:

NAME's birthday is June 10, 1989 (a Saturday)

(where the name and date are replaced by the appropriate values)

(back to text)

Exercise 3

For each item in the list, extract the specified data from btc_usd:

  • July 2017 through August 2017 (inclusive)
  • April 25, 2015 to June 10, 2016
  • October 31, 2017

(back to text)

Exercise 4

Using the shift function, determine the week with the largest percent change in the volume of trades (the "Volume (BTC)" column).

Repeat the analysis at the bi-weekly and monthly frequencies.

Hint 1: We have data at a daily frequency and one week is 7 days.

Hint 2: Approximate a month by 30 days.

In [39]:
# your code here

(back to text)

Exercise 5

Imagine that you have access to the DeLorean time machine from "Back to the Future".

You are allowed to use the DeLorean only once, subject to the following conditions:

  • You may travel back to any day in the past.
  • On that day, you may purchase one bitcoin at market open.
  • You can then take the time machine 30 days into the future and sell your bitcoin at market close.
  • Then you return to the present, pocketing the profits.

How would you pick the day?

Think carefully about what you would need to compute to make the optimal choice. Try writing it out in the markdown cell below so you have a clear description of the want operator that we will apply after the exercise.

(Note: Don't look too far below, because in the next non-empty cell we have written out our answer.)

To make this decision, we want to know ...

Your answer here

(back to text)

Exercise 6

Do the following:

  1. Write a pandas function that implements your strategy.
  2. Pass it to the agg method of rolling_btc.
  3. Extract the "Open" column from the result.
  4. Find the date associated with the maximum value in that column.

How much money did you make? Compare with your neighbor.

In [40]:
def daily_value(df):
    # DELETE `pass` below and replace it with your code
    pass

rolling_btc = btc_usd.rolling("30d")

# do steps 2-4 here

(back to text)

Exercise 7

Now suppose you still have access to the DeLorean, but the conditions are slightly different.

You may now:

  • Travel back to the first day of any month in the past.
  • On that day, you may purchase one bitcoin at market open.
  • You can then travel to any day in that month and sell the bitcoin at market close.
  • Then return to the present, pocketing the profits.

To which month would you travel? On which day of that month would you return to sell the bitcoin?

Discuss with your neighbor what you would need to compute to make the optimal choice. Try writing it out in the markdown cell below so you have a clear description of the want operator that we will apply after the exercise.

(Note: Don't look too many cells below, because we have written out our answer.)

To make the optimal decision we need ...

Your answer here

(back to text)

Exercise 8

Do the following:

  1. Write a pandas function that implements your strategy.
  2. Pass it to the agg method of resampled_btc.
  3. Extract the "Open" column from the result.
  4. Find the date associated with the maximum value in that column.

How much money did you make? Compare with your neighbor.

Was this strategy more profitable than the previous one? By how much?

In [41]:
def monthly_value(df):
    # DELETE `pass` below and replace it with your code
    pass

resampled_btc = btc_usd.resample("MS")

# Do steps 2-4 here

Download

Launch Notebook