This section of the workshop covers data ingestion, cleaning, manipulation, analysis, and visualization in Python.

It builds on the skills from the Python fundamentals section and introduces the pandas library.

By the end of this section, you will be able to:

  • Access data stored in a variety of formats

  • Combine multiple datasets based on observations that link them together

  • Perform custom operations on tables of data

  • Use the split-apply-combine method for analyzing sub-groups of data

  • Automate a fixed analysis so it can be rerun as the underlying data changes

  • Produce publication-quality visualizations
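As a brief preview of two of these skills, combining datasets and split-apply-combine, here is a minimal sketch using pandas. The tables, column names (`country`, `year`, `gdp`, `region`), and values are made-up assumptions for illustration only, not data used in the workshop.

```python
import pandas as pd

# Made-up GDP observations, linked to a second table by the "country" column
gdp = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "gdp": [100.0, 110.0, 50.0, 55.0],
})
regions = pd.DataFrame({
    "country": ["A", "B"],
    "region": ["North", "South"],
})

# Combine: attach each country's region to every one of its GDP rows
merged = gdp.merge(regions, on="country", how="left")

# Split-apply-combine: split rows by region, average gdp in each group,
# and combine the results into a single Series indexed by region
avg_by_region = merged.groupby("region")["gdp"].mean()
print(avg_by_region)
```

With these toy numbers, the North average is 105.0 and the South average is 52.5.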

Our goal with this section is to give you the skills to, at a minimum, immediately replicate your current data analysis workflow in Python with no loss of total (computer + human) time.

That is only a lower bound on the benefits you should expect from studying this section.

The expression “practice makes perfect” is especially true here.

As you work with these tools, both the time to write and the time to run your programs will fall dramatically.


Basic Functionality

The Index

Storage Formats

Cleaning Data

Time series

Intermediate Plotting