Project details

Executive Summary

Create a data pipeline that runs periodically to load FiveThirtyEight datasets into BigQuery.

Outline

FiveThirtyEight publishes the datasets behind many of its articles on GitHub.

Take these datasets, tidy them up and place them in BigQuery.

Tools

  • Airflow
  • GitHub
  • Google BigQuery
  • Google Data Studio

Languages

  • Python
  • SQL

Order of operations

  1. clone repo

  2. for each CSV found in the repo folder: load it into a table in BigQuery (raw dataset)

  3. delete local repo copy

  4. perform data transforms using SQL and load the results into tables in the prod dataset
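The four steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names (`find_csvs`, `table_name_for`, `run_pipeline`), the table naming rule, and the three injected callables are all hypothetical. The real clone, BigQuery load and SQL transform logic is left to the caller so the ordering can be shown without assuming any particular client setup.

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def find_csvs(repo_dir: Path) -> List[Path]:
    """Return every CSV file under repo_dir, sorted for a stable load order."""
    return sorted(p for p in repo_dir.rglob("*.csv") if p.is_file())


def table_name_for(csv_path: Path, repo_dir: Path) -> str:
    """Derive a table name from the CSV's path relative to the repo.

    Assumed naming rule (not from the project): lowercase the path and
    collapse every run of other characters into a single underscore.
    """
    relative = csv_path.relative_to(repo_dir).with_suffix("")
    return re.sub(r"[^0-9a-z_]+", "_", relative.as_posix().lower()).strip("_")


def run_pipeline(
    clone: Callable[[Path], None],
    load_csv: Callable[[Path, str], None],
    run_transforms: Callable[[], None],
) -> List[str]:
    """Run the four steps in order and return the raw table names loaded."""
    repo_dir = Path(tempfile.mkdtemp(prefix="fivethirtyeight_"))
    loaded: List[str] = []
    try:
        clone(repo_dir)                              # 1. clone repo
        for csv_path in find_csvs(repo_dir):         # 2. load each CSV (raw dataset)
            name = table_name_for(csv_path, repo_dir)
            load_csv(csv_path, name)
            loaded.append(name)
    finally:
        shutil.rmtree(repo_dir, ignore_errors=True)  # 3. delete local repo copy
    run_transforms()                                 # 4. SQL transforms into prod
    return loaded
```

Under these assumptions, a file at `alcohol-consumption/drinks.csv` would map to a raw table named `alcohol_consumption_drinks`. In an Airflow DAG each numbered step would more likely be its own task; the single function here only shows the ordering and the cleanup guarantee, since the `finally` block removes the local copy even if a load fails.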

DAG screenshot

Visualisation

Tarantino Film Dashboard in Data Studio

World Alcohol Consumption Dashboard in Data Studio