Project details
Executive Summary
Create a data pipeline that runs periodically to load FiveThirtyEight datasets into BigQuery.
Outline
FiveThirtyEight publishes the datasets behind many of its articles on GitHub.
The pipeline takes these datasets, tidies them up, and places them in BigQuery.
Tools
- Airflow
- GitHub
- Google BigQuery
- Google Data Studio
Languages
- Python
- SQL
Order of operations
- Clone the repo
- For every CSV found in the cloned folder, load it into a table in BigQuery (raw dataset)
- Delete the local copy of the repo
- Transform the data using SQL and load the results into tables in the prod dataset
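The steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the project's actual code: the repository URL, the function names, and the table-naming rule are all assumptions, and the BigQuery load and the SQL transforms are passed in as callables so the sketch stays independent of any particular client library.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List

# Assumed repository URL; the project only states that the datasets are on GitHub.
REPO_URL = "https://github.com/fivethirtyeight/data.git"


def clone_repo(repo_url: str, dest: Path) -> None:
    """Step 1: shallow-clone the repository into dest (requires git on PATH)."""
    subprocess.run(
        ["git", "clone", "--depth", "1", repo_url, str(dest)], check=True
    )


def find_csvs(root: Path) -> List[Path]:
    """Return every CSV file under root, sorted for a stable load order."""
    return sorted(p for p in root.rglob("*.csv") if p.is_file())


def table_name_for(csv_path: Path, root: Path) -> str:
    """Hypothetical naming rule: folder and file name joined with underscores,
    restricted to lowercase letters, digits, and underscores."""
    relative = csv_path.relative_to(root).with_suffix("")
    joined = "_".join(relative.parts)
    return re.sub(r"[^0-9a-zA-Z]+", "_", joined).strip("_").lower()


def run_pipeline(
    repo_url: str,
    workdir: Path,
    load_csv: Callable[[Path, str], None],
    run_transforms: Callable[[], None],
    clone: Callable[[str, Path], None] = clone_repo,
) -> List[str]:
    """Run the four steps in order and return the raw table names loaded."""
    clone(repo_url, workdir)                      # 1. clone repo
    loaded: List[str] = []
    try:
        for csv_path in find_csvs(workdir):       # 2. load each CSV (raw dataset)
            table = table_name_for(csv_path, workdir)
            load_csv(csv_path, table)
            loaded.append(table)
    finally:
        # 3. delete the local copy, even if a load failed
        shutil.rmtree(workdir, ignore_errors=True)
    run_transforms()                              # 4. SQL transforms into prod
    return loaded
```

In a real deployment, `load_csv` would presumably wrap a BigQuery load job and `run_transforms` would execute the SQL that populates the prod dataset, with Airflow scheduling the whole run. Cleaning up in a `finally` block keeps a failed load from leaving a stale clone behind for the next scheduled run.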
Visualisation
Tarantino Film Dashboard in Google Data Studio
World Alcohol Consumption Dashboard in Google Data Studio