Building a pipeline as a bash script

Up to now, all the commands for interacting with the data have been typed directly into the bash terminal.

This is great for exploring, but it is not enough when there are particular analyses or processes we would like to rerun, or run periodically.

The commands covered in this guide so far can be saved in a bash script and then rerun exactly as they were the first time, whenever they are needed.

Bash scripts are plain text files that, by convention, have the .sh extension.

They should also contain the string #!/bin/bash (known as a shebang) as the first line in the file. When the script is executed directly, this tells the system which program to use to run it.
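As a minimal sketch of how that first line is used, the snippet below writes a tiny script, marks it executable, and runs it directly. The file name hello_pipeline.sh is hypothetical and only serves the illustration.

```shell
# Write a two-line script to a hypothetical file called hello_pipeline.sh
printf '%s\n' '#!/bin/bash' 'echo "pipeline step ran"' > hello_pipeline.sh

# Mark it executable so it can be run directly; the first line selects bash
chmod +x hello_pipeline.sh
./hello_pipeline.sh

# Passing the file to bash explicitly also works, without the executable bit
bash hello_pipeline.sh
```

Both invocations run the same commands. The second form is the one used in the usage line of the example script in this section.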

Comments are lines beginning with the hash character #.

Other than that, building a working bash script should just be a case of pasting in the commands that were found, by trying them out in the terminal, to have the desired effect.
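One caveat when pasting commands into a script: by default bash carries on to the next line even if a command fails, so a failed download would still be followed by the load and query steps. A common optional safeguard is to add set -euo pipefail near the top of the script. The sketch below is an addition for illustration, not part of the example script, and the file name strict_demo.sh is hypothetical.

```shell
# Write a hypothetical script, strict_demo.sh, whose second step fails
cat > strict_demo.sh <<'EOF'
#!/bin/bash
# -e: exit as soon as a command fails
# -u: treat use of an unset variable as an error
# -o pipefail: a pipeline fails if any command in it fails
set -euo pipefail
echo "step 1 ran"
false   # stands in for a failing command, e.g. a download that did not succeed
echo "step 2 ran"
EOF

# Run it and report the exit status
status=0
bash strict_demo.sh || status=$?
echo "exit status: $status"
```

With the safeguard in place the script stops at the failing command, so the second echo is never reached and the script exits with a non-zero status.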

Example bash script


#!/bin/bash
#####################################################
# Name: putting_it_all_together.sh
#
# Performs some basic data processing on iris data
#
# Usage: $ bash putting_it_all_together.sh
#
# Author: M. Smith
# Date: 2022-08-05
#####################################################

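# Download some iris data and save it to local disk.
# The raw file has no header row, and sqlite-utils takes column names from the
# first line of a CSV, so write a header line first.
# (These column names are an assumption; use the names the saved queries expect.)
echo 'sepal_length,sepal_width,petal_length,petal_width,species' > iris.csv
curl 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data' >> iris.csv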


# Load the contents of iris.csv into a table called "iris" in the SQLite file iris_database.db
sqlite-utils insert iris_database.db iris iris.csv --csv


# Run a saved query from a SQL text file and print the result to the terminal
sql2csv example_query_1.sql --db sqlite:///iris_database.db

# Query a SQLite database and save output to disk
sql2csv example_query_2.sql --db sqlite:///iris_database.db > summary_1.csv

# Query a SQLite database and show the output as a formatted table
sql2csv example_query_3.sql --db sqlite:///iris_database.db | csvlook
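
One thing to watch when a script like this is run repeatedly: sqlite-utils insert adds rows to an existing table, so a second run would load the iris rows again on top of the first batch. A minimal sketch of one way around this, assuming the database file is produced only by this script and can safely be rebuilt from scratch, is to delete it before the load step.

```shell
# Database file name as used in the example script
db_file='iris_database.db'

# Remove any database left over from a previous run.
# -f means "do not complain if the file is absent", so the first run also works.
rm -f "$db_file"

# ...the sqlite-utils insert step from the script would follow here...
```

Deleting the file is the bluntest option. If the database also holds tables built elsewhere, removing it is not appropriate and the load step would need a different approach.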