Exploring terminal functionality for interacting with data
Downloading files
%%bash
# download Alice in Wonderland
wget --output-document=alice.txt https://www.gutenberg.org/files/11/11-0.txt
--2021-05-14 15:55:46--  https://www.gutenberg.org/files/11/11-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174313 (170K) [text/plain]
Saving to: ‘alice.txt’

     0K .......... .......... .......... .......... .......... 29% 2.25M 0s
    50K .......... .......... .......... .......... .......... 58% 4.26M 0s
   100K .......... .......... .......... .......... .......... 88% 4.49M 0s
   150K .......... ..........                                  100%  267M=0.04s

2021-05-14 15:55:50 (3.77 MB/s) - ‘alice.txt’ saved [174313/174313]
Basic file browsing
%%bash
# See what files are in current directory
ls
# have a look at the first 20 lines in the book
head -n 20 alice.txt
01_poking_around.ipynb
02_grep_in_depth.ipynb
03_sed_in_depth.ipynb
alice.txt
init.ipynb
iris.csv
requirements.txt
untitled.ipynb
The Project Gutenberg eBook of Alice’s Adventures in Wonderland, by Lewis Carroll

This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.

Title: Alice’s Adventures in Wonderland

Author: Lewis Carroll

Release Date: January, 1991 [eBook #11]
[Most recently updated: October 12, 2020]

Language: English

Character set encoding: UTF-8
Pattern matching lines within a file
%%bash
# Show all lines in Alice in Wonderland that start with the word "chapter"
# (-i ignores case; the pattern is quoted so the shell passes ^ through untouched)
grep -i '^chapter' alice.txt
CHAPTER I.
CHAPTER II.
CHAPTER III.
CHAPTER IV.
CHAPTER V.
CHAPTER VI.
CHAPTER VII.
CHAPTER VIII.
CHAPTER IX.
CHAPTER X.
CHAPTER XI.
CHAPTER XII.
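The same anchored, case-insensitive match can be tried without downloading the book. The sketch below runs against a small made-up file (its contents and the variable names `sample`, `numbered`, and `count` are assumptions for illustration) and also shows two standard grep options: `-n` to print line numbers and `-c` to count matching lines.

```shell
# Create a small throwaway sample file (hypothetical contents, not the real book)
sample="$(mktemp)"
cat > "$sample" <<'EOF'
CHAPTER I.
Down the Rabbit-Hole
Alice was beginning a new chapter of her day.
chapter ii.
The Pool of Tears
EOF

# -i ignores case; ^ anchors the match to the start of the line,
# so the mid-line "chapter" on line 3 is not matched
grep -i '^chapter' "$sample"

# -n prefixes each matching line with its line number
numbered="$(grep -in '^chapter' "$sample")"
echo "$numbered"

# -c prints the number of matching lines instead of the lines themselves
count="$(grep -ic '^chapter' "$sample")"
echo "$count"

rm -f "$sample"
```

Quoting the pattern keeps the shell from interpreting any of its characters before grep sees them.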
Some csvkit utilities
Columns in a CSV file
%%bash
# what columns are in the data
csvcut -n iris.csv
  1: sepal_length
  2: sepal_width
  3: petal_length
  4: petal_width
  5: species
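Where csvkit is not installed, a rough stand-in for the column listing can be put together from standard tools. This is only a sketch: it assumes a simple header with no quoted commas (which `csvcut` handles properly and this does not), and the sample file and the `csv` and `columns` variables are made up for illustration.

```shell
# Small sample CSV with the same header as iris.csv (the data row is illustrative)
csv="$(mktemp)"
cat > "$csv" <<'EOF'
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
EOF

# Take the header line, put each field on its own line, then number the lines.
# Assumes the header contains no quoted commas.
columns="$(head -n 1 "$csv" | tr ',' '\n' | awk '{ printf "%d: %s\n", NR, $0 }')"
echo "$columns"

rm -f "$csv"
```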
Looking at the first few rows
%%bash
# See a tidy version of the first 20 lines (the header plus 19 data rows)
head -n 20 iris.csv | csvlook
| sepal_length | sepal_width | petal_length | petal_width | species     |
| ------------ | ----------- | ------------ | ----------- | ----------- |
|          5.1 |         3.5 |          1.4 |         0.2 | Iris-setosa |
|          4.9 |         3.0 |          1.4 |         0.2 | Iris-setosa |
|          4.7 |         3.2 |          1.3 |         0.2 | Iris-setosa |
|          4.6 |         3.1 |          1.5 |         0.2 | Iris-setosa |
|          5.0 |         3.6 |          1.4 |         0.2 | Iris-setosa |
|          5.4 |         3.9 |          1.7 |         0.4 | Iris-setosa |
|          4.6 |         3.4 |          1.4 |         0.3 | Iris-setosa |
|          5.0 |         3.4 |          1.5 |         0.2 | Iris-setosa |
|          4.4 |         2.9 |          1.4 |         0.2 | Iris-setosa |
|          4.9 |         3.1 |          1.5 |         0.1 | Iris-setosa |
|          5.4 |         3.7 |          1.5 |         0.2 | Iris-setosa |
|          4.8 |         3.4 |          1.6 |         0.2 | Iris-setosa |
|          4.8 |         3.0 |          1.4 |         0.1 | Iris-setosa |
|          4.3 |         3.0 |          1.1 |         0.1 | Iris-setosa |
|          5.8 |         4.0 |          1.2 |         0.2 | Iris-setosa |
|          5.7 |         4.4 |          1.5 |         0.4 | Iris-setosa |
|          5.4 |         3.9 |          1.3 |         0.4 | Iris-setosa |
|          5.1 |         3.5 |          1.4 |         0.3 | Iris-setosa |
|          5.7 |         3.8 |          1.7 |         0.3 | Iris-setosa |
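Note that `head` counts the header as a line, which is why 20 lines yield 19 data rows in the table. A minimal sketch of the difference, using a made-up four-line CSV (the file contents and the `csv`, `preview`, and `rows` variables are assumptions for illustration):

```shell
# Tiny sample CSV: one header line plus three data rows (illustrative values)
csv="$(mktemp)"
cat > "$csv" <<'EOF'
sepal_length,species
5.1,Iris-setosa
4.9,Iris-setosa
4.7,Iris-setosa
EOF

# head counts the header as a line: 3 lines = header + 2 data rows
preview="$(head -n 3 "$csv")"
echo "$preview"

# tail -n +2 starts output at line 2 (skipping the header),
# so this yields exactly the first 2 data rows
rows="$(tail -n +2 "$csv" | head -n 2)"
echo "$rows"

rm -f "$csv"
```

When feeding rows to a tool such as `csvlook`, the header should stay in place, so asking `head` for one extra line is usually the simpler option.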
Querying a CSV file using SQL
%%bash
# Average sepal length per species; the piped-in CSV is queried as a table named stdin
cat iris.csv | csvsql --query "SELECT species, AVG(sepal_length) FROM stdin GROUP BY species;"
species,AVG(sepal_length)
Iris-setosa,5.005999999999999
Iris-versicolor,5.936
Iris-virginica,6.587999999999998
%%bash
# query a csv file using sql with a pretty output
cat iris.csv | csvsql --query "SELECT species, AVG(sepal_length) FROM stdin GROUP BY species;" | csvlook
| species         | AVG(sepal_length) |
| --------------- | ----------------- |
| Iris-setosa     |            5.006… |
| Iris-versicolor |            5.936… |
| Iris-virginica  |            6.588… |
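For comparison, the same kind of per-group average can be computed with awk instead of SQL. This is a swapped-in technique, not what `csvsql` does internally, and it assumes a simple CSV with no quoted commas. The two-column sample data and the `csv` and `averages` variables are made up for illustration.

```shell
# Tiny sample CSV (illustrative values, not the real iris data)
csv="$(mktemp)"
cat > "$csv" <<'EOF'
sepal_length,species
5.0,Iris-setosa
5.2,Iris-setosa
6.0,Iris-versicolor
6.4,Iris-versicolor
EOF

# Skip the header (NR > 1), accumulate a sum and a count per species,
# then print the mean for each group; sort makes the row order stable
averages="$(awk -F',' 'NR > 1 { sum[$2] += $1; n[$2]++ }
  END { for (s in sum) printf "%s,%.3f\n", s, sum[s] / n[s] }' "$csv" | sort)"

echo "species,avg_sepal_length"
echo "$averages"

rm -f "$csv"
```

The SQL version is easier to read and extend (joins, filters, multiple aggregates), while the awk version needs nothing beyond a standard shell environment.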