Skip to content

csvkit - a brief introduction

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.

If the data analysis task is not that extensive and the data is relatively small, csvkit can be used to do some exploration and basic analysis of the data you have.

This may be all that is needed before loading the data into the powerhouse that is SQLite.

Converting data into csv format

From excel into csv

# convert data from a sheet in data.xls file into data.csv
in2csv data.xls > data.csv

From json into csv

# convert json data from data.json into a csv file called data.csv
in2csv data.json > data.csv

Exploring the csv data

What columns are in the data - csvcut

# show what columns are in the iris.csv file
csvcut -n iris.csv

Peeking at the data - csvlook

Commands from csvkit can be piped into other csvkit commands and also bash commands.

# look at the first 10 rows of the iris.csv file in a pretty format
csvlook iris.csv | head -n 10

Peeking at a select columns of data - csvcut -c

# isolating certain columns can be achieved using `awk`
cat iris.csv | awk -F ',' '{ print $2,$5 }' | csvlook | head -n 10

# csvkit has a command called `csvcut` which can be used to isolate columns also
cat iris.csv | csvcut -c 2,5 | csvlook | head -n 10

Summary statistics of the csv data - csvstat

# show a summary of each column in the iris.csv file.
csvstat iris.csv

Summary statistics of a subset of data - csvgrep

# Show a summary of the rows in the data that have the string "Iris-versicolor" in them

# ... using awk to filter the data
cat iris.csv | awk -F "," 'NR==1; /Iris-versicolor/ { print }' | csvstat

# ... using csvgrep to filter the data
cat iris.csv | csvgrep -c 5 -m Iris-versicolor| csvstat