Expanding the fundamentals for Data Analysis

While working with Bash in the terminal, we have seen that certain commands print their output to the terminal.

Technically, these commands write their output to "standard output", or stdout. By default, standard output is connected to the terminal, which is why the output of a command such as cat appears on the screen.

It is possible to redirect the output of a command so that it goes somewhere other than the terminal.

The two most common places to direct the output of a command are:

  • Into another command (Piping)
  • Into a file
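
As a rough sketch of the three destinations described above (the terminal, another command, a file), the snippet below sends the same kind of output to each in turn. The file name greeting.txt and the temporary working directory are assumptions made for illustration only.

```shell
# Work in a throwaway directory so no existing files are touched
workdir=$(mktemp -d)
cd "$workdir"

# 1. No redirection: the text goes to standard output, i.e. the terminal
echo "hello"

# 2. Piping: the three lines become the input of `wc -l`, which counts them
line_count=$(printf 'one\ntwo\nthree\n' | wc -l)

# 3. Into a file: the text is written to greeting.txt (hypothetical name)
#    instead of being shown on the screen
echo "hello" > greeting.txt
```

Only the first command shows anything in the terminal; the other two hand their output to `wc -l` and to greeting.txt respectively.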

Piping |

Piping with the | operator allows us to take the output of one command and use it as the input to another command.


# take the output from `cat` command and use it in the `head` command
cat data.csv | head -n 50

The above is a somewhat trivial example, since head can read the file directly (head -n 50 data.csv) and achieve the same result.

Piping becomes far more useful when the second command does something the first cannot, such as counting lines.


# see how many files and folders are in the current directory
# (hidden files are not listed by plain `ls`, so they are not counted)
# pipe the output of `ls` into `wc -l`, which counts the number of lines
ls | wc -l

There is also no limit on how many pipes can be combined (or chained) together.

# get rows 41 through 50 of data.csv:
# take the first 50 rows, then the last 10 of those
cat data.csv | head -n 50 | tail -n 10
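
To check that a chain like this really selects rows 41 through 50, here is a minimal sketch that swaps data.csv for `seq 1 100`, which prints the numbers 1 to 100 one per line. The stand-in data is an assumption for illustration; each line's content doubles as its row number, so the result is easy to verify by eye.

```shell
# `seq 1 100` stands in for data.csv: line N contains the number N
# Keep the first 50 lines, then the last 10 of those: lines 41 through 50
seq 1 100 | head -n 50 | tail -n 10

# Add one more stage to the chain to count the lines that came through
line_count=$(seq 1 100 | head -n 50 | tail -n 10 | wc -l)
```

The first pipeline prints 41 through 50, and line_count ends up holding 10. Note that if the input had fewer than 50 lines, `head -n 50` would pass through everything and `tail -n 10` would simply return the last 10 lines available.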


Redirection > and >>

Redirection using either single greater than > or double greater than >> can be used to write the output from a command into a file.

>

The single greater than > is used when the file should contain only the output of the command.

If there is no file by that name, this will create it.

If there already is a file by that name, the single greater than > redirection overwrites all of the existing contents and replaces them with the output from the command.


# write the first 20 rows of a csv file to a new file
cat data.csv | head -n 20 > first_20.csv
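
The create-then-overwrite behaviour described above can be seen in a small sketch like the following. The file name report.txt and the temporary directory are hypothetical, chosen so the example does not touch any real files.

```shell
# Work in a throwaway directory so no existing files are touched
workdir=$(mktemp -d)
cd "$workdir"

# report.txt does not exist yet, so > creates it
echo "first run" > report.txt

# report.txt now exists, so > discards the old contents and writes the new
echo "second run" > report.txt

# Only the output of the most recent command remains in the file
cat report.txt
```

The final cat prints only "second run". Since > replaces the file without asking, it is worth double-checking the file name before running a command like this on data that matters.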

>>

The double greater than >> is used to append the output from a command to a file.

If there is no file by that name, this will create it.

If there already is a file by that name, the double greater than >> redirection adds the output from the command to the end of the file and leaves the existing contents in place.


# write the first 20 rows and last 20 rows to a file

# write first 20 rows
cat data.csv | head -n 20 > new.csv

# add last 20 rows to the same file
cat data.csv | tail -n 20 >> new.csv
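
As a way of checking the two-step pattern above, the sketch below builds a stand-in data.csv with `seq 1 100` (an assumption for illustration, so that each row contains its own row number) and then confirms what ends up in new.csv.

```shell
# Work in a throwaway directory so no existing files are touched
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for data.csv: 100 rows, where row N contains the number N
seq 1 100 > data.csv

# Step 1: > creates new.csv (or empties an old one) and writes rows 1 to 20
cat data.csv | head -n 20 > new.csv

# Step 2: >> keeps those rows and adds rows 81 to 100 after them
cat data.csv | tail -n 20 >> new.csv

# new.csv should now hold 40 rows in total
cat new.csv | wc -l
```

Starting with > and following with >> means the pair of commands can be re-run safely: the first step resets new.csv each time. Using >> for both steps would instead make the file grow by 40 rows on every run.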