# Pipes, Redirection, Viewing Files and REGEXP

# Pipes

Pipes are a very powerful part of the shell. They allow you to chain commands by connecting the output of one to the input of the next. Let's demonstrate this using these commands:

| Command | Description |
| ------- | ----------- |
| head | Display the first lines of the input |
| tail | Display the last lines of the input |
| cat | Concatenate files and print them to STDOUT |
| nl | Number the lines of the input |
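
Commands are chained with the | character. A quick illustration (the exact output depends on your system):

```bash
# Feed the output of one command into the next with |.
# For example, show only the first three lines of a directory listing:
ls /usr/bin | head -n 3

# Pipes can be chained: number the listing first, then keep the last two lines.
ls /usr/bin | nl | tail -n 2
```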

Use the above commands on /etc/sysctl.conf or a similar file to get a feel for them, then:

TIP

  • Use head to get the first few entries of ls /etc
  • Use nl to number the output of ls -l /etc/ppp
  • Output the last five lines of the previous command
  • Compare the result to what you get when you restrict to five lines first and number afterwards

# I/O Redirection

Linux has 3 basic I/O streams.

  • STDIN: The standard input. Normally bound to the user's keyboard
  • STDOUT: The standard output. Normally bound to the terminal window/screen used to display non-error text
  • STDERR: Standard error. Same as STDOUT but reserved for error-related messages

All of these streams can be redirected using the arrow symbols (< and >).

# Redirecting STDOUT

Using the right arrow (>) you can redirect STDOUT, for example to a file.
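
For example (listing.txt is just an arbitrary file name):

```bash
# write the directory listing into a file instead of the terminal
ls -l /etc > listing.txt

# a second single-arrow redirect overwrites the previous content
ls -l /usr > listing.txt
```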

TIP

  • Try using echo to create a file containing the text 'Hello World'. Be aware that a single arrow will overwrite the content of the file.
  • A double arrow (>>) appends to a file instead. Give that a whirl.

# Redirecting STDERR

STDERR can be redirected like STDOUT, but the arrow has to be prefixed with the number 2, as in 2>. Try redirecting the STDERR of a failing command to a file.
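
For example (any command that prints an error will do; a non-existent path is an easy way to provoke one):

```bash
# the error message goes into errors.txt instead of the terminal
ls /nonexistent 2> errors.txt
cat errors.txt
```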

# Excursus: File Descriptors

Remember when we noted that in Linux everything is a file? This also holds for STDIN/OUT/ERR. Every open file has a file descriptor. A file descriptor is nothing more than a non-negative integer representing an open file. 1 is always STDOUT and 2 is always STDERR. It is up to you to guess the FD of STDIN.

Given this, and the fact that & can be used to reference the value of a file descriptor, this common Linux idiom should become clear:

```bash
ls foo > /dev/null 2>&1
```

It means redirect STDOUT to /dev/null and redirect STDERR to the same place that STDOUT is going.
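
Note that the order of the two redirects matters, because 2>&1 duplicates whatever FD 1 points to at that moment. A small sketch:

```bash
# FD 1 already points at /dev/null when FD 2 is duplicated from it:
# both streams are silenced.
ls foo > /dev/null 2>&1

# Here FD 2 is duplicated first, while FD 1 still points at the terminal:
# errors still appear on screen and only regular output is silenced.
ls foo 2>&1 > /dev/null
```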

# Redirecting all output

Redirecting STDOUT and STDERR to the same file can be achieved by using &>. Because STDERR is typically unbuffered while STDOUT is buffered, error messages may appear in the file ahead of the regular output. Redirecting STDOUT and STDERR to different files is as easy as you would assume: just write both redirect directives, pointing them at separate files.
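
A sketch of both variants (the file names are arbitrary):

```bash
# both streams into one file
ls /etc /nonexistent &> all.txt

# each stream into its own file
ls /etc /nonexistent > out.txt 2> err.txt
```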

TIP

Think about why this would not work if you wanted to write both streams to the same file, and whether you could make it work.

# Redirecting STDIN

This is not a common use case, but consider the tr command. It is used to replace a set of characters with a different set of characters; however, it only reads from STDIN. < can be used to redirect STDIN. For example:

```bash
tr 'a-z' 'A-Z' < example.txt
```

would convert all lowercase letters in example.txt to uppercase.

TIP

Try saving the result to a file.

# Viewing file content

The less command can be used to view file content in a paginated way. An alternative is more, which has been around since the early days of UNIX. It has fewer features but is available on almost any system out of the box. In both tools, pressing h displays a help screen showing the available commands. So less is more (more or less).

# Searching in a file

In less you can use / to enter search mode. Just type what you are looking for and press enter. Searching for regular expressions (regex) is supported.

# Statistics

The wc command can be used to print statistics on the content of a file: its line, word, and byte counts.
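
For example:

```bash
wc /etc/passwd     # prints line, word and byte counts plus the file name
wc -l /etc/passwd  # only the line count
```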

# Sorting

The sort command can be used to sort the content of a file or input from STDIN. It supports splitting input lines at a delimiter (-t) and sorting on a resulting field selected via -k. -n sorts numerically instead of alphabetically, and -r reverses the result.
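
A sketch of these flags together, assuming a hypothetical file scores.txt with lines like alice:42:

```bash
# split at ":", sort numerically on the second field, highest value first
sort -t: -k2 -n -r scores.txt
```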

TIP

Go ahead and sort the content of /etc/passwd numerically descending based on the user id.

# Splicing

cut can be used to split a line into multiple fields and return only those you are interested in.

```bash
cut -d: -f1,5-7 /etc/passwd
```

would split each line using : as the field delimiter and return fields 1 and 5 through 7. This can be very useful to remove superfluous information from, say, logs.

# Grep'ing

grep is an extremely powerful pattern matching utility that can match simple text as well as complex regular expressions and has a multitude of output options.
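
As a primer, here are some of grep's commonly used switches, shown with an arbitrary pattern:

```bash
grep 'root' /etc/passwd            # print matching lines
grep --color 'root' /etc/passwd    # highlight the matched string
grep -c 'root' /etc/passwd         # count matching lines instead of printing them
grep -n 'root' /etc/passwd         # prefix each match with its line number
```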

TIP

Solve these tasks using grep:

  • Find all users that use bash by default
  • Find all users that use bash by default colorizing the matching string in the output
  • Count all users that use bash by default
  • Find all users that use bash by default and display the line numbers at which they reside in /etc/passwd

# Regex Basics

Regular expressions are a set of characters, some of which have special meaning. They are used to define patterns, and so-called matchers can then be used to find these patterns in files or text.

Regular expressions come in many different flavors of varying power, making them hard to pin down definitively. On Linux systems you will mostly encounter two types, known as basic and extended regular expressions.

# Basic Regular Expressions

| Expression | Meaning |
| ---------- | ------- |
| . | Any single character |
| [ ] | A list or range of characters to match one character; a leading ^ inside the brackets negates the set |
| * | The previous character repeats zero or more times |
| \ | The escape character; allows matching literals that have special meaning in regexps |
| ^ | The following text must be at the beginning of the line |
| $ | The preceding text must be at the end of the line |

The last two operations are often referred to as anchors.
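
A few basic patterns in action with grep (what matches will of course depend on your system's files):

```bash
grep '^root' /etc/passwd    # lines beginning with "root"
grep 'sh$' /etc/passwd      # lines ending in "sh"
grep 'r..t' /etc/passwd     # an "r" and a "t" with exactly two characters between
grep '[0-9]*' /etc/passwd   # zero or more digits (matches every line!)
```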

# Extended Regular Expressions

| Expression | Meaning |
| ---------- | ------- |
| ? | Match the previous character/pattern zero or one time |
| + | Match the previous character/pattern one or more times |
| \| | Alternation (like a logical OR) |

Hint: Colored grep output can be used to quickly check simple regex patterns.
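
Note that grep needs the -E switch to interpret extended regular expressions. For example:

```bash
grep -E --color 'bash|zsh' /etc/passwd   # lines containing "bash" or "zsh"
grep -E 'ro+t' /etc/passwd               # an "r", one or more "o"s, then a "t"
```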

# xargs

Sometimes the list of arguments you want to pass to a command is longer than that command can handle. These are the cases where xargs will save your bacon. It is a command that helps build command lines for execution. Not only does this allow executing commands with very long argument lists, it is also smart in the sense that it builds command lines of maximal length. It therefore increases efficiency by running the command far less often than once per input line.

Here is an example:

```bash
cd /dir/with/many/files
rm *
    bash: /bin/rm: Argument list too long
ls | xargs rm
```

The last line will remove all files from the directory in as few calls to rm as possible.
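
One caveat worth knowing: ls | xargs rm splits its input on whitespace, so file names containing spaces or newlines would be mangled. A common safer variant, assuming GNU find and xargs, separates the names with NUL bytes instead:

```bash
# -print0 / -0 use NUL as the separator, which cannot appear in a file name
find . -maxdepth 1 -type f -print0 | xargs -0 rm
```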