# Pipes, Redirection, Viewing Files and REGEXP
# Pipes
Pipes are a very powerful part of the shell. They allow chaining commands by connecting the output of one to the input of the next. Let's demonstrate this using these commands:
Command | Description |
---|---|
head | Display the first lines of the input |
tail | Display the last lines of the input |
cat | Concatenate files and print them to STDOUT |
nl | Number the lines of the input |
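A pipe (`|`) connects the STDOUT of the command on its left to the STDIN of the command on its right. A minimal sketch (assuming `/etc/sysctl.conf` exists on your system; any text file works):

```bash
cat /etc/sysctl.conf | head   # first 10 lines
cat /etc/sysctl.conf | tail   # last 10 lines
cat /etc/sysctl.conf | nl     # lines with numbers
```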
Use the above commands on `/etc/sysctl.conf` or a similar file to get a feel for them, then:
TIP
- Use `head` to get the first few entries of `ls /etc`
- Use `nl` to number the output of `ls -l /etc/ppp`
- Output the last five lines of the previous command
- Compare the result to what you get when restricting to the first five lines before numbering
# I/O Redirection
Linux has 3 basic I/O streams.
- STDIN: The standard input. Normally bound to the user's keyboard
- STDOUT: The standard output. Normally bound to the terminal window/screen and used to display non-error text
- STDERR: The standard error. Same as STDOUT but reserved for error-related messages
All of these streams can be redirected using the arrow symbols (`<` and `>`).
# Redirecting STDOUT
Using the right arrow `>` you can redirect STDOUT, for example to a file.
TIP
- Try using `echo` to create a file containing the text 'Hello World'. Be aware that a single arrow will overwrite the content of the file.
- Double arrows (`>>`) can be used to append to a file. Give that a whirl.
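One way to go about it (a minimal sketch; the file name `hello.txt` is just an example):

```bash
echo 'Hello World' > hello.txt    # creates or overwrites hello.txt
echo 'Hello again' >> hello.txt   # appends a second line
cat hello.txt
```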
# Redirecting STDERR
STDERR can be redirected like STDOUT, but the arrow has to be prefaced with the number 2, as in `2>`. Try redirecting STDERR of a command that fails to a file.
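For example (assuming `/nonexistent` does not exist, so `ls` fails):

```bash
ls /nonexistent 2> errors.txt   # the error message lands in errors.txt
cat errors.txt
```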
# Excursus: File Descriptors
Remember when we noted that in Linux everything is a file? This also holds for STDIN/OUT/ERR. Every open file has a file descriptor. A file descriptor is nothing more than a positive integer representing an open file. 1 is always STDOUT and 2 is always STDERR. It is up to you to guess the FD of STDIN.
Given this information, and the fact that `&` can be used to reference the value of a file descriptor, this common Linux idiom should become clear:

```bash
ls foo > /dev/null 2>&1
```

It means: redirect STDOUT to `/dev/null`, and redirect STDERR to the same place that STDOUT is going.
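A small demonstration (again assuming `/nonexistent` does not exist). Note that the order of the redirections matters:

```bash
# Both streams end up in /dev/null; nothing is printed
ls /etc /nonexistent > /dev/null 2>&1

# Here the error still reaches the terminal: 2>&1 duplicates
# STDOUT *before* STDOUT itself is redirected to /dev/null
ls /etc /nonexistent 2>&1 > /dev/null
```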
# Redirecting all output
Redirecting STDOUT and STDERR to the same file can be achieved by using `&>`.
This will often produce a file in which STDERR output appears before STDOUT output, because STDERR is unbuffered while STDOUT is buffered when redirected to a file.
Redirecting STDOUT and STDERR to different files is as easy as you would assume.
Just write both redirect directives pointing them at separate files.
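A minimal sketch (the file names are just examples; `&>` is a bash extension):

```bash
ls /etc /nonexistent &> all.txt            # both streams into one file
ls /etc /nonexistent > out.txt 2> err.txt  # separate files
```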
TIP
Think about why this would not work if you want to write to the same file, and whether you could make it work.
# Redirecting STDIN
This is not a common use case, but consider the `tr` command. It is used to
replace a set of characters with a different set of characters. It does,
however, only read from STDIN. `<` can be used to redirect STDIN. For example:

```bash
tr 'a-z' 'A-Z' < example.txt
```

would capitalize all letters in `example.txt`.
TIP
Try saving the result to a file.
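One possible solution, combining input and output redirection (assuming `example.txt` exists):

```bash
tr 'a-z' 'A-Z' < example.txt > example_upper.txt
```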
# Viewing file content
The `less` command can be used to view file content in a paginated way.
An alternative is `more`, which has been around since the beginning of UNIX.
It has fewer features but is available on almost any system out of the box. In both
tools, pressing `h` will display a help screen showing the available commands.
So less is more (more or less).
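For example:

```bash
less /etc/sysctl.conf   # press q to quit, h for help
more /etc/sysctl.conf
```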
# Searching in a file
In `less` you can use `/` to enter search mode. Just type what you are looking for and press Enter. Searching for regular expressions (regex) is supported.
# Statistics
The `wc` command can be used to print statistics about the content of a file.
By default it shows the line, word, and byte count.
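For example:

```bash
wc /etc/passwd      # lines, words, bytes
wc -l /etc/passwd   # line count only
```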
# Sorting
The `sort` command can be used to sort the content of a file or input from
STDIN. It supports splitting input lines on a field separator (`-t`) and sorting on a
resulting field selected via `-k`. `-n` sorts numerically instead of
alphabetically, and `-r` reverses the result.
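A minimal sketch using made-up input to show the options together:

```bash
printf 'bob:30\nalice:25\ncarol:4\n' | sort -t: -k2 -n
# carol:4
# alice:25
# bob:30
```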
TIP
Go ahead and sort the content of `/etc/passwd` numerically descending based on the user id.
# Splicing
`cut` can be used to split a line into multiple fields and return only those
you are interested in.

```bash
cut -d: -f1,5-7 /etc/passwd
```

would split each line using `:` as the field delimiter and return fields 1 and
5 through 7. This can be very useful to remove superfluous information from,
let's say, logs.
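Like most of these tools, `cut` also reads from STDIN and therefore combines well with pipes:

```bash
cut -d: -f1 /etc/passwd | sort   # all user names, sorted
```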
# Grep'ing
`grep` is an extremely powerful pattern matching utility that can match simple
text as well as complex regular expressions and has a multitude of output
options.
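A couple of simple invocations (the patterns are just examples):

```bash
grep root /etc/passwd                 # lines containing the text "root"
grep --color=auto 'r..t' /etc/passwd  # regex match, highlighted
```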
TIP
Solve these tasks using `grep`:
- Find all users that use bash by default
- Find all users that use bash by default, colorizing the matching string in the output
- Count all users that use bash by default
- Find all users that use bash by default and display the line numbers at which they reside in `/etc/passwd`
# Regex Basics
Regular expressions are a set of characters, some of which have special meaning. They are used to define patterns, and so-called matchers can then be used to find these patterns in files or text.
Regular expressions come in many different flavors of varying power, which makes them hard to define once and for all. On Linux systems you will mostly encounter two types, known as basic and extended regular expressions.
# Basic Regular Expressions
Expression | Meaning |
---|---|
. | Any single character |
[ ] | A list or range of characters to match one character. A leading ^ inside the brackets negates the list |
* | Previous character repeats 0 or more times |
\ | The escape character. It allows matching literals that otherwise have special meaning in the context of regexps |
^ | Following text must be at the beginning of the line |
$ | Preceding text must be at the end of the line |
The last two operations are often referred to as anchors.
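Some of these in action (the patterns are chosen purely for illustration):

```bash
grep '^root' /etc/passwd         # lines starting with "root"
grep 'bash$' /etc/passwd         # lines ending in "bash"
grep 'r..t' /etc/passwd          # "r", any two characters, then "t"
grep '[0-9][0-9]*' /etc/passwd   # one or more digits
```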
# Extended Regular Expressions
Expression | Meaning |
---|---|
? | Match previous character/pattern zero or one times |
+ | Match previous character/pattern one or more times |
\| | Alternation (like a logical OR) |
Hint: Colored grep output can be used to quickly check simple regex patterns.
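With `grep`, extended regular expressions are enabled via `-E` (a sketch; the patterns are again just examples):

```bash
grep -E '(nologin|false)$' /etc/passwd   # alternation: lines ending in either word
grep -E ':[0-9]+:' /etc/passwd           # one or more digits between colons
```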
# xargs
Sometimes the list of arguments you want to pass to a command is longer than that
command can handle. These are the cases where `xargs` will save your bacon.
It is a command that helps build command lines for execution. Not only does this
allow execution of commands with very long lists as input, it is also smart in the sense
that it builds command lines of maximal length. It therefore increases efficiency
by running the command far less often than once per line of input.
Here is an example:
```bash
cd /dir/with/many/files
rm *
bash: /bin/rm: Argument list too long
ls | xargs rm
```
The last line will remove all files from the directory in as few calls to `rm` as possible.
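One caveat worth knowing: `ls | xargs rm` breaks on file names that contain spaces or newlines. With GNU find and xargs, NUL-separated input avoids this (a sketch):

```bash
find . -maxdepth 1 -type f -print0 | xargs -0 rm
```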